How to write a data story with bad data

The General Election is the most important media event of the year. It’s a chance to earn your stripes as a journalist and get some page leads in your portfolio.

This was the thought I had last week when I was on work experience with the Times’ Redbox supplement. If you’ve heard of Redbox, you’re probably also aware of its strong emphasis on data-driven journalism. Redbox receives exclusive polling data from YouGov to keep it ahead of the curve.

I wanted to prove that I could write strong data stories. I had already been working on a feature about young candidates in the election, and thought it could work as a data story. Only problem is, the data on parliamentary candidates themselves is inconsistent as hell.

I used a website called Your Next MP, which had a spreadsheet of every candidate running in the election this year.

The data was pretty bad. There were huge chunks of information missing and no guarantee of accuracy, and another journalist mentioned “crowdsourcing” when the website came up in conversation.

What should you do in this position? Do you give up after days of research and interviewing? Do you try and find a different angle that doesn’t need data?

It might not look exactly how you thought it would from the beginning, but a bad or incomplete dataset doesn’t have to mean the story is dead. There are lots of ways you can tell a strong, accurate data story that don’t involve perfect data.


Clean it

My battle with the data

First things first: you can’t do anything with a messy dataset. You should clean what information you do have so you at least know the extent of the problem. To see how much data you actually have to talk about, right-click on the column you’re looking at (in my case this was “birth date”), select “filter” and select the values you want to keep. Although this is not a definitive list, it gave me a large cross-section of young candidates to research and talk about further in my story.
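If you’d rather do the same filtering in code than in a spreadsheet, here is a rough sketch in Python. The rows are made up and the column names “name” and “birth_date” are assumptions, so check them against the headers in the spreadsheet you actually download.

```python
import csv
import io

# Made-up rows standing in for the real spreadsheet. The column names
# "name" and "birth_date" are assumptions, not the site's actual headers.
SAMPLE_CSV = """name,birth_date
Candidate A,1995-03-14
Candidate B,
Candidate C,1961-11-02
Candidate D,
"""


def rows_with_value(rows, column):
    """Keep only the rows where `column` is filled in (ignoring stray spaces)."""
    return [row for row in rows if (row.get(column) or "").strip()]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
with_birth_date = rows_with_value(rows, "birth_date")

# Knowing the ratio tells you how much of the dataset you can really talk about.
print(f"{len(with_birth_date)} of {len(rows)} candidates have a birth date")
```

To run it on a real file, swap `io.StringIO(SAMPLE_CSV)` for an opened file, for example `open("candidates.csv", newline="")`, where the filename is whatever you saved the spreadsheet as.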

You might find that the data you’re left with is incomplete but still paints an interesting picture and backs up what you already know. Equally, you could find that you simply do not have enough information to make your story data-centric. Either way, you need to know. Make a note of what changes at each stage of the cleaning process so you know how inconsistent your data actually is. Then you can make an informed decision on how important a role the stats will be able to play. You can find out more on how to clean data from this handy guide, or in this video.
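One way to keep that note without relying on memory is to count the rows left after every cleaning step. This is only a sketch: the rows are invented, the `apply_steps` helper and the two example steps are my own illustration, and the column names are assumptions.

```python
def apply_steps(rows, steps):
    """Run each cleaning step in order and record how many rows survive it."""
    log = [("start", len(rows))]
    for label, keep in steps:
        rows = [row for row in rows if keep(row)]
        log.append((label, len(rows)))
    return rows, log


# Invented rows; "name" and "birth_date" are assumed column names.
rows = [
    {"name": "Candidate A", "birth_date": "1995-03-14"},
    {"name": "", "birth_date": "1980-01-01"},
    {"name": "Candidate C", "birth_date": ""},
    {"name": "Candidate D", "birth_date": "1961-11-02"},
    {"name": "Candidate E", "birth_date": ""},
]

# Each step is a label plus a rule that says which rows to keep.
steps = [
    ("has a name", lambda row: bool(row["name"].strip())),
    ("has a birth date", lambda row: bool(row["birth_date"].strip())),
]

cleaned, log = apply_steps(rows, steps)
for label, count in log:
    print(f"{label}: {count} rows")
```

The gap between the first and last count is the number to keep in mind when deciding how much weight the stats can carry.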


Cite your sources

Crowd-sourcing site Your Next MP

You should be doing this anyway, but it’s even more important if you’re worried about the accuracy of your data. Take Your Next MP as an example. I was fully prepared to analyse information on incumbents on this site for a data story on same-sex marriage. That is, until I spoke to Roger Smith at the Press Association:

“The information is gathered through crowdsourcing, which makes it really rather unreliable. There may well be quite a few last-minute withdrawals and so the data’s accuracy can’t be guaranteed.”

This looks like a death sentence for your feature, but it’s actually something you can work with. As long as you know and acknowledge that the dataset you have doesn’t tell the whole story, you still have the basis for an insightful piece that is enhanced by data.



Don’t analyse too deeply


Do not be tempted to over-alter the dataset to find an angle. The chances are that anything you do to the set beyond just cleaning will create further inaccuracies. Abandon any grand ideas you had of merging with polling data or finding average ages. It’s not going to work. Here’s an example of a dataset I had to work with which had the same issues.

Remember that the data will not tell the whole story, but you can look at it and analyse it to get some interesting statistics to illustrate your bigger point.




Avoid misleading visualisations

Here's one I made earlier...

For the same reason that you shouldn’t be analysing the data too deeply, you shouldn’t be putting the information you do have into a graph. Graphs and maps assume that the data is gospel. If you can’t guarantee that then any visualisation is misleading and uninformative.


Focus on people, places and personalities

using candidate data to make contacts

Your data is not going to be the hook of a ground-breaking discovery, but it’s actually very rare for data to make front page news. Instead, you should be using your data as a starting point to explore different areas, people and trends. Say your story is about candidates under 20 running in the election, and you can only find 8 people who fit the bill, even though you know there are more. Use the candidates you have found as a contact list rather than the story, and before you know it you have some interesting insights into the political careers of teenagers.
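As a sketch of how you might pull that contact list out of a patchy spreadsheet, the Python below picks out candidates under 20 and, just as importantly, lists the ones whose age can’t be worked out. Everything here is illustrative: the rows are invented, the column names are assumptions, `POLLING_DAY` is a placeholder date to swap for the real one, and it assumes birth dates are written as YYYY-MM-DD.

```python
from datetime import date

# Placeholder polling day; replace with the date of the election you cover.
POLLING_DAY = date(2015, 5, 7)


def age_on(birth, on_day):
    """Age in whole years on a given day."""
    years = on_day.year - birth.year
    if (on_day.month, on_day.day) < (birth.month, birth.day):
        years -= 1  # birthday has not happened yet that year
    return years


def contact_list(rows, on_day, max_age=20):
    """Split candidates into those under `max_age` and those of unknown age."""
    contacts, unknown = [], []
    for row in rows:
        raw = (row.get("birth_date") or "").strip()
        try:
            birth = date.fromisoformat(raw)  # expects YYYY-MM-DD
        except ValueError:
            unknown.append(row["name"])  # missing or unreadable date
            continue
        if age_on(birth, on_day) < max_age:
            contacts.append(row["name"])
    return contacts, unknown


# Invented rows; "name" and "birth_date" are assumed column names.
rows = [
    {"name": "Candidate A", "birth_date": "1996-01-10"},
    {"name": "Candidate B", "birth_date": "1995-05-08"},
    {"name": "Candidate C", "birth_date": "1995-05-07"},
    {"name": "Candidate D", "birth_date": ""},
    {"name": "Candidate E", "birth_date": "1970s"},
]

contacts, unknown = contact_list(rows, POLLING_DAY)
print("People to call:", contacts)
print("Age unknown, worth checking by hand:", unknown)
```

The second list is the reminder that the first is not definitive: anyone in it could still turn out to be a teenager.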


So there you have it. Use this guide any time you have a dataset you feel very uncomfortable using as the basis of a story, or even if you’re new to data journalism and don’t know what to analyse. You don’t have to be a statistician to create great data stories.


