Carl Bialik interview: ‘Any data set has eureka potential’

Carl Bialik

Carl Bialik is a writer for Nate Silver‘s new website FiveThirtyEight, having recently moved from the Wall Street Journal where he started The Numbers Guy column. I ask him about the ups, downs and difficulties of being a data journalist, as well as what he thinks are the most important traits for being successful in the field.

You recently moved to FiveThirtyEight from the WSJ: do you think the two publications differ in their approach to data analysis?

With The Numbers Guy at the WSJ, my role was more about looking at other people’s data analyses, taking them apart and finding the weaknesses in them. I’m going to be doing some of that at FiveThirtyEight but will be more focussed on doing original data analysis.

When you first started at WSJ, were you a data journalist? Or was this more of an organic development?

When I started at the WSJ I don’t think I had even heard the term “data journalism”, and I wasn’t a data journalist for most of my first years there. The more specialised role came later when I started writing The Numbers Guy column. Then, when the WSJ expanded its sport coverage, I started to write much more about sports from a data point of view.

Which is your favourite sport to write about?

My favourite sport to follow is tennis, which is in some ways both my favourite and least favourite sport to write about. It’s my favourite because it’s largely untapped territory in terms of data analysis, but it’s also one of my least favourites because of the way that the data has been archived, making it one of the most difficult to get accurate data for. It’s a pretty fertile area, though, and although it’s not big in the USA, there’s always going to be a focus around major events.

What steps do you take to make sure that the data you are analysing is accurate?

There are some built-in error checks with analysis, which can help determine the reliability of the data. These include checking whether the data you are running the analysis on makes sense, and looking whether different analyses produce similar results. Another important question to ask yourself is whether there is some important factor that you are not controlling for.

At FiveThirtyEight we also have a quantitative editor who reviews your work and points things out for you, such as confounding variables and sources of error. Readers are really vital for this, too: the feedback we have already received from readers who tell us when they think we have made mistakes has been extremely useful.

What do you think are the most important traits for being a good data journalist?

The first is having a good statistical foundation, which includes being comfortable with coding and using various types of software. The others are the same as for all types of journalist: being a collaborator, fair, open-minded, ethical, and responsive to both readers and sources.

Which data journalists do you particularly admire?

I’ve admired the work of many data journalists, including my current colleagues, and my former colleagues  at the Wall Street Journal. Certainly Nate Silver at FiveThirtyEight: he is a large part of the reason that I wanted to work with FiveThirtyEight in the first place. Also my colleague Mona Chalabi because she has a great eye for finding stories with interesting data behind them.

What’s the best part of being a data journalist?

Compared to most journalism, I think there is more potential to have an “aha” [eureka] moment for any given story, since it can sometimes be a slog if you’re trying to get that just from interviews or other sources. Any data set has the potential to give you a couple of these moments if you’re spending just a few hours looking at it.

And the most difficult part?

I think number one is when you can’t get hold of the data for something: occasionally a topic can be very hard to measure, and you would love to write about it but just don’t have a way in. This is often the case with sport in particular, where there can be measurement problems, issues with the quality of the data, or even a complete scarcity of it. So issues with data quality and access are the most difficult parts.



Leave a Reply