Alberto Cairo thinks you can solve almost all your data journalism problems in two steps.
“Journalists,” he tells me, “in my opinion, tend to oversimplify matters quite a lot.
“Let’s say that you are exploring the average height of people in an area. If you only report the average, that may be wrong or right depending on how spread out the data is. If all people are more or less around that height then reporting that average is correct.
“But if you have a huge range of heights, the average is still the same but you may not be reporting how wide the spread is. Or if your distribution is bimodal, the average will still be the same, but you have a cluster of short people on one end and a cluster of tall people on the other end, that’s a feature of the data that will go unnoticed if you only report the averages.”
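Cairo's point is easy to see in a few lines of code. The numbers below are hypothetical heights invented for illustration: one tightly clustered sample and one bimodal sample that report the same average.

```python
import statistics

# Hypothetical heights in cm: one tight cluster, one bimodal sample
clustered = [169, 170, 170, 171, 170, 169, 171, 170]
bimodal = [155, 156, 154, 155, 185, 184, 186, 185]

# Both report exactly the same average...
print(statistics.mean(clustered))  # 170
print(statistics.mean(bimodal))    # 170

# ...but the spread tells a very different story: the standard
# deviation of the bimodal sample is roughly twenty times larger
print(statistics.stdev(clustered))
print(statistics.stdev(bimodal))
```

Reporting only the mean would describe both groups identically, even though the second contains no one of average height at all.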
His other complaints about journalists attempting to write about data will be familiar to despairing members of data desks in newsrooms around the world. Speculative extrapolation, inferring causation from correlation, a lack of understanding of probability and uncertainty, and many other journalistic foibles all fall under Cairo’s fierce scrutiny.
“The thing is,” he insists, “all the things I’m mentioning are easily solved, so it’s not that you need to take a course on advanced correlation and regression analysis, it’s just a matter of learning Stats 101 and then read two, three books on quantitative thinking and quantitative reasoning.
“That’s how you avoid 80 per cent or 90 per cent of the problems, and the other 10, 20 per cent will be avoided if you consult with experts every time you do a story based on data, which is something we need to systematically do.”
Even most students of data journalism probably can’t say that they’ve read two or three books on quantitative thinking and quantitative reasoning, but if we’re serious about our pursuit of the truth, perhaps next time a disgraced politician publishes their memoirs we should Google ‘best books on statistics’ instead.
Though Cairo is of course prolific in data visualisation in his own right, he is clearly a teacher at heart. After two decades working in infographics and data visualisation, he could be forgiven for losing an ounce of enthusiasm, but the Knight Chair in Visual Journalism and author of two books on data visualisation still has a twinkle in his eye as we discuss his recent work.
He’s just finished teaching an online course on using data visualisation as more than just a method of communication. Instead, he is focusing on using it “to find stories in data”.
“There’s a whole branch of statistics,” he explains, “which was defined around the 60s, 70s and the 80s by a statistician called John Tukey.
“He wrote a book titled Exploratory Data Analysis. The whole field of data visualisation in computer science and statistics focuses mostly not on communication, it focuses on exploration, how to explore data, how to discover features of the data that may remain unnoticed if you don’t visualize them.”
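Even a crude visualisation in Tukey's exploratory spirit exposes what a summary statistic hides. This is a minimal sketch with made-up height data: a text histogram, binned to the nearest 10 cm, that makes the two clusters visible at a glance.

```python
from collections import Counter

# Hypothetical sample whose mean alone would hide the two clusters
heights = [154, 155, 155, 156, 156, 155, 184, 185, 185, 186, 185, 184]

# Bin each value to the nearest 10 cm and count occurrences
bins = Counter((h // 10) * 10 for h in heights)

# A crude text histogram: two separate bars reveal the bimodal shape
for start in sorted(bins):
    print(f"{start}-{start + 9} cm | {'#' * bins[start]}")
```

The two bars at 150-159 cm and 180-189 cm are exactly the kind of feature that, as Cairo says, goes unnoticed if you never visualise the data.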
Alberto Cairo is a member of the jury for this year’s Data Journalism Awards. Unfortunately he won’t give me a direct road map to victory, but for students hoping to enter this year (the first year in which a student category has been included), his advice is surely invaluable.
“Steal from the best.
“This is advice I give my students every semester: we all learn by copying other people. By copying I don’t mean plagiarising, but getting inspiration. Look at work from ProPublica or the Washington Post or The New York Times, and copy their style, copy their structure, copy the way they present information.
“Don’t try to think of graphics as if they were embellishments to your story, but as analytical tools and communication tools within your story. They should never be afterthoughts when you’re developing your story. They’re an integral part of your story and an integral part of its communicative power.”
Even with his experience, Cairo says, he still does this himself. Though many journalists seem addicted to credit, and are unlikely ever to admit to anything short of completely original works of genius, in data journalism, collaboration is endemic.
“Nobody works inside of a cocoon,” notes Cairo. “The community of data reporters and investigative reporters is very open. I just came back from the NICAR conference, and some of my students who attended were amazed that they could approach – I don’t know – Scott Klein from ProPublica and ask him questions directly. They believe there’s some sort of hierarchy, but there’s not.”
This lack of hierarchy should lend confidence to aspiring data journalists. To slightly amend Alberto Cairo’s steps to success: all one need do is get a decent grounding in statistics, consult with experts, and join the data journalism ecosystem.
Verifying missile launchers, tracking down ISIS supporters and holding governments worldwide to account is just a day’s work for 36-year-old Eliot Higgins.
Last time I met Higgins, an independent intelligence analyst, he was giving a talk about his work with Bellingcat, the investigative news network he founded in 2014. It was this network that trawled the internet’s vast and polluted reservoir of publicly accessible material to track down the Russian-owned missile launcher that took down Malaysia Airlines Flight 17 over Ukraine in 2014.
This time, it’s me doing the tracking, struggling to find Higgins on the hectic roundabout at Old Street station. I eventually spot him standing next to a telephone booth squinting through his glasses, in a Matrix-style black coat.
We only have fifteen minutes but there’s an excitement in Higgins’ eyes as he talks about his work, while easing into his chair. Ordering a Coke, he laughs at how he’s been trying to avoid caffeine.
Bellingcat formed after Higgins’ personal blog, Brown Moses, attracted huge attention as he was able to uncover war atrocities, such as the use of cluster bombs, from the comfort of his home in Leicester. It was the readily available nature of open source tools that prompted Higgins to start Bellingcat and form a network where others were able to learn how to use such tools in their own investigations.
I ask him what his proudest moment is with Bellingcat since its formation. He screws up his face a little. “Hiring people is a lot of fun,” he says. “If it wasn’t for what we did, we would have had this whole narrative of [the] Russian government [claiming to intervene in Syria only to fight ISIS, and not prop up President Bashar al-Assad’s regime] that wouldn’t have been challenged,” he explains. “And, you know, there are families involved who are being lied to by the Russian government and without us, there would have been no push back.”
“For me, a lot of what we do is about accountability and justice and working with international organisations on that.”
Though the website states it is by and for “citizen investigative journalists”, and many news outlets, including the Financial Times, call its founder a “citizen journalist”, Higgins himself is uneasy about the label.
Shuffling in his seat, he explains: “It’s not citizen journalism. It’s not just about conflict or journalism. It’s about all kinds of different areas. From my perspective, the work we do is not about journalism: it’s about the research and getting it [the findings and tools] to people that can actually do something with it.”
While Higgins wants to distance Bellingcat from being purely journalistic, the network’s handful of contributors definitely share a hack’s mindset, utilising publicly available tools, such as Google Earth and social media, to investigate atrocities abroad.
Three years after the network shot to fame by solving the MH17 mystery, it now covers all corners of the Earth and is fast becoming a force to be reckoned with. This was made clear last November, when Higgins quashed the Russian government’s denials over the bombing of a hospital in Syria. By comparing satellite and on-ground photographs from 2014 to 2016, he was able to show specific areas that were in fact damaged by bombing.
Bellingcat also drew huge media attention after using social media to track down ISIS supporters. Most recently, investigators used archived Facebook profiles and geo-located social media photos to hunt down the Berlin Christmas market attack suspect.
“We thought it would be impossible. Within an hour we had the exact location”
When I ask him about how Bellingcat uses social media in their investigations, he blushes, admitting that they recently caused a “minor panic” in Holland, after the network asked its Twitter followers to geo-locate a photograph found on an online community consisting of ISIS supporters. He laughs, shaking his head as he notices my eyes widen: “It’s nothing urgent or scary. We had one photograph [and] we just wanted to know where it was because it looked like it was in Europe. So we put it out on Twitter, asking if people could help geo-locate it.
“We thought it would be impossible. Within an hour we had the exact location: in a holiday park in Holland. The police showed up at the holiday park and the poor manager had to come out in the middle of the night.”
This brings our conversation to online privacy, as I mention the day he asked his 49k-strong Twitter followers about Donald Trump. He says, with a cheeky glint in his eye: “My Twitter page looks like I do a lot online. But if I’m away, I won’t share when I’m actually away. If I post a picture of my time abroad it’s often a week after I’ve actually been there.”
He adds, laughing: “It amazes me that people keep their Instagram profiles public. Who needs likes that much?”
I keep my own settings to myself as he stands up to leave, shaking my hand and plonking the Coke can on the table. At that point, I sadly decide it’s time to change my Instagram settings to private.
The idea of virtual reality isn’t new: Sega made a headset in 1991. VR as a storytelling device, however, is picking up traction in newsrooms and becoming more complex as the medium is explored. How many journalists can say they’ve had a bite of the cherry?
Interhacktives spoke to Charlie Newland and Owain Rich, Producers at the BBC and the filmmakers behind “Trafficked”, BBC World Service’s first foray into virtual reality. The eight-minute film introduces viewers to Maria – a single mother trafficked from Nicaragua to Mexico, and forced into the sex trade.
In an interview with the Interhacktives, Charlie and Owain discussed the story’s inception, whether the technical skills they have gained are essential for working journalists, and the first thing everyone says when they remove their headset and return to reality.
What came first: the story or the idea of doing something with VR? O: The journalist that brought the story to us was from Mexico. VR is so new that certain stories work and certain stories don’t, and we’re trying to learn the rules of it. We wanted to find a personal story from a female perspective, and find something appropriate. [Filming in VR is] different to standard filmmaking, where you’re going out with a camera and you get an idea of what’s actually possible: your levels of access, how you’d actually shoot everything. All these ideas pop into your head and you get an idea of what you’re making as you film it. With VR, however, we didn’t know where the trap doors would be.
“With traditional filmmaking you can choose different camera angles, you can cover shots with audio, you can show the curtain blowing in the breeze, then cut to the door handle etc. But with VR you’re seeing all things around you at all times.”
C: We had a few different ideas. Considering the subject matter of “Trafficked,” it was really tricky because we couldn’t actually show anything explicit. We had lots of ideas before we could choose what we went ahead with, and we were still conditioned by what we had access to, so we had to find the best avenue and drill down on the best way to film.
O: We were slightly overwhelmed with the idea of telling Maria’s story. How are you going to get a sense of her whole perspective? Who are you? Are you Maria? Are you a third person? It’s just an endless field of options.
C: You really have to pick where you focus the attention: Is it the fact she’s transported around [or] all of the eight years she suffered through? You try everything out and eventually you see what works.
O: The other difficult thing is choosing what to show and what not to show. Basically, there were certain things that were too graphic to show. With traditional filmmaking, you can choose different camera angles, you can cover shots with audio, you can show the curtain blowing in the breeze, then cut to the door handle etc. But with VR you’re seeing all things around you at all times.
C: VR is laid bare. You can’t hide.
How do you convey the seriousness of the subject best, in VR?
O: We didn’t need to show that much to get that sense of menace, and to connect to people emotionally. We don’t show any violence beyond Maria being struck near the end.
C: Which is what got us our “over 18” certification.
O: We could have shown all sorts of things, but we chose to have the guy just getting slightly too close for comfort. [The first scene of the film introduces you to one of the traffickers.]
C: It was the one thing, given the medium, that we could play with: having someone breaking into your personal space, so we really exaggerated his presence.
O: That’s one of the challenges of VR: what to show, what not to show. We’ve got loads of options and the rules aren’t really written. We mocked up scenes, blocked out scenes, and tried lots of different scenes. But in the end, we had to scale things back.
C: We had a dream sequence scene for Maria planned for the start too, and were experimenting with 360 audio, but to tell the story we didn’t need that much. You don’t need to make it super complicated. We did a workshop before we started, and what came out was that it needed to be something that people wanted to watch. It’s not for entertainment, as such, but it needed to be informative. We ran it as an installation — it needed to be an intro to the subject.
“I don’t know whether it [VR] will go mainstream the way that people hope, but I think we would like to be there – just because of how it connects you on an emotional level.”
How important are the typical journalistic values when filmmaking in VR, with regards to ethics and accuracy?
O: Some of the dialogue was rewritten from the transcript to fit the timeframe, but the actual content of what is shown is based on pure fact. We conducted rigorous interviews with Maria herself several times, as well as people involved with her rescue and people who worked on the trafficking route. We had a lot of contextual information to work with.
O: Essentially, what we were making was a dramatised film based on real accounts, so in terms of the narrative, everything was there.
C: Even down to Maria being hit on the head, it was a dramatised version. We gathered as much reference material as possible. When you’re creating something, you don’t want to embellish the facts to make things seem more dramatic and dismal, but the film is still a stylised version of events, factually accurate.
How do you see the industry progressing, and what do you hope your role would be? In an ideal world, would newsrooms invest in these kinds of things?
C: Definitely — I think [VR] offers a new perspective. We forensically recreated a scene and you can walk around, and the emotion combined with the interactivity is definitely something we, as an industry, should be looking at. But it’s all down to the budget.
O: The strategy has lots of pockets of people experimenting. As a filmmaker, anything that gives me a chance to connect with an audience in a different way, that gives a different perspective, makes me pretty excited. I don’t know whether VR will go mainstream the way that people hope, but I think we would like to be there — just because of how it connects you on an emotional level.
How long did “Trafficked” take from start to finish?
C: About three months.
O: In terms of actual hours, it was probably much more than that. It was like six months’ work in the space of three.
C: The size of the operation and the work needed really depends on the story. If you’re thinking more short form, you can just create an environment, add a voiceover and have a little bit of interactivity. It is quite a scalable medium. Do you want to make a 60-second piece when the viewer has to rig up the whole headset to watch? You do earn an appreciation for both the pitfalls and the shortcuts.
“It’s not about having the coders in one corner and the journalists in another — everyone will have to meld together to make the best stuff.”
What was the feedback like at the installation?
O: People are often most surprised about the emotional aspect.
C: We had a 15 year old girl and a 60-year-old man experiencing the film together, and they were able to take the headset off and have a conversation at the installation. That’s really something. I don’t think there’s been something people have said repetitively, just that it was unexpected.
Have you gained any technical skills during the process? How important are these skills to journalists?
O: If VR does go mainstream, journalists will need at least a cursory overlap of knowledge with developers. The gaming industry is huge – the tech has come out of kids’ bedrooms and into newsrooms.
When we started working with the Unreal Engine [a tool used to create video games] we had to go pretty deep. It’s a whole other world of knowledge. There’s one leap to video, and another to interactive content.
O: And it is the future. It’s not about having the coders in one corner and the journalists in another — everyone will have to meld together to make the best stuff.
C: But should journalists be coming out of school with all the skills to make big VR projects? It’s certainly possible, but I think that it’s still quite unusual. You don’t have to be instantly trained in 3D modelling and sound design, but having an awareness of these skills will help you out in conversations with the developer.
You can read more about the ‘Trafficked’ project here.
This interview has been edited and condensed for clarity.
Megan Lucero has seen it all. The former head of data at The Times and The Sunday Times is now directing her attention to local data journalism as the head of the Local Data Lab at the Bureau of Investigative Journalism. In a (second) discussion with the Interhacktives, Megan talks about her decision to leave The Times, how she envisions the Lab’s future, and the importance of collaborative data journalism at a local level.
“I left purely from the idea that it [the Local Data Lab] is really what journalism needs and what data journalism should be contributing to,” she says.
“I left the Times really happy with how we got on there,” she continues. “We brought data investigations into the heart of the newsroom. We went from being a sort of Excel help-desk to actually being integrated into news investigations, big-time front pages […] I was very happy to leave knowing that I was leaving a really strong legacy. But I left because I believe that this is really, really important”.
Megan is referring to the Local Data Lab, an arm of the Bureau of Investigative Journalism, which has set out to “fill the voids” after many newsrooms cut provisions for investigative journalism in the wake of the 2008 economic crisis. “Investigative journalism is expensive: it takes time, it takes a lot of people and a lot of [economic] resources,” she explains.
Local newspapers bore the brunt of these cuts, so the “idea was to try to help to solve that,” Megan continues. The Lab will focus on data journalism, which she believes is a form of investigative journalism at its core. For her, data journalism hinges on “the idea that you harness data to find stories, and using data to find stories is in itself journalism.”
Data journalism is an exercise in finding stories in the large amount of data that the digitised world offers today: “the journalist has to be able to swim in a sea of information,” Megan says. “In order to do that, there is often technical innovation that is needed, and that’s where I see the [data journalism] coming in. The computational method — the means in which to programmatically query and build databases, automate the process — all of that comes together in digital journalism.”
One of the Lab’s more challenging tasks, Megan says, will be to bring data journalism to the local level: “It’s an ambitious challenge — it’s a really daunting one — but I absolutely [think] that’s right where my next step had to be.”
“There is a time and moment to listen to our communities and to find out what it is that each member of those communities is trying to say,” she says.
Megan maintains that the introduction of computer-assisted journalism to regional newsrooms will not affect regional reporters: “We want to make sure that we are not trying to put local journalists out. A lot of local journalists use the Office for National Statistics or data.gov.uk. We are not going to try to change that because that would potentially harm their jobs,” she says. “We are after the gaps in the industry, right? Local reporters don’t have the means, the time or the resources to do the computational work […] My team essentially would be coming in to try and provide this”.
Megan insists that the new Lab will focus on unearthing local stories and issues with the help of regional journalists: “There is a time and moment to listen to our communities and to find out what it is that each member of those communities is trying to say,” she says.
How will she carry out this goal? “I am going to listen more than I am going to talk,” she explains. “What are the stories that need to be told at a local level? What are the stories they want to tell? What are the datasets that are not open? What are the challenges to covering local beats?”
She explains that the Lab’s role will empower local newsrooms, and stresses the need for transparency and accountability at a local government level. A failure of local governments to provide information and data surrounding its work is “a problem for democracy and it’s a problem for the free press unless we address it”.
She then discusses her vision for the Lab: “I don’t want to be too prescriptive at this stage, but my goal is to find datasets that are not in the public domain. Already we have a few stories that we want to be bringing out: things that NGOs, charity groups [and] activists have obtained via Freedom of Information requests, a lot of datasets that haven’t actually been brought out nationally or locally. We are going to merge lots of datasets to find analyses that maybe we hadn’t [found before].”
But searching for datasets is not her sole objective, she says. The Lab also hopes to change the way local stories are told and highlighted in the media: instead of having national newspapers dictate the media agenda of the day from the top down, local news outlets will shine: “Our idea is to put the power directly into the local stories, so every dataset that we worked on has to scale on a national [newspaper]”.
The Lab will work together with local newsrooms by collaborating on local issues identified by the partner and discovering whether it might be a story suitable for a national audience. They will do that by using resources not typically available for the local newsroom (it could be complicated datasets that can’t be accessed from a local newspaper, or advanced analysis techniques): “So the idea is that we would scale it [for national coverage]”.
The Lab and the local newspaper will then break the story together, one (the Bureau) as a national news outlet and the other locally. Cooperation is a fundamental journalistic value for Megan: there is little space for competition as “it’s very unlikely that anyone will ever scoop you or steal your story out of the back of it,” she says.
Megan says the creation of the Lab was also inspired by other major nonprofit newsrooms, such as ProPublica. “Their Electionland project, their data lab, their data stories — they are everything that we were looking to do,” she explains. The Bureau of Investigative Journalism frequently communicated with Scott Klein, ProPublica’s editor, while it was developing the Lab.
Finally, Megan sums up the Lab’s mission in one idea: “independent journalism breaking down important complex datasets to very small levels, that’s what we are hoping to do”.
Does the idea of working for the Data Lab interest you? The Bureau will be accepting applications until 1 February; you can find application details on Megan’s Medium.
Correction: February 2, 2017 An earlier version of this article misspelled the name of the Lab. It is the Local Data Lab, not the Local News Lab. Some other amendments were made to the quotes in accordance with Megan Lucero.
We love the Internet because it’s a treasure chest of information, and a place where we can join groups to communicate with people who share our interests, no matter how niche the subject or how scattered about the world we might be.
As journalists, social media and online communities are places to discover new stories and trends, find experts and learn more about our readers.
But in the age of political division, trolling and fake news, we can also get frustrated on the Internet sometimes, because open platforms — like Facebook and Twitter when left on public settings — can leave us vulnerable to abuse and hatred that we never signed up for.
With this dichotomy in mind, what should be the relationship between journalism and community? Interhacktives’ Alexandra Ma chatted with Sydette Harry, community lead at the Coral Project, a collaboration between Mozilla, the Washington Post and the New York Times to provide open-source tools for newsrooms to engage with their readers.
.@coralproject: journalists that act solely as broadcasters & not as part of a community risk becoming isolated from readers #mozfest2016
Why do you think communities that allow comments are important for journalism?
Our focus isn’t that every community allows comments, but that some sort of feedback, some sort of interaction is necessary for good journalism. That can be comments. That’s important because it gives you a way to get a more complete and continuous relationship with your readers.
What we are constantly thinking about is how journalism now, more than ever, needs everyone, and good journalism needs to be open and transparent to people, and needs to be verified. The way you do that is by having a continuous dialogue and discourse.
Now that can be comments — we would like it if it were comments — but most importantly, we would like people to really consider what it means to have a community and to plan on it. Because too often people say, “we will let social do it” but they don’t also say, “what does ‘let social do it’ mean?”
What can these other interactions between journalists and readers be, and how can journalists learn from that?
Some of the great research fun that I have is talking to people about the different ways they connect. There is one website I love called Bitter Southerner, which publishes content dedicated to the South [of the US]. They don’t have comments, but you can become a card-carrying “Bitter Southerner”. You can pay to support their journalism and have meetups, get discounts with southern artisanal makers and concerts, and get books — and they have no comments. But it makes being part of the community a tactile thing that focuses on your interests. So if you were concerned enough to be a Bitter Southerner, you get to participate in southern crafts and southern concerts.
It’s not always in comments, but the journalism is supported. The community is created. And with that, people tell them things. People think about things and go, “I’m going to go here first because even if it’s not comments, I know that these people have invested in making connections with me. When I have a story, I talk to people because it’s obvious in the way they have set up the connection to their business. They care about what I think.”
Also, comments aren’t always the best way to get information from people. Marginalised populations, specifically women and people of colour, don’t like comments because comments have historically been so awful and racist. But they will respond more directly to direct solicitations: “Tell me about an experience of racism you’ve had.” “Tell me about your mother’s favourite recipe.” “Tell me about your immigrant stories.” “Tell me what you are excited about for college.”
People will notice suddenly they have so many more comments, so many more interactions. They will even get people to say: “I don’t comment but I don’t mind talking to you.” It’s about opening up the ways and letting people know that you are open to the ways you want to talk to them.
“Why are there so many layers between you [journalists] and your readers?”
Why don’t you think journalists can rely on social media to get feedback and interact more with their readers?
Think about the way we do social media, even as journalists. Sometimes we’re thinking out loud, sometimes we’re super directed, sometimes we might be angry. A lot of that is on social, where we may not always be in our most linear of thoughts and focused. And that’s fine — that’s what people use social for.
But how do you start connecting your readers to each other around similar topics? A lot of that on social now has been things that readers have modified social for, like hashtags. Hashtags were something readers developed to be able to follow conversation and this is all information that lives on social. These important things aren’t on your site. They’re not a relationship you’re building with your community — they are things that you are going to a third party to see and then bringing it back to your platform because your readers are commenting on things that they found on your platform.
Why are there so many layers between you and your readers? Is that what you actually want? Your data is also on a third-party platform — it should be your data. It’s your content. And even though it’s on a third-party platform — Twitter or Facebook or whatever the third-party platform is — readers’ opinions of their experiences on these platforms are transferred to journalism. They’re transferred to the newspaper or the website or the network that they are talking and interacting with. And that’s the kind of relationship we think you [journalists] should own.
The tools of Coral are “Trust”, “Ask” and “Talk”. They handle what we think are three very important sections of community building.
Not everybody should have comments. But we want to improve tools that will allow you to get to the core of it, which is: how do you honestly and transparently represent and provide good journalism, and get good feedback and integrate that into journalism as part of a growing and continued relationship with the community you claim or with the audience you are searching for?
What is the relationship between journalists and readers like now? What would you want to change?
I think it depends on the site, it depends on who you are, it depends on what you feel. I know a lot of people feel that sometimes journalism only shows up when they’re having the worst moment of their lives.
I’m an immigrant from a very tiny South American country, Guyana. People don’t often talk about it, and when they talk about it, it’s usually, “Hey, it’s flooding.” Or that there are lots of deportations, or corruption, or something like that, rather than “oh, it’s beautiful, we have a nearly 1,000-year history and it’s geographically biodiverse, and we have had communities in New York, Philly and Canada for quite some time.”
Journalism doesn’t show up for those things — it shows up for the horror. And people often feel that journalism doesn’t seem to listen. Journalists will say, “I never ever read the comments,” and some of them have perfectly good reasons. Comments have been terrible to them. If you are a person of colour or a woman journalist, comments in some places are horrific for you. They’re utterly horrific, and not reading them is a form of self-preservation.
“A lot of people feel that sometimes journalism only shows up when they’re having the worst moment of their lives.”
On the other hand, there are people who have found fantastic communities in their comments, who have gotten book deals, who have been able to help people with healthcare, who have been able to help people with legal aid, who have supported funerals from the comments.
I remember that a friend introduced me to Bitter Southerner. She’s from Atlanta, I am not Southern in any way, shape or form. I am a first-generation American, so a lot of the South is not personally [related] to me, but I like the way the community [writes], and when they said “hey, you have to pay or we may not survive,” I paid. It was worth it to me to sustain the model.
The community, monetarily, can sustain you. It can also allow — when members are interested and willing to contribute — for different types of fascinating journalism. Bitter Southerner did a wonderful piece about coal refuse that I don’t think I’ve seen anywhere else. They did pieces about the origin of hot fried chicken or music streets that were mostly supported by their community.
One of the big things that we’ve been talking about post-election is the creation of echo chambers and filter bubbles. Is that a risk that comes with building communities?
A filter bubble isn’t a community.
When you think of communities outside of journalism, you think of them in your own life. They don’t consist of like-minded people: they’re bonded people, people who have chosen to be, or by circumstance are, together, but that doesn’t necessarily make them all the same. One of the big jokes I always make about communities is: think of your last big family dinner.
How well did that go? You’ve got maybe 30 minutes before the lifelong battle between Auntie M and Auntie G came up again, because both of them had a glass of wine and decided they wanted to have that fight again. You would prefer that they not have the fight. Somebody will separate them before dishes go flying, and everyone’s doing their job. But it’s reality.
We don’t have to sit here and have discussions where everyone agrees. We do have to have discussions where people’s humanity is respected. We read everything here at Coral. How horribly people talk about each other! Depending on what side of the spectrum you’re on, [the reaction] goes from, “well that was really mean,” to “that is some of the most dehumanising, racist, homophobic language I have ever seen.” And communities, I think, should be for the expression of views where humanity is respected.
“One of the big jokes I always make about communities is: think of your last big family dinner.”
I have my political views. If anyone Googles me for more than two seconds, you can pretty much figure out my political views. But when I step into a journalistic community, I know how I’m supposed to behave, and I know that the other person, even if we are diametrically opposed, is being held to the same expectations. That will be enforced, and the person who has to do that [the moderator] has the tools to do that without harming themselves. And we can present and interact with, at our own will and desire, the sections of our community that best represent that.
Too often, when people talk about the Internet or making a better Internet, they talk about making a “nicer” or “more civil” Internet. I think it’s a good position to have. There are some amazing people doing some really good work with civil comments.
But I always feel that, for certain spaces, it’s not about whether or not we are civil to each other, or [whether we] necessarily agree on everything. It’s that we know what we expect and can control our experiences.
“A person has a right to be racist. A person has a right to be awful. They do not have a right to make me listen to it.”
The problem with harassment, when it tips over, is that I can no longer control my experience. I don’t want to talk to this person, but your platform won’t let me not talk to this person. I don’t want this person to see me, but you’re preventing me from saying that. I don’t think this is a real person, I think this is a bot, and I’m doing more work in finding that out than you are.
A person has a right to be racist. A person has a right to be awful. They do not have a right to make me listen to it. And a platform has a right to be racist. They have the right to be for one group only. But they have to be clear about that, and they have to be direct with that. No person should go into a platform expecting one thing, having been promised one thing, and get something completely different, often to the point of abuse, and not have a way to address that and not have the platform stand by that.
So if I tell you, “we’re not going to have this language,” even if it’s just a social contract and not legally binding, I’ve made you a promise. I should keep that promise. And if I don’t keep that promise, there should be a way for us to talk about why it didn’t happen. That, I think, is community — rather than “we all have the same filter bubbles.”
Filter bubbles come from the place where we stop trying to develop ways to talk to each other at all. Nobody has figured out who’s supposed to step into that void of “we’re gonna have to talk to each other at some point or we have to at least come to agree on basic facts.”
I have also received pretty bad and scarring comments, so I appreciate the Coral Project’s aims.
I’m a Twitter veteran and some of my harassment has made it into national and international news. It’s really trippy and it’s not fun. I think it’s a thing that we could do better at protecting against. I’m a very large free speech advocate.
I don’t like it when people are banned for speaking what they believe, or saying what they say. I will spend the rest of my life at the top of my lungs, and possibly throw hands if necessary, to fight them about it — but they have a right.
Too often, the idea of “we’re going to push it onto Facebook” or “we’re going to push it onto social” is less about protecting or developing good spaces for conversations and more about saying, “I’m not the one responsible for this one. Good luck.”
We are responsible. We are the people who have said: “This is what we want to do: we want to tell the world about itself.” So we have to tell the world about itself truthfully. But we can do that without causing random [access] harm to everyone, and usually to the most marginalised. I believe we can. But I also am very famous for being overly hopeful.
“We are the people who have said: ‘We want to tell the world about itself.'”
What is the journalist’s role in online communities? Are they community members, are they also contributing, are they asking the questions? What’s their role?
Journalists are all sorts of things. People are using our tools, which is very exciting. In one of our tools, created with Bocoup, you can choose an emoticon and one thing you want the president-elect to concentrate on. And they have been getting good responses with that.
We [Coral Project] have a community online, we make newsletters, we go to conferences and we talk to people. We also counsel. In trying to build around a community, we also hope to form a community of people who are like, “you know what, we want to talk about this. Usually people don’t think it’s important but we’re going to think it’s important.”
There are some journalists who look at us and say, “you are very sweet, I’m never going to use this.” And that’s fine. That is OK. But we want that to be a discussion we’re having, and not just a quiet thing where we’re going, “comments are terrible, we’re not going to do anything about it” or “comments are terrible, we’re not going to talk about how they got that way.”
Journalism is so important, now more than ever. We want to work toward bringing people back to interacting and trusting journalism, because they know that this is part of their lives.
Do you think journalists are losing trust in their readers?
I think readers are losing trust in us [journalists]. We have numbers on it. They don’t trust us. And part of that is because they don’t know us, or they don’t know what we do.
There are ways for us to be connected to our readers and inform our readers and do the intelligent and vital work of journalism without being so disconnected. People are like, “once I give people what they want, I’ll have to give them simple, bad journalism.” I don’t think that’s true. I think you can give people a connection so that they trust the journalism you give without having to dumb it down.
This interview has been edited and condensed for clarity.
TOP IMAGE: The Coral Project distributed this sticker at their workshop at MozFest 2016.
Sending Freedom of Information (FOI) requests can be a daunting job for UK journalists. From the early stages of drafting your request, the process can feel like a minefield, with a real risk of not getting the information you need.
However, thanks to WhatDoTheyKnow (WDTK), this task has been made a little easier. WDTK, an open platform website powered by MySociety, a charity aimed at promoting online democracy, helps users with requests and makes past requests publicly available.
In this interview, Interhacktives’ Ella Wilks-Harper speaks with Louise Crow, MySociety’s senior developer, about the future of FOI, government secrecy and open data. Crow opens up about the drawbacks of open data and how governments are releasing large quantities of information that are not of high quality. However, with the help of WhatDoTheyKnow, technology is playing a growing role in helping ordinary people become more data literate, a positive step in holding organisations to account.
How did you get involved in MySociety?
About seven years ago, I came back from living and working in the States, and was looking for something useful to do. MySociety was advertising for people with coding skills and a certain amount of time, and I had both.
Do you think there are limitations of FOIs when people do not have the skills to analyse the returned information or data?
FOIs are used by lots of different kinds of people. I think one of the benefits of having the [government’s] responses [to FOI requests] in the open is that anyone can come and interpret the information that has been released. A lot more information is going out into the open.
David Cameron had a vision that your everyday person without any coding skills — otherwise known as an “armchair auditor” — could hold the public sector to account. Do you think they exist?
Almost anything exists. I guess a more important question is whether they exist in significant numbers to hold power to account. I think that’s always going to be difficult, and any solution that is based on data only is going to be a naïve solution. You have to have a look at the nature of power to try and put checks on power, if anyone is to have a chance at holding power to account.
Technology can help. I think what we do at MySociety tries to help the little shift in that balance by making the technology available to ordinary people very good. But I don’t think it can be the only solution.
“Journalists like yourself, and having a strong and free press, are a strong opposition to government. All of that stuff has to work together.”
What are the other solutions?
I guess it is the scope of civil societies. Journalists like yourself, and having a strong and free press, are a strong opposition to government. All of that stuff has to work together.
Have you noticed any changes in the handling of FOIs and open data in the last five years?
It is hard to say, as I have only been working on this project since 2012. Within the last five years, I think one difficulty with open data at the moment is that it is not driven by demand.
If you look at FOI and open data together, open data is what authorities think people want to know — or, in a more cynical interpretation, what they want to release. Maybe they [public bodies] don’t like to release data, so what they release is very old; or they like to release success stories, so it’s a specific subset.
“Technology can help. I think what we do at MySociety tries to help the little shift in that balance by making the technology available to ordinary people very good.”
Do you think FOI is a stepping-stone to greater transparency?
I think FOI is definitely one positive step, and right-of-access laws* are a really heartening step, as is the extent to which technology can help. I don’t kid myself that it’s the only part, but I think it’s worthwhile in terms of transparency.
*[This right of subject access means that you can make a request under the Data Protection Act to any organisation processing your personal data.]
Do you think there is a movement in helping people become more data literate and be more engaged with the vast amounts of data out there?
I think that is one of the big challenges of the modern age: information overload and information provenance. We have seen in political happenings in the last few months that you can get lots of information, but whether it is of high quality is hard to find out. Google and the big platforms have a big role to play there.
At MozFest you spoke about being neutral in your position working in MySociety. Can you expand?
I think a very practical answer to that question is that MySociety is a charity. As a charity, we are legally obliged to be politically neutral. That’s kind of what right to information is all about. It is about something that applies to everyone, whether you agree with them or not.
And that has to be a principle across all the rights [right to know, right to information and press freedom], because life changes and lots of people are going to be using them. Like freedom of speech, it can’t just apply when you say something that I like to hear.
And finally, do you send out FOIs yourself?
You know, I have never sent out one. To be honest, there has never been something yet where I’ve felt like I really need to know this. [But] I have certainly benefited from reading other information that people have requested.
Peter Yeung: Society will gain greater benefit from just having open data.
Fresh from his time as an ‘interhacktive’, the eyes of the data journalism community are already on Peter Yeung at The Times. With several front-page exclusives under his belt, Peter has caused quite a stir with his innovative work on mental health and the proposed third runway at Heathrow. Dedicated, pioneering and oozing coolness, he spoke to James Somper about data, secrecy and Vine.
Why did you choose data journalism?
I didn’t originally start in data journalism. I did some work in culture journalism before, actually: visual arts, film and music. I only started doing data journalism fairly recently, in the last year or so.
Why did you make the transition?
It’s been a growing niche. I studied anthropology before, where you specialise in not specialising. Likewise, with data journalism you can cover almost anything. You have a lot of freedom, even though in some ways it is a specialism to an extent now.
What do you think the difference is between a data journalist and a data analyst who works for a think-tank?
Data journalism works within the classic tenets of journalism: analysing, contextualising and providing information in a way that’s understandable for readers, whatever the publication might be.
How do you think Brexit will impact upon data journalism?
Well, not necessarily in an obvious way. I suppose if you take a broader approach and say it’ll have a negative impact like it may have on the rest of society, data journalism is a community that relies on the sharing of information and an open ethos more generally. If there are more limits on that then it could be pretty tricky. There’s a big Irish data journalism contingent in the UK which could be affected and that’d be a real shame.
As a data journalist, would you say you’re being hindered by the amount of data that the government releases or do you not think it’s a problem?
Like every data journalist, I like the idea of open data and that all data should be collated and collected, as much as possible. Freedom of Information requests are useful. Essentially, society will gain greater benefit from just having open data, so yes, the government is hindering journalism and society by not releasing all the data it collects.
Would you say as a country we are becoming more secretive?
Data literacy and the quality of data output are increasing. I know the ONS recently had quite a scathing self-assessment about the quality of the statistics they were putting out. Over the years it has improved quite dramatically, and now it’s pretty good quality.

That being said, there are circumstances where it might be a very political thing, in terms of which government is incumbent, but there is a lot of difficulty in some cases in getting hold of data, whether that be through Freedom of Information requests. Certain government departments are very reluctant to release data that should be released publicly. For example, the Department for Education have become a lot more stringent about the way you are able to access the data they produce, and a couple of weeks ago they introduced a stipulation that you have to give the government at least two days’ notice that you’ll be publishing data they’ve released. That’s quite damaging to analytical newspaper journalism.

In terms of secrecy, it depends on which angle you want to approach that from: whether it’s secrecy on the part of users, who are now more suspicious about the way their data is being dealt with. There is a much greater awareness since Edward Snowden about the way our data is being used, but, equally, news organisations use huge amounts of data to check companies like Facebook.
What advice would you have for wannabe data journalists?
There is a lot of negativity about the journalism industry. It’s always been difficult to find your dream job where you could have loads of time to do what you wanted and have a lot of luxury. It’s more about finding your way in and using the spare time that you have to focus on the topics that interest you and what you want to write about. In terms of data journalism jobs, there’s quite a lot.
Finally, why did Vine fail?
I think the main reason it’s been shut down is the conflicted situation at Twitter, and how in a way it was also a competitor to Twitter’s ambitions in video. I think if it had continued as an independent, individual company, it would still be around. All the tributes that have flooded in since it closed show how popular it was. It was probably the purchase by Twitter that killed it.
Analytics are some of the most effective tools publishers have for distributing stories. Yet implementing analytics and tailoring them to an organization’s specific needs has proved challenging for many newsrooms.
We spoke to Federica Cherubini, a media consultant and editorial researcher who worked for the World Association of Newspapers and News Publishers (WAN-IFRA) in Paris.
Together with Rasmus Kleis Nielsen she authored the Reuters Institute for the Study of Journalism’s report ‘Editorial analytics: How news media are developing and using audience data and metrics.’
Cherubini and her co-author conducted 30 interviews across eight countries to uncover how newsrooms are working with analytics.
How can work in a newsroom be affected by the use of metrics and analytics?
Nowadays, tools and ways to get data are common in a newsroom. It is very typical to see a big screen with real-time traffic in a newsroom. What publishers are now trying to develop are best practices for using analytics, not just in the distribution process after an article has been pushed out on different platforms, but also to perform and produce better journalism.
Which examples of best practices did you find out during your research?
We defined best practice as the use of editorial analytics, as opposed to generic, rudimentary analytics. That means the analytics are tailored to each news organization, and the newsrooms decide what they want to look at.
Currently some of the most popular metrics are: time spent on a page, the number of shares, retweets, and comments to see how the users interact with the content.
If you produce a piece of content where do you want to publish it? Is it a piece suitable for Facebook or other platforms? The problem that many newsrooms have with analytics is that they look at data as just numbers which don’t mean very much. The newsrooms that use best practices are those that give numbers a context.
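Cherubini’s point about giving numbers a context can be made concrete with a small sketch. Everything below (the article names, the counts, the per-1,000-pageviews convention) is invented for illustration and is not taken from the report:

```python
# Hypothetical per-article engagement data. Raw counts alone are hard to
# compare across articles with very different audiences, so we normalise
# each metric per 1,000 pageviews.
articles = {
    "big_feature": {"pageviews": 50_000, "shares": 400, "comments": 120},
    "news_brief": {"pageviews": 5_000, "shares": 90, "comments": 30},
}

def engagement_rate(stats, metric):
    """Return a metric per 1,000 pageviews, giving the raw number context."""
    return 1000 * stats[metric] / stats["pageviews"]

for name, stats in articles.items():
    print(name, engagement_rate(stats, "shares"))
```

Normalised this way, the small brief can out-perform the big feature per reader reached, which is exactly the kind of context a dashboard of raw totals hides.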
Will analytics change editorial decisions?
No, because if a journalist or editor decides to write on a topic that is important to write about, they will. The main thing is that data will never replace your own judgment, data only helps you be more effective.
Chris Moran, audience editor at the Guardian, always says that it is important to decide when you publish your article. If your publishing schedule still reflects a print mind-set, you can use the data to inform that decision and be more effective.
Are there any weaknesses in what newsrooms are doing with analytics?
Many newsrooms are a bit generic and basic. They gather the data, they share the data, maybe the journalist gets an email every day with their performance from the day before, but that’s it. So one weakness is not really trying to turn the data into actionable insight.
Another weakness is not just in the newsroom: it is very difficult to track data across devices or across platforms. Is a share on Facebook the same as a tweet? Does it have the same impact or value? So a further challenge is understanding how data from different mediums translates across platforms.
Where do you think we are going now in terms of data and analytics, all this stuff that is new for old-school journalists?
I think newsrooms are getting more sophisticated. But they need to understand that no single approach exists. There is no one set of tricks you just learn and you’re done.
I really think it should be focused and tailored to each news organization. Otherwise it’s just tricks to improve headlines and get more reach. Pure reach, irresponsible reach, doesn’t get you anywhere; it doesn’t mean that the reader is going to come back.
Reach, or being big, isn’t enough anymore. The next question is about how you turn your audience into a loyal audience.
And metrics tap into that, helping you have a bit more information and test hypotheses in the newsroom. You can experiment, go back and look at the data, and see if it worked. If it didn’t, you can change your approach the next day.
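The experiment-then-check loop Cherubini describes can be sketched in a few lines. The time slots and engagement figures below are hypothetical, purely to illustrate testing a publishing hypothesis against data:

```python
# Hypothetical experiment log: engagement (shares per 1,000 pageviews)
# for similar stories published in two different time slots.
from statistics import mean

results = {
    "08:00": [3.1, 2.8, 3.4, 3.0],
    "18:00": [4.2, 3.9, 4.5, 4.0],
}

def best_slot(results):
    """Pick the publishing slot with the highest mean engagement."""
    return max(results, key=lambda slot: mean(results[slot]))

print(best_slot(results))
```

If one slot wins, you adjust the schedule; if the next week’s data disagrees, you change the approach again.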
Federica Cherubini currently works with WAN-IFRA on engagement strategies and editorial conference planning.
Hailing from a tiny Californian town, where the main mode of transport takes the literal measurement of horse power, Megan Lucero is quite the outlier. The energetic 27-year-old – who was remarkably promoted from intern to data editor at The Times and The Sunday Times in just four years – would certainly stand out if you found her in a spreadsheet. At their shimmering Thames-side offices, Lucero talked to Peter Yeung about the importance of open data, the inherent plurality in data teams, and how her paper was the only one to correctly reject the polling data about the UK’s 2015 General Election.
Can you talk about your rise through the ranks at The Times?
I was interning for a week on the foreign desk, just as I was finishing up my MA in International Journalism at City University. It was my first time in a massive newsroom, which is funny to look back on now. Towards the end of that, I started taking a lot more on for the desk, and suggesting a lot more we could be doing digitally. I was very fortunate that at the time Richard Beeston – who unfortunately passed away a couple of years ago – was very on board with this and gave me a lot of free rein to do that.

But at the end of that, they were cutting researchers and my job came up for the axe. I went up to the editor and deputy editor at the time, James Harding and Keith Blackmore, and I pitched a job to them, which we later called a “story producer”. I invented the job and they said: “Let’s see what you can do”. It was a one-woman show for a year. I taught myself a bit of coding, built some interactives, started our first Soundcloud account doing podcasts, ran live-blogs and found stories online.

But another review came up. They asked me to apply as a data journalist, and at the time I didn’t feel qualified by any means. But once I took on the role, I wanted to own what data journalism meant, so I just started teaching myself. I was trying to learn as fast as I possibly could, because I saw this as a massive opportunity – the future of journalism. After a while, I was basically running the team, so I became data editor.
What role will data journalism play in future newsrooms?
One day data journalism won’t be data journalism – it’s just going to be journalism. This sexy term that everybody throws around will disappear. Every single journalist should and will be digging into all of the digitisation and data around their beat, finding their own exclusives. I think there will always be a specialised team that will need to help with really advanced machine learning, perhaps algorithms that look at modelling, but I really think that data journalism as we know it now won’t exist.
Do you ever need to convince others about the value of data journalism?
I’ve worked pretty hard to make sure that other journalists understand the value of it. We had a front page exclusive out about charities’ expenses recently, and every journalist knew that the story was a combination of a data journalism approach and an investigative approach. Everyone recognises the value of data, but it’s a matter of whether they’d be equipped to do it themselves. Sure, there is a gap with what other journalists can currently do, but they still recognise that data is important and valuable.
Does the paywall affect The Times‘ approach to data journalism?
Whether there was a paywall or not, we would be approaching it as we do. If anything, there’s much more of an argument for what we do at The Times, because our business model is that we produce news worth paying for. You’re trying to give your audience and your reader something exclusive, something they can’t get anywhere else, something that is worth subscriptions. A lot of people are willing to pay to support foreign correspondents around the world, advanced sports coverage, access to premiere clips. And I think that there’s a value in someone who’s looking out for accountability in public interest reporting, by advancing data manipulation and data analysis. I think every journalist should be thinking about how they can tell the full picture, looking at all of the information available. If you shut the door on data journalism, or limit yourself on how to access data, you’re really limiting the depth of what your story can tell.
Are there ever clashes between the editorial stances of a paper and what the data says?
I think your question doesn’t even necessarily need to apply to journalism. If you look at academics, if you look at anyone who analyses data, they can tell you that it’s possible to torture a data set to tell you whatever you want it to say. You’ll read one study that says drinking red wine helps you; you’ll read another that says it will kill you. This is because people twist numbers, and they will twist them to tell you what they want. But I think we’ve never been pressured to deliver a certain angle, or to intentionally twist the data.

The great thing about having a data team is that you’re not relying solely on a single individual – a team requires, for us, a peer review. Each of us checks each other’s processes, and we really do make a moral and ethical decision whenever we’re looking at it. We try to be open and challenge each other if we find ourselves going down a certain angle, or not doing something as robustly as it should be done. The classic example is how we treated the 2015 General Election – we rejected the polling data that was in front of us – no other paper did that. It wasn’t robust, the margins were too wide, the data was skewed. That couldn’t have happened if it was just individual people going after a story.
What is more valuable, open data or freedom of information?
If there was truly open data, you wouldn’t need FOIs. If truly every government body and every organisation that is public opened their data, you wouldn’t need to do that to begin with. The fact that FOI is under threat is a travesty, and it’s absolutely unacceptable, because this is an affront to a public service. This is a right being taken away from citizens. But if you look at the source of the problem, it is that the data isn’t open: public information should be easily accessible. My argument would be that open data is more important, because it is the bigger picture that encompasses FOI issues. But, of course, I wouldn’t say that FOI doesn’t matter – it matters a lot. It was created because of the lack of transparency and the lack of openness. But hopefully we can get to a space where that won’t really be necessary.
Is it difficult working for both The Times and The Sunday Times, which are competing papers?
We’re the only editorial team that does this. There’s no one else who has a data team that works across two titles. It’s kind of like contracting, in that sense, but it doesn’t feel like that here; it definitely feels like two separate titles. We’re quite lucky that there are very different focuses on what we do for each title – what we can bring to them. But at times, there’s obviously data that both titles will want, and it would be quite silly to replicate our work. I think we’ve been finding a good balance in how we share that. Luckily, the way that data journalism works across the board is that it’s quite an open space and an open community – The Guardian, The FT – I know the editorial teams across the board here. Most of all, we try to open up our data. If I did something for The Times, it would be quite natural for us to open up our FOI requests and the data on that story. That’s what is quite unique about the data community. But it is challenging.
What do you want The Times data team to be known for?
I’d love to expand my team even more as I get more resources, and as that’s allocated to us. Basically, I want our team to continually be breaking really great stories, and we want to be doing it in a way you couldn’t without computing. Our team really is brought in to be an investigative team, and we find our best use is when we are doing advanced algorithms, machine learning, modelling – when we’re handling big data, doing things that a human really couldn’t do without computing. That’s what I want us to be known for.

We’re still kind of working in an area in which we’re doing some journalism that other journalists could do, so I’d like to really move further along that line. Doping is one of the biggest examples, but obviously we’ve done a lot of stuff on charity finances, on footballers’ accounts. I’d like to continue that, and I’d like us to get more into visualisation – our team doesn’t do enough due to resources – and I want to focus on stories. But I’d also like to help contribute to the data community and to this paper by creating journalists who are empowered to be data journalists themselves.
This interview has been edited for clarity and brevity.
At the tender age of 27, John Burn-Murdoch is one of the leading young lights of data journalism in the UK. His brief career to date has already taken in The Guardian, The Telegraph, Which? Magazine, and The Financial Times, where he’s been working in a coveted data journalist role since 2013. Raised in Yorkshire, Burn-Murdoch also channels his passion for spreadsheets and statistics as a visiting lecturer at London’s City University, sculpting the next generation of data enthusiasts. On a crisp December afternoon at Borough Market, he talked to Peter Yeung about the issue of objectivity in data, the risk of cronyism in the data journalism community, and how the FT are unique.
How does the FT differ from other publications?
I would say our newsroom is more numerate than most. That’s not to say everyone has a maths degree, but we have some people that have previously worked as analysts in banks, for example. That means a lot of the day to day data journalism, the quick-fire stuff, is handled by the reporters without them even thinking about it. You could say that a lot of the numerate data journalism that comes out of the FT won’t even come past our desk – it just happens. That means that those of us on the data, stats and interactive teams are afforded a bit more time to dig deeper into things. We might have more of a week-scale publication schedule, with some quicker articles in between. Whereas places like The Guardian publish two or three pieces on the data blog on any given day. There are different ways of doing things, but most of it for us is having the capacity to take a little bit more time.
How did you get into data journalism?
I never really thought about journalism until the third year of my undergraduate geography degree. I was a bit disillusioned with the course, and needed to do something extra-curricular. I started working on the student paper and really enjoyed it. I didn’t even know data journalism was a thing then. My first taste of professional journalism was doing some work experience at The Guardian during the London riots, because they were suddenly looking for lots of people to come in and do some research. That was inherently data-related work. Then I started a Master’s in Data Science [the first in the UK] at Dundee University, but I only did the first term of that because it was impossible to fit in, since I was working full time. It was distance learning, but there was also a four week period of intense sessions in Dundee. I absolutely loved it, but it proved too much of a struggle with time.
Why are you a lecturer at City University?
I’d say for two reasons. Number one, as trite as it sounds: giving something back. When I was studying at City, James Ball was lecturing at the same time as being a data journalist at The Guardian. And with something like data journalism, which is quite a rapidly evolving field, often it’s better to have someone who’s actively involved in the field teaching it. City got in touch – there weren’t many data journalists in London, to be honest, and I was obviously one of the ones they knew – and it sort of ran from there. The other bonus for me is that it keeps my own skill sets ticking over. I kind of feel like everyone wins.
The data journalism community is quite tight-knit. What are the advantages and drawbacks?
I think it’s mainly an advantage. There are obvious drawbacks in terms of cronyism and when people are interviewed for jobs there’s always a temptation to hire the people that are familiar. But I think there are big advantages in terms of collaboration: digital journalism as a whole, and especially it seems anything where data analysis and web development are involved, seems to be inherently very collaborative. The whole concept of open source is about riffing on other people’s work, taking something someone else has done and adding to it. That collaborative spirit is a massive help. Without it, we wouldn’t move along as quickly. But as a counterpoint to the cronyism, because of the skill sets now required we are now seeing a lot of people from outside of that bubble. If anything, data journalism is less cronyistic than journalism as a whole.
With its history in computer-assisted reporting, data journalism has tended to be focussed on investigations. But should there be more quick, reactive data journalism?
There are obviously lots of cases where you can do good quality data journalism very quickly. Alberto Nardelli is one of the best at using a quantitative mindset and skill set, but with the breaking news agenda. But, having said that, I think inherently the best data journalism, if you judge it in terms of the level of analysis, and the ability to find a news line that other people don’t have, takes time. Quick-reaction pieces can only be done if you spend a hell of a lot of time familiarising yourself with your beat, and building up your data sources. It’s definitely possible, but to be knocking out really top level data journalism multiple times a week is really difficult.
Is data journalism more objective than other forms of journalism?
Like those that have answered it before me, and probably much better, I would say it’s not necessarily more objective. It certainly can be. Data journalism can be more objective than vox pop journalism, purely because of things like sample sizes. When you’re trying to extrapolate and talk about national or global trends, you can be more objective. But there are issues with data quality, and issues because your starting point is always a question you want to answer. There are few journalists of any type who start with a completely naive position. Some people might have an agenda even though it’s a completely unconscious one. It’s very difficult to go in completely blind.
How do you establish the line between pushing an agenda and finding a story?
The obvious one is talking to people in the know, especially those who you think might disagree with you. If you set out to ask a question of a data set and you go to an expert who has already written extensively with the same angle as you, it won’t help much. Even before that, you can do your own tests by interpreting the data in different ways, making sure there aren’t any other counter-explanations in there.
Who is doing the most interesting data journalism right now?
That’s a really tough question. Obviously, the FT. No, no: all sorts of places. There are some obvious ones such as the New York Times, which does pure data journalism and visual journalism, constantly raising the bar and doing fantastic stuff. Berliner Morgenpost won the Information is Beautiful Award for best data visualisation team, and they do some amazing stuff. ProPublica, with their data-driven investigations, do incredible work. Bloomberg have been doing some amazing visual work recently, and the same goes for the Wall Street Journal. The good thing is that I’m having to think a lot harder than I would have five years ago, when I could have reeled off two or three and there wasn’t anyone else.
If you could lead your own data team, what would it be like?
I’ve never actually thought about that; it probably speaks to a lack of ambition. Personally, I like the idea of a team of specialist-generalists – people with both the technical skills and the subject-matter interests to work across all areas. Kind of like it is at the FT now – one week we are working on climate change, the next it’s Boko Haram terror attacks, then maybe it’s something about the global oil trade, and then something on tennis. You want everyone to have a base level of technical experience, but it’s always nice when people are pulling in their own directions to a certain extent. For me the team would be two-thirds coming up with ideas generated internally, and the other third doing amazing collaborations with other parts of the newsroom. Very roughly speaking, that’s what I’d look for.
This interview has been edited for brevity and clarity.
In 2008, Nicolas Kayser-Bril, a young graduate in media economics, fell into data journalism by chance because he could code simple stuff. He began his career by publishing stories with Le Monde and Le Post (the previous version of the Huffington Post in France). In 2010, he was part of the team at OWNI, a French digital think tank, that analysed the Afghanistan war logs. He is now CEO of the data-driven agency Journalism++. The highly accomplished data mastermind talked to Cristina Matamoros about the state of data journalism in France.
There was a story of major importance that was run by The New York Times, The Guardian, and Der Spiegel, and a lot of media outlets realised they were left out of the story because they didn’t have people who could read SQL files. At OWNI, there were people who could do that, so I wrote the French version of the story with Pierre Romera and that’s pretty much when people realised that data journalism was a thing.
Which company pioneered the usage of data journalism in France?
In France, it was definitely OWNI. I don’t think any newspaper or news organisation in France has made much progress in data journalism. Lots of things have been tried – like Les Décodeurs from Le Monde, who are a fantastic team. At Libération you have a new team; at Le Parisien they have something as well. You have great things going on everywhere, but I don’t see any real data journalism team, in the sense of having developers and designers on official teams as you see in Switzerland, Germany, and pretty much everywhere else in Europe. You don’t have that in France.
Why isn’t that the case in France compared to the UK?
In the UK, it’s the same situation as in France, in my opinion, in the sense that you don’t have news organisations driven by profit – the Guardian and the BBC are different – but other news organisations don’t see a return in investing in research and development. And this explains why you don’t have teams in France like you might have in Germany. This being said, there are many more interesting things happening in London than in Paris. One reason is that journalism in France is driven by people who studied humanities, so you won’t find a statistician in a French newsroom. That makes it much harder for French media.
If you were to direct the editorial team at Le Monde, what steps would you take to develop a data journalism team?
I wouldn’t, because the owner of Le Monde is not interested in profit. That said, creating a data journalism team is pretty easy; you just need a project manager, a journalist, and a designer, and have them work together. So it’s not that hard – it’s just that the French managers haven’t done it yet.
If you look at the ownership of local newspapers in France, you realise huge corporations mostly own them. And they have no interest at all in innovating journalism. What they really want is for the newspapers to do as little investigation as possible.
What is the advantage of doing local data journalism?
Nothing specific – it’s the same as doing data journalism at the local or national level. It allows better and more efficient reporting.
You have a lot of brilliant people in France, so you just need to find them and provide them with an environment where they can try things out.
And managers need to understand the need for investment in promising fields. But as long as these two conditions aren’t there, nothing is going to change.
The eyes of the British media are watching Malcolm Coles, the Telegraph’s Director of Digital Media. He is at the heart of the huge disruption taking hold of the newspaper industry and the battle to make journalism sustainable. Coles launched Trinity Mirror’s nimble data project Ampp3d in December 2013 to widespread acclaim, and now he is tasked with transforming The Telegraph into “a digital-first media newsroom”. Serene yet steely, he spoke to Peter Yeung in a colourful modernist corner of the newspaper’s Victoria headquarters.
One is improving the standard of our digital publishing. So, trying to make how we write about things more suitable for the digital age in terms of interactivity, visuals, and background explainers. For instance, there’s an editorial development team that reports to me, who work on exciting projects. There’s a new formats team, who are tasked with new ways of displaying things – a whole new ecosystem of explainer cards, timelines, and responsive infographic grids is here.
For the other half of it, I manage the teams that focus on audience. 90 million monthly unique users is the new normal for us, and we’re heading for 40 million UK unique users. Like a tree falling in a forest, if there’s no one around to hear it, does it make a noise? Likewise, journalism is only really good journalism if it has an impact.
Did you always envision a career in journalism?
No I didn’t. In fact, I decided that I didn’t want to go into news journalism when I left university because I couldn’t quite bear the idea of knocking on people’s doors when their offspring had died. For all that it is – a very important part of local journalism – it still wasn’t really for me. After working at a number of other places, such as [consumer advice charity] Which? and [website consultancy] Digital Sparkle, and writing about newspapers on my blog, which involved the Daily Mirror trying to sue me for libel, then they rang me up and said: “You’re so clever, why don’t you come and work here instead?” So, I did.
Where is The Telegraph going – will the metered paywall be around forever?
The Daily Telegraph has never had a bigger reach than it has right now. More people read it than ever before in history. But we all know that digitally it’s hard to make money. There are lots of people competing for that money, including new platforms like Facebook and Google. Martin Sorrell, CEO of advertising giant WPP, has come out and said he thinks paywalls are the way for publishers to go. On the other hand, you have The Sun coming out from behind its paywall because it couldn’t make it work.
The Guardian gives its content away for free because it doesn’t think it can get people to pay for it. Read into that what you will. The Telegraph, in some ways, has a cap on engagement, but despite that we’re still growing. I think we’re up about 20% year on year in terms of unique users, and I think we had record traffic numbers this year, but May 2015 was a bit of an anomaly, with the election taking us past 100 million. The Guardian has twice as many journalists and loses lots of money every year; The Telegraph is profitable. I think we probably punch above our weight in respect of all that.
October’s ABCs show that The Guardian are now at 8,370,243 uniques (+11.28%) and The Telegraph at 4,419,480 (+0.11%).
The Telegraph are currently hiring a new data editor – was this your decision and do you think data is a requirement of the modern journalist?
I think I probably filled out the document. The core of journalism is still itself the same as it ever was, but how you find some of those things is a bit different and how you display them is very different these days. There have been a number of new hires this year. As I say, we set up a development team on the editorial floor, who are busy building reusable format stuff and one-off interactives. But yes, data journalism is a different way of uncovering stories and thinking in different ways to visualise them. I’m sure data will increase at The Telegraph, but it is just one facet of what we do.
I still think you will get specialists in many areas; on the other hand, general journalism is a bit more about being able to do everything these days. People are expected to self-publish, to think about SEO and social, and the home page. They’re expected to find all the different bits like galleries and videos and assemble them. So there’s a lot more thinking about how best to display your story online than there was 50 years ago with the old workflows.
Why did Ampp3d, UsVsTh3m, and Row Zed not work out?
Is the lack of popular, tabloid data journalism an issue?
There are lots of stories buried in the data. It’s important that data journalism happens, because otherwise you don’t find out these things. But I think there’s more data journalism now than there ever has been. It depends what you mean by data journalism really. If you open up a tabloid, there is usually some sort of infographic in there with numbers. Trinity Mirror still has a data unit though, and they are still working on data journalism all the time.
SEO and Social are now up to 70%. Will it always be so important?
Parse.ly did announce that, for their network of publishers, social overtook search. But The Telegraph still has a very strong line of traffic to its home page, because people want to know what our view on the world is. Search and social have been significant drivers of growth for most publishers over the years, and they’re not going to go away. Obviously, Google and Facebook have expressed dissatisfaction with how web pages render in their environments. But both those brands know that people go to their platforms in order to find things out – there’s a reason people follow news brands on Facebook. It’s not in either of their interests to stop news being findable to the scale it is today, but we can also bicker about the share of the advertising pie we get.
News organisations announcing only now that half their traffic is mobile are late to the party. — Malcolm Coles (@malcolmcoles) October 31, 2014
Where will The Telegraph be in 10 years?
At the start of this year, I assumed a responsive website was a terribly important thing. But now, with mobile web pages, we are heading onto Facebook Instant Articles and Google AMP. Ten years is a very long way to look ahead. I imagine there’s a bunch of people at school now for whom, by the time they get to the end of secondary school, virtual reality will be second nature. For me, it will always be a weird, alien thing. I’m sure there will be virtual reality ways of accessing digital journalism, and I’m sure we’ll spend a lot of money working out how to do it right, and then no doubt some virtual reality platform will come and attempt to aggregate us all. Let’s hope we’ve all learnt our lessons from the mobile web by then. There’s no way I’d have predicted the end of 2015 at the beginning of the year, so I’ve no bloody idea what’s going to happen in 2025. But I’m sure cat GIFs will still be important.
Billy Ehrenberg, ex-Interhacktive and data journalist, has spent the last year working on new data-based projects with City A.M.’s expanding online team.
I caught up with him to ask what his role involves, and what he sees as the future of data journalism.
In his average day, he admitted that he doesn’t do as much data as he’d like.
“There is a common misconception that graphs in stories means that it’s data – but I try to get at least one data piece done a day.
“Some of what I do is trying to find a story in the numbers, but often the story is quite obvious or easy to tease out, and I need to use visuals or explanations to make it accessible and interesting. To do this I use a few different tools.”
“Excel, Google Sheets, QGIS, CartoDB, HighCharts, Quartz Chartbuilder, Outwit Hub, Illustrator – each one has their advantages”
Billy has several different favourite data tools depending on the job at hand. For example, he says he usually prefers Excel for cleaning datasets.
“I’ve used Open Refine a bit, and that’s certainly worth getting into. Excel and Google Sheets have a bunch of functions that let you pull data apart and whip it into shape – so how useful Excel is depends mostly on if you’re boring enough to have fiddled with functions for days on end.”
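The kind of spreadsheet work Billy describes – pulling fields apart and whipping them into shape – translates directly into a few lines of code. A minimal sketch in plain Python (the sample rows and field layout are hypothetical, chosen only to mirror common TRIM/SPLIT-style cleaning):

```python
# Hypothetical messy rows: "Last, First ; amount" with stray spaces,
# a currency symbol and a thousands separator.
rows = [
    "  Smith, John ;  £1,200 ",
    "Doe, Jane;£950",
]

cleaned = []
for row in rows:
    name_part, amount_part = row.split(";")
    # TRIM + splitting "Last, First" into separate fields
    last, first = [p.strip() for p in name_part.split(",")]
    # Strip the currency symbol and thousands separator, coerce to a number
    amount = int(amount_part.strip().lstrip("£").replace(",", ""))
    cleaned.append({"first": first, "last": last, "amount": amount})

print(cleaned[0])  # {'first': 'John', 'last': 'Smith', 'amount': 1200}
```

The same operations exist as spreadsheet functions (TRIM, SPLIT, SUBSTITUTE), which is Billy’s point: the tool matters less than knowing the transformations.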
On what he sees as the future of data journalism, Billy reckons that “it will naturally divide between real data and fake data. You see some people who do things like not adjusting historic financial data (even film revenues) for inflation because they are in a rush or just don’t realise they should. That’s a dangerous thing: people can see a graph or chart and think that what it shows is fact, when it’s as easily manipulated or screwed up as words are.”
“I think you’ll get two sets of people: those who do not do a lot else, with big skillsets like coding, stats, cartography and programming, and those who have to rush out faux data for hits.”
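The inflation adjustment Billy mentions comes down to one ratio: scale a nominal figure by price-index levels. A sketch with illustrative (not real) index values:

```python
# Hypothetical CPI-style index levels -- illustrative numbers, not real data.
cpi = {1997: 78.5, 2015: 128.2}

def real_value(nominal, from_year, to_year, index=cpi):
    """Express `nominal` (in from_year prices) in to_year prices."""
    return nominal * index[to_year] / index[from_year]

# A £10m film gross from 1997, restated in 2015 money:
print(round(real_value(10_000_000, 1997, 2015)))  # 16331210
```

Skipping this step – comparing a 1997 gross directly with a 2015 one – is exactly the “faux data” error he warns about.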
The next ‘hot topic’
Billy told me he’s not sure what the next hot topic is, but he thinks it’ll be related to coding – “maybe it’s a cop out, as it’s nothing new.
“People wonder if it’s worth coding if you’re a journalist, and even if you are a journalist if you code. I’m obviously pro-learning.”
“It’s really important to try not to mislead people. Graphics are easy to use to manipulate people. The more complex they are, the more likely you are to mess up and the less likely it is anyone will notice, even if it changes something.”
“Visualising ethically is important too: even the colours on a map or the extents of an axis can make a change look hugely dramatic”
“I try to let the data tell the story as much as I can and if I don’t like what it’s saying I won’t change the message.”
When asked what data-related skill he wishes he could master, Billy said: “It’s got to be D3. It’s so difficult that I get a real buzz out of solving something in it, even if it’s taken hours.”
If you’re someone who loves ideas, projects and discovery, you’ll be right at home in the new social network that’s currently creating buzz online. Capioca (Cap-ee-oh-kuh) is a website designed for people to collect things that fascinate them, and to find and discuss new ideas. It was envisioned by its founders, Rebecca Findley and Byron Wong, as an online version of a coffee house in Samuel Pepys’ London: a thriving hub of learning, discovery and discussion.
Discovering niche ideas
“We didn’t set out to create a social media site,” Rebecca Findley tells Interhacktives. “Capioca was a side project that developed over time. As well as a place for people to find their interests and post what they know and love, it’s for discovering new, niche ideas.” She confesses to having always had a passion for connecting people, both professionally and personally. “I even sent my mum on a date with the deputy editor of the first newspaper I worked at! They’re now happily married.”
Rebecca’s background working as a newspaper journalist influences her approach to creating a social network, especially the ‘Editor’s Picks’ section, which is a mix of content that the site’s administrators love. “Many people come to the site just to see our Picks, which we didn’t expect,” says Rebecca. “It’s great to share content with an angle that means a member can enjoy it even if they have no interest in that topic normally. That’s the ‘bringing new ideas and new perspectives to an audience’ aspect of journalism.”
She sees the site as being a great place for journalists to gather, even though it isn’t a network for breaking news like Twitter. “Journalists might find Capioca useful for making contacts, creating a portfolio of work and interests, in-depth discussions and reaching new audiences with their stories,” she says. “We are also a platform for unique ideas; for example, an aeronautical engineer posts his inventions. We have journalists on Capioca using it to share ideas and interests they may not post about on other social sites, because they use Twitter mainly for work, Facebook for friends, and so on.”
Most of the activity on Capioca revolves around Collections, which, as it says on the tin, are collections of web content like articles, videos and photos, based around whatever topic or theme you fancy. You can also repost items from other people’s Collections and add them to Collections of your own. It’s a format that’s familiar to anyone who uses Pinterest, but Rebecca insists that Pinterest and Capioca aren’t about to be competing any time soon.
“Pinterest is a great platform, but we’re very different in terms of content, feel and demographic,” she says. “For example, our readers and members are 50/50 male and female.” This is opposed to Pinterest’s vastly female-dominated user base. “We focus on the arts, science and society over lifestyle content; you’re more likely to find a topic on ‘Equality’ or ‘Journalism’ than ‘Style’.”
Capioca is also more of a text-driven site; members can start Discussions, which are like self-contained comment threads, and compose articles of their own. “Our members are a mix of media, science and creative professionals, as well as students. The site is used in a variety of ways, depending on your interest or aim.”
Simple and stylish
Capioca’s words-and-visuals mix comes in part from its two founders, who have different areas of interest when it comes to web content. “Byron Wong, my co-founder and partner, tends to favour videos and pictures, while I prefer text,” Rebecca says. “We mix all types of content in together, and you can choose what you want to see.”
They were united in the overall look of the site, though. Capioca was designed to be “simple and stylish” with a warm feel to it, which resulted in the site’s sunny yellow appearance.
“We are continuously tweaking Capioca – there’s so much more we would love to do,” Rebecca concludes. “Our members tell us it’s a good start though!”
What does she think of the current state of social networking as a whole? “Social networking continues to adapt and change, and it will be interesting to see what happens this year,” Rebecca says thoughtfully. “If it wasn’t for Facebook, Byron and I wouldn’t be working together now. We met at a dance group, but got chatting properly online – now we live and work together on projects 24/7.
“It expands opportunities and changes lives, but it can also be overwhelming, so I think you have to find and use the networks that work best for you at that point in time. Our members are looking for niche, meaningful content and spaces. They don’t want to come away feeling like they’ve wasted their time, but rather invested it.
“For us it’s about being authentic and listening to what our members want.”
For now, there’s no official launch date as Capioca tries out new things in closed beta and gathers feedback. However, anyone who wants to can request an invite at www.capioca.com, and you can also find Rebecca Findley on Twitter.
A lecturer at University College London (UCL), Dr Cheshire, 27, completed his PhD in Geographic Information Science. He mapped distributions of people’s surnames, and tried to uncover what your surname says about you based on its geographical origin. The project sparked his interest in large population datasets.
He is a keen advocate of open data, and sees London as setting the bar for other cities.
“The reason we called the book London: The Information Capital is because we think it does set a pretty high bar for other cities to open up its data.”
Available data not necessarily accessible
“The principle of open data and availability of data and all that kind of stuff is undoubtedly a good one.
“The downside is, or the challenge is, actually making something available doesn’t necessarily make it accessible.
“Making the datasets easy to find and making them easy to manipulate or to use is the next step, and one that comes along with the provision of data in the first place.”
But he doesn’t think much of what he calls “clickbait” data journalism, citing the Daily Mirror’s Ampp3d as an example.
“Clickbait” easily forgotten
“I really enjoy data stories put together and carefully thought out over several days’ worth of effort, rather than some of the clickbait stuff.
“The clickbait stuff is here today and then forgotten about, whereas the really good quality stuff tends to last a bit longer, people tend to think about it a bit more.”
What about the more established nationals?
“Some of the stuff that the Guardian Datablog does can be really good, some of it can be a bit dry or a bit dull or whatever.
“I think the guys that are the best are still the New York Times in terms of their graphics.”
“People don’t like uncertainty”
He says that the biggest challenge that he and his co-author Oliver Uberti have tried to communicate with the infographics in their book is that “people don’t like uncertainty.”
It is this uncertainty that can lead journalists to oversimplify and thus slightly misrepresent data stories.
“Traditionally in statistics you put error bars or something, or you say like plus/minus 5%.
“That plus/minus bit is often left off because people don’t necessarily understand it, or the journalists or whoever it is are keen to present a straightforward story.
“We often talk about that in academia – that’s one of our biggest challenges when we talk to journalists: how do we communicate that uncertainty without sounding like we’re wrong, but without sounding like there’s an absolute number for something?
“Because the world is not that simple, unfortunately.”
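The “plus/minus” Dr Cheshire describes is typically a margin of error around a sample estimate. A minimal sketch with made-up survey data, using a normal approximation for a roughly 95% interval (the heights are invented for illustration):

```python
import math
import statistics

# Hypothetical survey sample: heights in cm
sample = [172, 168, 181, 175, 169, 178, 174, 171, 177, 170]

mean = statistics.mean(sample)
sd = statistics.stdev(sample)                # sample standard deviation
margin = 1.96 * sd / math.sqrt(len(sample))  # ~95% margin of error

print(f"{mean:.1f} cm ± {margin:.1f} cm")  # prints "173.5 cm ± 2.6 cm"
```

Reporting only the 173.5 – and dropping the ± 2.6 – is precisely the simplification he says journalists are tempted into.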
(Chart: “Islington has issues” from London: The Information Capital)
In her spare time, Ri recently created We Can Do Better, which is a visualisation of gender disparity in engineering teams in the tech industry. I was interested in how a reasonably simple data set could be made much more engaging through the visualisation.
What was the inspiration for We Can Do Better?
It’s an ongoing issue in the tech industry and as a female in the industry I just asked myself ‘what can I do?’. It’s frustrating when you see this inequality and imbalance.
This data has actually been around for a little while now but in the form of a spreadsheet. It’s great and a lot of people have added to it, but it’s quite technical and has to be updated by submitting a pull request on GitHub.
So I thought, since I have the design and coding background and I’m in tech, maybe I could bring it to a wider audience.
I want to let people touch this information and engage with it, instead of seeing rows and rows on a spreadsheet.
It’s definitely a lot easier on the eyes.
Yeah. I’m glad it’s been shared a lot, and maybe different people and journalists can now engage with this data more easily than before.
Which tools do you use and how long did you spend on it?
I spent a few weekends on it and the visualisation itself is built using D3.
This project is actually on GitHub, I’ve put a creative commons license on it so anyone can look at the code.
Was it worth putting the time into?
Definitely. Personally, I just wanted to see this data visualised. I’d seen these numbers, but I wasn’t really connecting with them in a meaningful way.
I didn’t expect for it to be tweeted around as much, but that’s been really awesome.
How easy would you say it is for someone to learn to use D3?
It’s definitely not the easiest tool to get started with, but once you do get a grasp of it, it’s incredibly powerful. When you want to do something you’re not limited by the code at all, so you’re able to say ‘I want to explore the data this way’ and have the tools to do that.
I hardly ever geek-out over technology, but this is the one exception where I rave about it. Compare it to the other end of the spectrum, like the rudimentary graphs in Excel. They just leave you feeling trapped.
Have you noticed increasing interest in interactivity and visualisation from journalists?
We work a lot with publications and I think they’re realising that we need to present these figures visually and in a more compelling way for them to reach people.
That’s definitely been a shift, and I think we’ll see more places engaging with data viz companies and studios, as well as more doing it in-house.
I’m also interested in how interactivity is being used to tell non-data stories, the most obvious example being Snowfall.
I’m a very avid web user but the problem is that I don’t read a lot of longform content because I just have so much to read that I don’t absorb a lot of it. A lot of sites are just competing for that attention and working out how to make this digestible for people.
I think it’s great to have more visual imagery and better design and it’s great that a piece like Snowfall got such wide attention. It’s like ‘oh, let’s actually pay attention to the design of these articles instead of just dumping text in front of people’.
I’d like to see what the reader stats were for it.
Because there’s a lot more time gone into presenting the content like that, I’d also be interested in what that means for the timeliness of certain articles. That was a good piece because it wasn’t about something current, it was just a story.
But it’s a great way of presenting stories which isn’t just dumping traditional print content onto a screen.
Are the tools getting better for making interactive things more quickly? Could we see more timely articles being made interactive?
I wonder whether it’s even possible to produce a piece like that without putting the effort in and finding the best visuals and other content.
Obviously there are technical aspects like the parallax and scrolling effects they put in, which could just be bundled into tools. But I think that the real beauty of it is in the thoughtfulness, and I’m not sure you could match it without effort and time.
Should we expect more personal projects from you?
I’m always playing around with new technologies. I’ve been meaning to do something with semantic analysis and playing around with words to see biases and other insights.
I’m interested in making people aware of what they’re subconsciously doing and the assumptions they’re making. We’ve got a lot of traces of that on the internet these days, on Twitter, blogs and all these social networks, so it would be cool to do something with it.
That’s just in the back of my mind though. I’m playing around with it but nothing concrete so far.
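The word-level bias exploration sketched above can be prototyped with nothing more than Python’s standard library: count which words a speaker leans on, after filtering out common stopwords. The sample text and stopword list here are hypothetical placeholders for illustration, not anything from the project itself:

```python
from collections import Counter
import re

# Illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "i", "is", "are"})

def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each word appears, ignoring common stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

# Hypothetical sample: repeated words hint at what a writer emphasises.
tweets = "Great great news: the markets are up. Markets love certainty."
counts = word_frequencies(tweets)
print(counts.most_common(3))
```

Comparing these frequency profiles across two authors, outlets or time periods is one simple way to start surfacing the unconscious patterns mentioned above.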
A veritable giant of data journalism, Simon Rogers launched the Guardian’s Datablog in 2009 before moving over to Twitter, where he now manages the site’s vast quantities of data. We asked him about the perils of data journalism’s popularity and where it’s all headed.
Twitter has an unbelievable amount of data – what do you do with it all?
It’s a lot of data — around 500 million Tweets a day. What we try to do is tell stories with it, much of which entails making it smaller and more manageable, to filter out the noise that we don’t need. People Tweet how they think and how they behave — the data can show you amazing patterns in the way we respond as humans to events as they happen. When a story breaks somewhere, or a goal is scored or a song is performed, you can discern these ripples across Twitter. It’s getting those ripples out of the data that is the challenge.
What’s the day-to-day like as data editor at Twitter?
It is such a mix and each day brings its own surprises and challenges. At one end of the spectrum I use free tools such as Datawrapper or CartoDB to make maps and charts that respond to breaking news stories or events, such as this one on the spread of Beyonce’s new album or the discussion around events in the Ukraine or the conversation around #Sochi2014. At the other end of the spectrum, I get to work with the data scientists on Twitter’s visual insights team to produce things like this interactive guide to the State of the Union speech or this photogrid of the Oscars, which is essentially a treemap with pictures. Right now we’re thinking ahead to things like the World Cup and the US Midterm Elections to answer the question: how can we use Twitter data to help tell the stories that matter?
Are you still a journalist?
I’ve wanted to be a journalist since the age of eight and it’s completely in my DNA. Over that time the idea of what was or wasn’t a journalist has completely changed. When I started the Datablog at the Guardian, people asked if data journalism was really journalism at all to which my response was: who cares? My feeling is that you just get on with it and let someone else worry about the definitions. My job is to tell stories and make information more accessible to people. I take Adrian Holovaty’s approach to this:
1. Who cares?
2. I hope my competitors waste their time arguing about this as long as possible.
What do you think about the Guardian’s Datablog since you left?
The Datablog was my baby and always will be special to me but I have to let it go and not interfere, so that’s what I’m going to do.
What drove you to found Datablog?
We had a lot of data that we’d collected to help the graphics team, and we also saw there was a growing group of open data enthusiasts out there who were hungry for the raw information. So that’s how it started: as a way to get the data out there to the world and make it accessible.
Have you found any difference in the attitudes towards or ideas about data journalism in the US and UK?
The differences in data journalism mirror the differences in reporting, I would say. It’s a huge generalisation but I would say US data journalism tends to be about long investigations while a lot of the British reporting is aimed at shorter pieces answering questions. But there are exceptions on both sides. They come from different places: US data journalism is based in the investigative reporting of giants such as Philip Meyer; modern British data journalism was born out of the open data movement and owed at least as much to a desire to free up public information as to big investigations.
Is data journalism ‘having a moment’ or are we in the midst of a very real paradigm shift?
It’s becoming mainstream and, just as in other areas of reporting, it is developing different strands and approaches. Partly because there are just so many stories in data now — and to get those stories journalists need skills and approaches they didn’t use before.
I think we are really at an interesting stage. The last few months have seen a lot of reporting resources put into data journalism, certainly in the US. I think what’s happening is that it is developing different strains — in the same way as you have features and news reporting in traditional journalism. You have the ‘curious questions’ type of data journalism which focuses on asking about oddities; then there is the open data type of data journalism which is all about freeing up information. I’m not convinced that we have as a group got the balance correct between showing off how clever we are and making the data accessible and open. That last part is what I’m interested in. I don’t need to see anyone showing off.
Journalists are no longer just writers, they are designers. How important are pictures, diagrams and infographics?
I speak as someone who has just worked on this range of infographic books for children. We have visual minds and telling a story effectively with images will always have a greater impact than words on a page. Some of the most detailed journalistic work I have ever done has resulted in images and graphics as opposed to long articles.
Have you seen any recent data journalism that has particularly caught your eye? And what is it that you look for in a good article/webpage?
I love maps but there are just so many of them these days. Is data journalism becoming over-saturated?
There are a lot of maps around but it’s just one visual tool. Maybe we don’t ask enough questions about which type of visualisation is most powerful and important to complement a story or feature and a map is often easiest. But also that reflects the lack of decent tools for us to use. If I want to visualise a Twitter conversation off the shelf, that often means a map or a line chart because that is what I can do easily and quickly on my own. Part of my job is to think about new ways for us to do this in future.
Do you think data journalism runs the risk of looking at the big picture at the expense of the small one?
Not being able to see the wood for the trees? The best data journalism complements the big data picture with the individual stories and storytelling that bring those numbers to life. I’ve been fortunate enough to work with amazing reporters who tell very human tales and the numbers just gain so much power from joining those two elements together.
Do you have any favourite data tools – scraping, cleaning, visualising?
My visual tools of the moment are: CartoDB, Datawrapper, Illustrator and newly I love Raw (just discovered it).
Do you have any core principles when deciding how to express data?
I normally start off with some idea of what I’m trying to ask, otherwise the data is just too big to be manageable. Love that moment when you do the grunt work to clean up the data and it starts to tell you something meaningful.
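The grunt work described here typically means stripping stray whitespace, dropping incomplete rows and coercing text to numbers before the data can say anything. As a minimal illustration, not Rogers’ own workflow, here is a sketch using only Python’s standard library on a hypothetical messy CSV:

```python
import csv
import io

# Hypothetical messy input: stray spaces, a row with a missing value.
raw = """name, spend
Alice , 120
Bob,
 Carol,95
"""

def clean_rows(text):
    """Strip whitespace and keep only rows with a name and a numeric spend."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), skipinitialspace=True):
        name = (row.get("name") or "").strip()
        spend = (row.get("spend") or "").strip()
        if name and spend.isdigit():
            rows.append({"name": name, "spend": int(spend)})
    return rows

print(clean_rows(raw))
```

Only once rows like Bob’s (no spend recorded) are handled explicitly does aggregating or charting the column become meaningful.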
Do you have any tips for aspiring data journalists?
The days when you could get a job in a newsroom just by knowing Excel have probably gone or are going. Increasingly the data journalists who succeed will also be able to tell a story. The other piece of advice? Find something that needs doing in the newsroom — that no-one else wants to do — and be the very best in the world at doing it.
Carl Bialik is a writer for Nate Silver’s new website FiveThirtyEight, having recently moved from the Wall Street Journal where he started The Numbers Guy column. I ask him about the ups, downs and difficulties of being a data journalist, as well as what he thinks are the most important traits for being successful in the field.
You recently moved to FiveThirtyEight from the WSJ: do you think the two publications differ in their approach to data analysis?
With The Numbers Guy at the WSJ, my role was more about looking at other people’s data analyses, taking them apart and finding the weaknesses in them. I’m going to be doing some of that at FiveThirtyEight but will be more focussed on doing original data analysis.
When you first started at WSJ, were you a data journalist? Or was this more of an organic development?
When I started at the WSJ I don’t think I had even heard the term “data journalism”, and I wasn’t a data journalist for most of my first years there. The more specialised role came later when I started writing The Numbers Guy column. Then, when the WSJ expanded its sport coverage, I started to write much more about sports from a data point of view.
Which is your favourite sport to write about?
My favourite sport to follow is tennis, which is in some ways both my favourite and least favourite sport to write about. It’s my favourite because it’s largely untapped territory in terms of data analysis, but it’s also one of my least favourites because of the way that the data has been archived, making it one of the most difficult to get accurate data for. It’s a pretty fertile area, though, and although it’s not big in the USA, there’s always going to be a focus around major events.
What steps do you take to make sure that the data you are analysing is accurate?
There are some built-in error checks with analysis, which can help determine the reliability of the data. These include checking whether the data you are running the analysis on makes sense, and looking whether different analyses produce similar results. Another important question to ask yourself is whether there is some important factor that you are not controlling for.
At FiveThirtyEight we also have a quantitative editor who reviews your work and points things out for you, such as confounding variables and sources of error. Readers are really vital for this, too: the feedback we have already received from readers who tell us when they think we have made mistakes has been extremely useful.
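The built-in error checks described above, asking whether the data makes sense before running the analysis, can start as something as simple as a range check. A minimal sketch, with illustrative bounds that would of course vary per dataset:

```python
def sanity_check(values, low, high):
    """Return the values that fall outside a plausible range.

    `low` and `high` are illustrative bounds chosen per dataset;
    anything flagged deserves a second look before analysis.
    """
    return [v for v in values if not (low <= v <= high)]

# Hypothetical example: adult heights in cm; 12 and 310 are implausible.
heights = [165, 172, 12, 181, 310, 158]
print(sanity_check(heights, low=120, high=230))  # → [12, 310]
```

The other check Bialik mentions, seeing whether different analyses produce similar results, follows the same spirit: compute, say, both the mean and the median, and treat a large gap between them as a prompt to look closer rather than publish.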
What do you think are the most important traits for being a good data journalist?
The first is having a good statistical foundation, which includes being comfortable with coding and using various types of software. The others are the same as for all types of journalist: being a collaborator, fair, open-minded, ethical, and responsive to both readers and sources.
Which data journalists do you particularly admire?
I’ve admired the work of many data journalists, including my current colleagues, and my former colleagues at the Wall Street Journal. Certainly Nate Silver at FiveThirtyEight: he is a large part of the reason that I wanted to work with FiveThirtyEight in the first place. Also my colleague Mona Chalabi because she has a great eye for finding stories with interesting data behind them.
What’s the best part of being a data journalist?
Compared to most journalism, I think there is more potential to have an “aha” [eureka] moment for any given story, since it can sometimes be a slog if you’re trying to get that just from interviews or other sources. Any data set has the potential to give you a couple of these moments if you’re spending just a few hours looking at it.
And the most difficult part?
I think number one is when you can’t get hold of the data for something: occasionally a topic can be very hard to measure, and you would love to write about it but just don’t have a way in. This is often the case with sport in particular, where there can be measurement problems, issues with the quality of the data, or even a complete scarcity of it. So issues with data quality and access are the most difficult parts.
Kiln is a design studio specialising in data visualisation, digital storytelling, maps and animation. It was founded and is run by Duncan Clark and Robin Houston, creators behind such projects as Women’s Rights or In flight for the Guardian. In this short interview Duncan Clark tells Aleksandra Wisniewska how they go about their projects.
How do you choose what subjects to cover in your visualisations?
It’s a mix. Sometimes we have an idea that we know we want to pursue; sometimes the Guardian or another client will approach us with an idea.
What is key for you in the process of designing information?
One golden rule is to let the information speak for itself. There’s no point making a pretty visualisation if it doesn’t make the data clearer to understand and easier to interrogate.
What is your favourite project that kiln.it worked on so far and why? What do you think makes it interesting for people to explore?
“In flight” is certainly the most ambitious thing we’ve done so far and possibly my favourite. I like the fact that almost everyone says “wow” at seeing the sheer number of planes that have flown through the air in the last 24 hours. But I also think it’s interesting as an experiment in combining different approaches to storytelling: it takes elements from documentary making, data visualisation, radio production, live mapping and tries to combine them into a coherent whole.
What’s your work process? How much leeway do you have in your work? Do you get precise instructions for your projects or do you only accept broadly defined commissions?
It varies. Sometimes the starting point of a commission is just a broad subject area; at other times a client might have a very specific visualisation technique in mind from the outset. Most commonly, though, we’re given a dataset and asked to work out how best to turn it into something compelling.
What advice would you give to a budding data journalist?
Do you need to be a data scientist to work in data journalism? What is the difference between data analysis and data science? Beatrice Schofield, Head of Data Intelligence at import.io, talks to Interhacktives’ Aleksandra Wisniewska and debunks the data science myth.
What does your job as Head of Data Intelligence at import.io consist in?
On a day-to-day basis I think about new things we can do with data and how to engage new areas where people who are not technically trained can start using open data for their fields of research. I also work on news cases by approaching NGOs and data journalists with ideas for stories with data sets. A lot of it is content-driven. It is exploring open data, how to better use it, extract it from sites and build data sets – much of it has traditionally been the realm of people who can program. I make sure we get the data and give it to people who would be interested to use it but have previously been unable to because they lack this skill and are not data scientists.
Do you approach journalists or media organisations?
It depends. If there is something big coming up like the budget, I quickly build an extractor that, for instance, lets us get the data off the BBC on a minute-by-minute basis, which within an hour we can give to the Guardian. That then feeds their sentiment analysis, so they can read what is happening. We often take a pro-active approach, but we are also responsive: when Nelson Mandela died and the Guardian wanted data quickly, we could respond by predetermining what data might be interesting at that time and providing it to journalists.
Who has import.io cooperated with thus far?
We provided data for the Financial Times, the Guardian and the New York Times. The big data story that has recently made the news is Oxfam’s analysis showing that the five richest families in the UK have more money than the poorest 20%. We worked with Oxfam to get this data before it became a media sensation. We aim to anticipate things like this as well.
Do you hack to get data?
I am not technically trained, so I do all my scraping via our tool. On an analytical level, I rely solely on this to get large amounts of data and give it to people in whatever form, so there is no need for me to concentrate any attention on developing skills which aren’t necessary with the tool we’ve got.
What kind of skill set do you use in your work?
I have been doing data analysis since university in different roles. But Excel is where it begins and ends. A lot of it is qualitative and quantitative research because much of my work is content-driven. And on a day-to-day basis I am very much operating as any other data analyst, without the need to delve into the realms of data science. It’s pretty much beyond me.
What would you recommend that a trainee data journalist learn in terms of software and skill?
From my perspective it is important to have written something before and to be on the sharp edge of data analysis. Data journalism is now a fundamental part of journalism and you can’t be a journalist without being data-savvy. In terms of developing the right skill set, I don’t think it is necessary to be a good programmer. I think you can focus on other areas. Tools are now here, like import.io to access the data and Tableau to visualise it, and all that is left is analysis and seeing where the stories are. This is what data journalism is about. Being quite academic, realising where the holes in the data are, seeing how the bias is created by certain data sets. Because there is a tendency for people to see data as fact and not as a socially constructed set of numbers or letters. It is important to be very critical of what we are being presented with and to look at what is missing as opposed to just what is there.
I certainly think that with data journalism moving forward, you have to have the ability to engage wholly with the amount of data that there is on the web, and have the ability to look into it and see what you can do. Because at the moment we are still – for various reasons – only looking at a tiny section of what’s available. It is key to think imaginatively and creatively about how we can build data sets over time and to focus your skills qualitatively and quantitatively as opposed to focusing all your attention on being a good programmer when it’s no longer the time to be one. There are now tools that allow you to have data sets and spend time focusing on stories.
Is statistical knowledge key, then?
Mostly for the journalist’s own time management. No one wants to spend a lot of time in untidy spreadsheets, cleaning data sets and thinking: “This is a bore”. If you are able to do the analysis, you can spot trends and patterns and have insights early on, but in terms of advanced statistical knowledge, I don’t think it’s necessary. I don’t have it myself. Data science is pretty much a fashion statement now.
You mentioned before a line that should be drawn between a data scientist and a data analyst. Where does it lie?
I believe the split lies in the technical skill set. Data scientists traditionally write a lot of script and are able to mine huge data sets using scripts, while I see a data analyst as being able to perform the same analysis as a data scientist without having the programming skills and science degrees under their belt. But the two come from the same realm.
Do you think newsrooms will start employing data scientists?
I don’t think they can afford them. A data analyst could easily perform the same job by using freely available tools as opposed to using their own technical know-how. In terms of mining large data sets, it can be a collaborative work of scientists and analysts, but not in terms of assistance to data journalism, which is spotting what you want to see in the stories as opposed to delivering a very methodical, technical approach. I think we are now developing tools that might almost push data scientists to the side.
What would be a prerequisite for becoming a data analyst?
You need to be quantitatively trained in some sense. It doesn’t need to be a degree. For instance, social sciences usually require a quantitative approach. Personally, I have learnt a lot about data analysis while being on the job. You can’t really set aside a certain skill set. Obviously there are certain skills like Excel that are needed to advance but beyond that, analysis can be done at a very qualitative level as well. And then you back it up with figures.
You have told me about your 6-month long project of monitoring alcohol prices on Tesco’s website. What happens when such a time-consuming undertaking does not yield the results you expected?
That’s the nature of it. What you presume might happen might not always happen and your assumptions might be wrong. But with tools like import.io you can run a couple of projects at the same time, so it’s not as if you’re banking on one data set to provide you with the story that you want.
How do you go about generating an initial idea for a project?
I approach my work with an inquisitive mindset. I wonder “what could you find out from that?”. Sometimes I don’t start with a pre-determined outcome, but just with creating databases over time, and at some point a story is bound to come out of one of them. It’s just all about being imaginative.
And I am having a lot of fun with it. I know data analysis is considered a bit of a dull area but then if you draw the content out of it, you can make it fun. We have been looking at Dulux colours and names of paints because they are absolutely ridiculous and we made a game that pulls the names apart, for example “pomegranate champagne”. Previously we made a game which made people guess which newspaper said which headline. You just need to be creative with it.
What is good data journalism for you?
I think the Guardian did well. They were the first ones to really push it to the front and say they are very much a data-driven newspaper. But it can be anyone who has the ability to see something unique in data, to bring different insight, different experience and apply it to the data set. This is what I believe sets people apart: the ability to communicate well through visualisation and good analysis, and seeing possibilities in data.
Data journalism sits at the split between the sciences and the humanities – it relies on both to be performed well. It does not require heavy scientific expertise. It requires intuitive questioning and thinking about external factors, which come from the humanities.