Interview with Eliot Higgins of Bellingcat

Verifying missile launchers, tracking down ISIS supporters and holding governments around the world to account is all in a day’s work for 36-year-old Eliot Higgins.

Last time I met Higgins, an independent intelligence analyst, he was giving a talk about his work with Bellingcat, the investigative news network he founded in 2014. It was this network that trawled the internet’s vast and polluted reservoir of publicly accessible material to track down the Russian-owned missile launcher that shot down Malaysia Airlines Flight 17 over Ukraine in 2014.

This time, it’s me doing the tracking, struggling to find Higgins on the hectic roundabout at Old Street station. I eventually spot him standing next to a telephone booth squinting through his glasses, in a Matrix-style black coat.

We only have fifteen minutes, but there’s excitement in Higgins’ eyes as he eases into his chair and talks about his work. Ordering a Coke, he laughs about how he’s been trying to avoid caffeine.

Bellingcat formed after Higgins’ personal blog, Brown Moses, attracted huge attention as he was able to uncover war atrocities, such as the use of cluster bombs, from the comfort of his home in Leicester. It was the readily available nature of open source tools that prompted Higgins to start Bellingcat and form a network where others were able to learn how to use such tools in their own investigations.

I ask him what his proudest moment is with Bellingcat since its formation. He screws up his face a little. “Hiring people is a lot of fun,” he says. “If it wasn’t for what we did, we would have had this whole narrative of [the] Russian government [claiming to intervene in Syria only to fight ISIS, and not prop up President Bashar al-Assad’s regime] that wouldn’t have been challenged,” he explains. “And, you know, there are families involved who are being lied to by the Russian government and without us, there would have been no push back.”


Though the website states it is by and for “citizen investigative journalists”, and many news outlets, including the Financial Times, call its founder a “citizen journalist”, Higgins himself is uneasy about the label.

Shuffling in his seat, he explains: “It’s not citizen journalism. It’s not just about conflict or journalism. It’s about all kinds of different areas. From my perspective, the work we do is not about journalism: it’s about the research and getting it [the findings and tools] to people that can actually do something with it.”

“For me, a lot of what we do is about accountability and justice and working with international organisations on that.”

While Higgins wants to distance Bellingcat from being purely journalistic, the network’s handful of contributors definitely share a hack’s mindset, using publicly available tools such as Google Earth and social media to investigate atrocities abroad.

Credit: SKUP 2015, Marius Nyheim Kristoffersen

Three years after the network shot to fame by solving the MH17 mystery, it now covers all corners of the Earth and is fast becoming a force to be reckoned with. This was made clear last November, when Higgins quashed the Russian government’s denials over the bombing of a hospital in Syria. By comparing satellite and on-ground photographs from 2014 to 2016, he was able to show specific areas that were in fact damaged by bombing.

Bellingcat also drew huge media attention after using social media to track down ISIS supporters. Most recently, investigators used an archived Facebook profile and geolocated social media photos to hunt for the suspect in the Berlin Christmas market attack.


When I ask him about how Bellingcat uses social media in their investigations, he blushes, admitting that they recently caused a “minor panic” in Holland, after the network asked its Twitter followers to geo-locate a photograph found on an online community consisting of ISIS supporters. He laughs, shaking his head as he notices my eyes widen: “It’s nothing urgent or scary. We had one photograph [and] we just wanted to know where it was because it looked like it was in Europe. So we put it out on Twitter, asking if people could help geo-locate it.

“We thought it would be impossible. Within an hour we had the exact location: in a holiday park in Holland. The police showed up at the holiday park and the poor manager had to come out in the middle of the night.”

This brings our conversation to online privacy, and I mention the day he asked his 49,000-strong Twitter following about Donald Trump. He says, with a cheeky glint in his eye: “My Twitter page looks like I do a lot online. But if I’m away, I won’t share when I’m actually away. If I post a picture of my time abroad it’s often a week after I’ve actually been there.”

He adds, laughing: “It amazes me that people keep their Instagram profiles public. Who needs likes that much?”

I keep my own settings to myself as he stands up to leave, shaking my hand and plonking the Coke can on the table. At that point, I sadly decide it’s time to change my Instagram settings to private.

The Bureau of Investigative Journalism’s Megan Lucero: it’s time for data journalism to go local

Megan Lucero has seen it all. The former head of data at The Times and The Sunday Times is now directing her attention to local data journalism as head of the Local Data Lab at the Bureau of Investigative Journalism. In a (second) discussion with the Interhacktives, Megan talks about her decision to leave The Times, how she envisions the Lab’s future, and the importance of collaborative data journalism at a local level.

“I left purely from the idea that it [the Local Data Lab] is really what journalism needs and what data journalism should be contributing to,” she says.

“I left the Times really happy with how we got on there,” she continues. “We brought data investigations into the heart of the newsroom. We went from being a sort of Excel help-desk to actually being integrated into news investigations, big-time front pages […] I was very happy to leave knowing that I was leaving a really strong legacy. But I left because I believe that this is really, really important”.

Megan is referring to the Local Data Lab, an arm of the Bureau of Investigative Journalism, which has set out to “fill the voids” left after many newsrooms cut provisions for investigative journalism in the wake of the 2008 economic crisis. “Investigative journalism is expensive: it takes time, it takes a lot of people and a lot of [economic] resources,” she explains.

Local newspapers bore the brunt of these cuts, so the “idea was to try to help to solve that,” Megan continues. The Lab will focus on data journalism, which she believes is a form of investigative journalism at its core. For her, data journalism hinges on “the idea that you harness data to find stories, and using data to find stories is in itself journalism.”

“Data journalism is an exercise in finding stories in the large amount of data that the digitised world offers today: the journalist has to be able to swim in a sea of information,” Megan says. “In order to do that, there is often technical innovation that is needed, and that’s where I see the [data journalism] coming in. The computational method — the means in which to programmatically query and build databases, automate the process — all of that comes together in digital journalism.”

One of the Lab’s more challenging tasks, Megan says, will be to bring data journalism to the local level: “It’s an ambitious challenge — it’s a really daunting one — but I absolutely [think] that’s right where my next step had to be.”


Megan maintains that the introduction of computer-assisted journalism to regional newsrooms will not displace regional reporters: “We want to make sure that we are not trying to put local journalists out. A lot of local journalists use the Office for National Statistics or data.gov.uk. We are not going to try to change that because that would potentially harm their jobs,” she says. “We are after the gaps in the industry, right? Local reporters don’t have the means, the time or the resources to do the computational work […] My team essentially would be coming in to try and provide this”.

Megan insists that the new Lab will focus on unearthing local stories and issues with the help of regional journalists: “There is a time and moment to listen to our communities and to find out what it is that each member of those communities is trying to say,” she says.

How will she carry out this goal? “I am going to listen more than I am going to talk,” she explains. “What are the stories that need to be told at a local level? What are the stories they want to tell? What are the datasets that are not open? What are the challenges to covering local beats?”

She explains that the Lab’s role will be to empower local newsrooms, and stresses the need for transparency and accountability at the local government level. A failure of local governments to provide information and data about their work is “a problem for democracy and it’s a problem for the free press unless we address it”.

She then discusses her vision for the Lab: “I don’t want to be too prescriptive at this stage, but my goal is to find datasets that are not in the public domain. Already we have a few stories that we want to be bringing out: things that NGOs, charity groups [and] activists have obtained via Freedom of Information requests, a lot of datasets that haven’t actually been brought out nationally or locally. We are going to merge lots of datasets to find analyses that maybe we hadn’t [found before].”

But searching for datasets is not her sole objective, she says. The Lab also hopes to change the way local stories are told and highlighted in the media: instead of having national newspapers dictate the media agenda of the day from the top down, local news outlets will shine: “Our idea is to put the power directly into the local stories, so every dataset that we worked on has to scale on a national [newspaper]”.

The Lab will work with local newsrooms by collaborating on local issues identified by the partner and discovering whether they might make stories suitable for a national audience. They will do that by using resources not typically available to a local newsroom (complicated datasets that a local newspaper can’t access, or advanced analysis techniques): “So the idea is that we would scale it [for national coverage]”.

The Lab and the local newspaper will then break the story together, one (the Bureau) as a national news outlet and the other locally. Cooperation is a fundamental journalistic value for Megan: there is little space for competition as “it’s very unlikely that anyone will ever scoop you or steal your story out of the back of it,” she says.

Megan says the creation of the Lab was also inspired by other major nonprofit newsrooms, such as ProPublica. “Their Electionland project, their data lab, their data stories — they are everything that we were looking to do,” she explains. The Bureau of Investigative Journalism frequently communicated with Scott Klein, ProPublica’s editor, while it was developing the Lab.

Finally, she sums up the Lab’s role: “the idea of independent journalism breaking down important complex datasets to very small levels, that’s what we are hoping to do”.

Does the idea of working for the Data Lab interest you? The Bureau will be accepting applications until 1 February; you can find application details on Megan’s Medium.

Correction: February 2, 2017
An earlier version of this article misspelled the name of the Lab. It is the Local Data Lab, not the Local News Lab. Some quotes have also been amended in consultation with Megan Lucero.

Data day: The rise of fake news on Facebook

Did Pope Francis endorse Donald Trump? Did Hillary Clinton sell weapons to ISIS? If you don’t know the answers to these questions, you may have been the victim of fake news. In the first episode of a new podcast from Interhacktives – Data Day – Ella Wilks-Harper and Luke Barratt discuss the rise of fake news, question whether the crisis has been overstated, and examine some possible solutions to the problem.

Fake news on Facebook has been the subject of frenzied debate recently, especially around a US election that left the country bitterly divided. As Americans – and Brits – retreat into online echo chambers of their own making, filling their Facebook feeds with people who agree with them, is it any wonder that ideology might start to trump fact? Some consider fake news the logical conclusion of the filter bubble. Will it be a wake-up call for Facebook to recognise editorial responsibility and abandon the utopian dream of its impersonal, all-ruling algorithm?

Mark Zuckerberg’s initial response to the fake news scandal:

bit.ly/2fZ533d

Buzzfeed’s story about Macedonian teenagers using fake news to garner ad revenue:

bzfd.it/2fYYxcZ

A letter from the editor of Aftenposten attacking Zuckerberg over the censoring of a picture from the Vietnam War:

bit.ly/2fZ4QNJ

Buzzfeed’s analysis of engagement with fake news on Facebook in the last few months before the US election:

bzfd.it/2fZ5JWt

6 tips from Google on making compelling visualisations

Journalists aren’t used to taking advice from tech companies. Indeed, the row between Facebook and the journalism industry has intensified over the last two months. A Vox article published earlier this month attacked Mark Zuckerberg, who has been called “the world’s most powerful editor”, for abandoning his editorial responsibilities.

Recently, Facebook fired the human editors on its ‘Trending’ team, prompting a flurry of clearly fake news stories from its automated algorithm.

Meanwhile, Google seems to have been moving in the opposite direction. Its News Lab has now been around for over a year, and has the explicit intention of collaborating with and empowering journalists. The Lab, which is run by former Boston Globe reporter Steve Grove, frequently works alongside journalists.

News Lab hosts a monthly Data Visualisation Round-Up in the form of a live YouTube discussion between Simon Rogers, the Google Trends Data Editor and former Guardian journalist, and Alberto Cairo, the Knight Chair in Visual Journalism at the School of Communication of the University of Miami.

From their 31 October discussion, here are some key points:

1. Graphics need a human side

Data journalism sometimes gets a reputation for being cold and calculating, a place where statistics matter more than humanity. But data journalists are more than automated counting machines: they often bring their emotions and convictions to bear on their work, and it is vital for data journalism to reflect that.

In the video, Simon Rogers recommends the September 2016 book Dear Data, by Giorgia Lupi and Stefanie Posavec. These two information designers, physically separated by the Atlantic, spent a year befriending each other by sending weekly hand-drawn data visualizations on postcards back and forth.

The cards contain many examples of innovative ways of displaying data, but the project was about more than that. Rogers calls it “a reminder that graphics should feel human and warm”.

2. Imprecision is fine

The era of Big Data™ has encouraged the growth of imprecise data analysis. In days gone by, sampling was the only game in town, and data had to be incredibly precise, since datasets were relatively small. Now that data analysis and data journalism are starting to use big data, the sheer size of today’s datasets eliminates most of the problems that arise from occasionally imprecise points.

This image, for example, illustrates the relative interest around the world in Hillary Clinton and Brexit.

Google News Lab teamed up with Accurat, a data research firm, to create World Potus, a project that uses Google Trends to look at how people in countries around the world were discussing the US election, by analysing their Google Searches.

Naturally, when using data from every single Google Search, some data points will be unhelpful. Someone might misspell ‘Clinton’ in an unpredictable way, or search while on holiday, making their geographical data misleading.

But since Google Trends uses big data, this doesn’t matter. There are so many points in this dataset that imprecision pales into irrelevance.

3. Data journalism should be collaborative

While more traditional journalists jealously guard their scoops, and are full of stories about the ruthless methods they’ve had to employ to get to the scene of a story first, data journalists can often be seen asking for (and receiving) help on Twitter from their colleagues. What’s more, articles often come complete with a link to the original data, so that other data journalists can dig for their own stories.

This is why all the code used in projects like World Potus is available on the Google Trends Github page.

4. We have to think more about our audience

Data visualisation is no longer the insurgent force it once was in the journalism industry. These days, infographics are pretty much par for the course, so much so that Giorgia Lupi has described our current period as “post-peak infographic”.

Sure enough, the New York Times has announced that it will now be producing fewer huge visuals. Does this mean that we’ve got over our initial enthusiasm for data visualization?

Rogers has a more nuanced view: “People are fussier about what they’ll love.” In other words, because of the recent glut of infographics, there is more importance on ensuring that the visualization serves the story and serves the audience.

5. Print can be more powerful than online

It is often assumed that data visualization is native to the Internet. While it is true that the online medium brings with it huge potential for interactive features, print can still play a vital role in visualization.

Alberto Cairo explains that he still buys print newspapers, and enthuses about the New York Times’ double-page spread listing people who have been insulted by Donald Trump. The online version is impressive, and gives the reader the ability to click through to specific insults, but the size and physical presence of a double-page spread in the New York Times really brings home the extent of Trump’s vituperative qualities.

Cairo also cites National Geographic magazine as a perfect example, specifically highlighting sketches by the artist Fernando Baptista, made for a large illustrated infographic about the Sagrada Familia basilica in Barcelona.

“It’s gonna be like people listening to music on vinyl.” This remark from Simon Rogers perhaps betrays nostalgia stemming from his journalistic background, but probably chimes with the views of many modern journalists.

6. Data journalists must think about posterity

Excitingly, Rogers and Cairo seem to be planning some kind of grand archive for data journalism. One pitfall for visualization is that the online tools it relies on can be discontinued. For this reason, when Google starts a new initiative, it always has a plan for making sure that projects built with that tool will survive even if Google discontinues it.

As with much online journalism, data visualizations can be ephemeral, fading away after their first publication. Data journalists need to think about preserving their work, much of which will remain relevant for long periods of time.

Watch the full discussion here:

Data Journalism Awards past winner focus: The Migrant Files


The Data Journalism Awards, organised by the Global Editors Network (GEN), showcase some of the best data journalism every year. Here we take a look at past winners in anticipation of this year’s awards.

In August 2013, Nicolas Kayser-Bril, a French data journalist and CEO of Journalism++, started The Migrant Files project along with 15 other European journalists in order to document the rising migrant death toll at the gates of Europe. The project was a response to the lack of official monitoring of deaths among migrants on their journey west to safety.

“We started building our database based on information from NGOs that had done a terrific amount of work on the topic already,” said Kayser-Bril.

So the team extracted and aggregated data from open sources to build a database that would allow them to track the migrants dying every day around Europe and off the coast of Africa.

The data is visualised on a bubble map that indicates the number of migrant deaths in Europe and Africa. The user gets information on the number of refugees and migrants who died between 2000 and 2015 by clicking on a specific spot on the map.

A detailed explanation of the project can be found on the same website under the article “Counting the Dead”. The team still updates the information and has since written another article on the amount of money the European Union spends to keep migrants out.

Kayser-Bril said that the map was still being updated to this day and that he and his team will not stop until international organisations like the UNHCR start doing the work themselves.

The jury described the project as an “excellent example of journalists intervening to put a largely neglected issue on the political agenda […] this is data journalism at its best. We need more projects like these.”

Kayser-Bril said it was a nice feeling to have the project recognised by peers.

And as for the Data Journalism Awards? “They’re a great opportunity to review what has been done in a given year.”

Currently, Kayser-Bril is working on several cross-border investigations where “we follow the same goal of measuring the unmeasured.” One of them is The Football Tax, which measures the flows of public money spent on professional football. The other project is Rentwatch, which measures the prices of rent everywhere in Europe.

If you are a data journalist who wants to submit a project, the submission deadline is 10 April 2016. This year’s ceremony will take place at Vienna City Hall on 16 June.

Interhacktives is a proud media partner of the DJAs.


Why you should apply to the Data Journalism Awards 2016


Are you a journalist with a data-driven project you are proud of?

Apply to the Data Journalism Awards (DJAs) 2016 now

The Data Journalism Awards are the biggest competition to reward the world’s best data journalism. Submission is open until 10 April 2016.

Interhacktives is a proud media partner of the DJAs. We’ll be publishing interviews and advice, and showcasing past winners, until the 12 prizes – worth €1,000 each – are announced on 16 June.

Organised by the Global Editors Network (GEN), the awards are sponsored by the Knight Foundation and Google News Lab. Google’s data editor Simon Rogers is directing them.

ProPublica’s executive chairman Paul E. Steiger is president of the illustrious jury of 18, which includes The Guardian’s digital editor Aron Pilhofer; Cronkite School of Journalism Professor Steve Doig; The Economist’s data editor Kenneth Cukier and Condé Nast International’s Chief Digital Officer Wolfgang Blau, among others.

The panel’s goal this year is “to expand the number of entries”, especially from Asia, Africa, Latin America and Eastern Europe, says Steiger.

“We saw lots of great work last year – the best ever – but we know there was excellent data journalism that wasn’t entered.”

The entries that stand out for Rogers generally work on all formats, from mobile to desktop; are transparent with their data; and blend data with new technology in an innovative way.

“There’s so much technological innovation going on right now, wouldn’t it be great if we could see more data journalism using these methods?”

Data journalists from organisations of all sizes should apply, he says. Start-ups and small newsrooms are sometimes “best placed to try out new things”.

“It’s a great surprise to see websites not known for data journalism really producing great work.”

See GEN’s interview with Simon Rogers and Paul Steiger for more of their advice on entering.

Shortlisted applicants will receive a free invite to the two-day GEN Summit in Vienna, during which the prizes will be awarded in a ceremony at Vienna City Hall on 16 June.

Apply now!

Megan Lucero interview: ‘all journalism will eventually be data journalism’

Hailing from a tiny Californian town, where the main mode of transport takes the literal measurement of horse power, Megan Lucero is quite the outlier. The energetic 27-year-old, who was remarkably promoted from intern to data editor at The Times and The Sunday Times in just four years, would certainly stand out if you found her in a spreadsheet. At their shimmering Thames-side offices, Lucero talked to Peter Yeung about the importance of open data, the inherent plurality in data teams, and how her paper was the only one to correctly reject the polling data ahead of the UK’s 2015 General Election.

Can you talk about your rise through the ranks at The Times?

I was interning for a week on the foreign desk, and I was just finishing up my MA in International Journalism at City University. It was my first time in a massive newsroom, which is funny to look back on now. Towards the end of that, I started taking a lot more on for the desk, and suggesting a lot more we could be doing digitally. I was very fortunate that at the time Richard Beeston, who unfortunately passed away a couple of years ago, was very on board with this and gave me a lot of free rein to do that. But at the end of that, they were cutting researchers and my job came up for the axe. I went up to the editor and deputy editor at the time, James Harding and Keith Blackmore, and I pitched a job to them, which we later called a “story producer”. I invented the job and they said: “Let’s see what you can do”. It was a one-woman show for a year. I taught myself a bit of coding, built some interactives, started our first SoundCloud account doing podcasts, ran live-blogs and found stories online. But another review came up. They asked me to apply as a data journalist, and at the time I didn’t feel qualified by any means. But once I took on the role, I wanted to own what data journalism meant, so I just started teaching myself. I was trying to learn as fast as I possibly could because I saw this as a massive opportunity: the future of journalism. After a while, I was basically running the team, so I became data editor.

What role will data journalism play in future newsrooms?

One day data journalism won’t be data journalism: it’s just going to be journalism. This sexy term that everybody throws around will disappear. Every single journalist should and will be digging into all of the digitisation and data around their beat, finding their own exclusives. I think there will always be a specialised team that will need to help with really advanced machine learning, perhaps algorithms that look at modelling, but I really think that data journalism as we know it now won’t exist.

Do you ever need to convince others about the value of data journalism?

I’ve worked pretty hard to make sure that other journalists understand the value of it. We had a front page exclusive out about charities’ expenses recently, and every journalist knew that the story was a combination of a data journalism approach and an investigative approach. Everyone recognises the value of data, but it’s a matter of whether they’d be equipped to do it themselves. Sure, there is a gap with what other journalists can currently do, but they still recognise that data is important and valuable.

Does the paywall affect The Times’ approach to data journalism?

If there was a paywall, or there wasn’t, we would be approaching it as we do. If anything, there’s much more of an argument for what we do at The Times, because our business model is that we produce news worth paying for. You’re trying to give your audience and your reader something exclusive, something they can’t get anywhere else, something that is worth subscriptions. A lot of people are willing to pay to support foreign correspondents around the world, advanced sports coverage, access to premiere clips. And I think that there’s a value in someone who’s looking out for accountability in public interest reporting, by advancing data manipulation and data analysis. I think every journalist should be thinking about how they can tell the full picture, looking at all of the information available. If you shut the door on data journalism, or limit yourself on how to access data, you’re really limiting the depth of what your story can tell.

Are there ever clashes between the editorial stances of a paper and what the data says?

I think your question doesn’t even necessarily need to apply to journalism. If you look at academics, if you look at anyone who analyses data, they can tell you that it’s possible to torture a data set to tell you whatever you want it to say. You’ll read one study that says drinking red wine helps you, you’ll read another that says it will kill you. This is because people twist numbers and they will twist them to tell you what they want. But I think we’ve never been pressured to deliver a certain angle, or to intentionally twist the data. The great thing about having a data team is that you’re not relying solely on a single individual: a team requires, for us, a peer review. Each of us checks each other’s processes, and we really do make a moral and ethical decision whenever we’re looking at it. We try to be open and challenge each other if we find ourselves going down a certain angle, or not doing something as robust as it should be. The classic example is how we treated the 2015 General Election: we rejected the polling data that was in front of us, and no other paper did that. It wasn’t robust, the margins were too wide, the data was skewed. That couldn’t have happened if it was just individual people going after a story.

What is more valuable, open data or freedom of information?

If there was truly open data, you wouldn’t need FOIs. If truly every government body and every organisation that is public opened their data, you wouldn’t need to do that to begin with. The fact that FOI is under threat is a travesty, and it’s absolutely unacceptable, because this is an affront to a public service. This is a right being taken away from citizens. But if you look at the source of the problem, it is that the data isn’t open. It’s the fact that public information should be easily accessible and it should be able to be accessed. My argument would be that open data is more important, because it is the bigger picture that encompasses FOI issues. But, of course, I wouldn’t say that FOI doesn’t matter: it matters a lot. It was created because of the lack of transparency and the lack of openness. But hopefully we can get to a space where that won’t really be necessary.

Is it difficult working for both The Times and The Sunday Times, which are competing papers?

We’re the only editorial team that does this. There’s no one else who has a data team that works across two titles. It’s kind of like contracting, in that sense, but it doesn’t feel like that here; it definitely feels like two separate titles. We’re quite lucky that there are very different focuses on what we do for each title and what we can bring to them. But at times, there’s obviously data that both titles will want, and it would be quite silly to replicate our work. But I think we’ve been finding a good balance in how we share that. Luckily, the way that data journalism works across the board is that it’s quite an open space and an open community: The Guardian, the FT. I know the editorial teams across the board here. Most of us try to open up our data. If I did something for The Times, it would be quite natural for us to open up our FOI requests and the data on that story. That’s what is quite unique about the data community. But it is challenging.

What do you want The Times data team to be known for?

I’d love to expand my team even more as I get more resources, and as that’s allocated to us. Basically: I want our team to continually be breaking really great stories, and we want to be doing it in a way in which you couldn’t be doing without computing. Our team really is brought in to be an investigative team, and we find our best use is when we are doing advanced algorithms, machine learning, modelling: when we’re handling big data, doing things that a human really couldn’t do without computing. That’s what I want to be known for. We’re still kind of working in an area in which we’re doing some journalism that other journalists could do, so I’d like it to really move further along that line. Doping is one of the biggest examples, but obviously we’ve done a lot of stuff on charity finances, on footballers’ accounts. I’d like to continue that, and I’d like us to get more into visualisation, which our team doesn’t do enough of due to resources, and I want to focus on stories. But I’d also like to help contribute to the data community and to this paper by helping create journalists who are empowered to be data journalists themselves.

This interview has been edited for clarity and brevity.

Lunch with the IHT: John Burn-Murdoch

Financial Times data journalist John Burn-Murdoch

At the tender age of 27, John Burn-Murdoch is one of the leading young lights of data journalism in the UK. His brief career to date has already taken in The Guardian, The Telegraph, Which? Magazine, and The Financial Times, where he’s been working in a coveted data journalist role since 2013. Raised in Yorkshire, Burn-Murdoch also channels his passion for spreadsheets and statistics as a visiting lecturer at London’s City University, sculpting the next generation of data enthusiasts. On a crisp December afternoon at Borough Market, he talked to Peter Yeung about the issue of objectivity in data, the risk of cronyism in the data journalism community, and how the FT are unique.

How does the FT differ from other publications?

I would say our newsroom is more numerate than most. That’s not to say everyone has a maths degree, but we have some people that have previously worked as analysts in banks, for example. That means a lot of the day-to-day data journalism, the quick-fire stuff, is handled by the reporters without them even thinking about it. You could say that a lot of the numerate data journalism that comes out of the FT won’t even come past our desk: it just happens. That means that those of us on the data, stats and interactive teams are afforded a bit more time to dig deeper into things. We might have more of a week-scale publication schedule, with some quicker articles in between, whereas places like The Guardian publish two or three pieces on the data blog on any given day. There are different ways of doing things, but most of it for us is having the capacity to take a little bit more time.

How did you get into data journalism?

I never really thought about journalism until the third year of my undergraduate geography degree. I was a bit disillusioned with the course, and needed to do something extra-curricular. I started working on the student paper and really enjoyed it. I didn’t even know data journalism was a thing then. My first taste of professional journalism was doing some work experience at The Guardian during the London riots, because they were suddenly looking for lots of people to come in and do some research. That was inherently data-related work. Then I started a Master’s in Data Science [the first in the UK] at Dundee University, but I only did the first term of that because it was impossible to fit in, since I was working full time. It was distance learning, but there was also a four week period of intense sessions in Dundee. I absolutely loved it, but it proved too much of a struggle with time.

Why are you a lecturer at City University?

I’d say for two reasons. Number one, as trite as it sounds: giving something back. When I was studying at City, James Ball was lecturing at the same time as being a data journalist at The Guardian. And with something like data journalism, which is quite a rapidly evolving field, often it’s better to have someone who’s actively involved in the field teaching it. City got in touch, and there weren’t many data journalists in London to be honest, and I was obviously one of the ones they knew, and it sort of ran from there. The other bonus for me is that it keeps my own skill sets ticking over. I kind of feel like everyone wins.

The data journalism community is quite tight-knit. What are the advantages and drawbacks?

I think it’s mainly an advantage. There are obvious drawbacks in terms of cronyism and when people are interviewed for jobs there’s always a temptation to hire the people that are familiar. But I think there are big advantages in terms of collaboration: digital journalism as a whole, and especially it seems anything where data analysis and web development are involved, seems to be inherently very collaborative. The whole concept of open source is about riffing on other people’s work, taking something someone else has done and adding to it. That collaborative spirit is a massive help. Without it, we wouldn’t move along as quickly. But as a counterpoint to the cronyism, because of the skill sets now required we are now seeing a lot of people from outside of that bubble. If anything, data journalism is less cronyistic than journalism as a whole.

With its history in computer-assisted reporting, data journalism has tended to be focussed on investigations. But should there be more quick, reactive data journalism?

There are obviously lots of cases where you can do good quality data journalism very quickly. Alberto Nardelli is one of the best at using a quantitative mindset and skill set, but with the breaking news agenda. But, having said that, I think inherently the best data journalism, if you judge it in terms of the level of analysis, and the ability to find a news line that other people don’t have, takes time. Quick-reaction pieces can only be done if you spend a hell of a lot of time familiarising yourself with your beat, and building up your data sources. It’s definitely possible, but to be knocking out really top level data journalism multiple times a week is really difficult.
The Financial Times office in London (Image: FT)

Is data journalism more objective than other forms of journalism?

Like those that have answered it before me, and probably much better, I would say it’s not necessarily more objective. It certainly can be. Data journalism can be more objective than vox pop journalism, purely because of things like sample sizes. When you’re trying to extrapolate and talk about national or global trends, you can be more objective. But there are issues with data quality, and issues because your starting point is always a question you want to answer. There are few journalists of any type who start with a completely naive position. Some people might have an agenda even though it’s a completely unconscious one. It’s very difficult to go in completely blind.

How do you establish the line between pushing an agenda and finding a story?

The obvious one is talking to people in the know, especially those who you think might disagree with you. If you set out to ask a question of a data set and you go to an expert who has already written extensively with the same angle as you, it won’t help much. Even before that, you can do your own tests by interpreting the data in different ways, making sure there aren’t any other counter-explanations in there.

Who is doing the most interesting data journalism right now?

That’s a really tough question. Obviously, the FT. No, no: all sorts of places. There are some obvious ones such as the New York Times, which does pure data journalism and visual journalism, constantly raising the bar and doing fantastic stuff. Berliner Morgenpost won the Information is Beautiful Award for being the best data visualisation team, and they do some amazing stuff. ProPublica, with their data-driven investigations, do incredible work. Bloomberg have been doing some amazing visual work recently and the same goes for the Wall Street Journal. The good thing is that I’m having to think a lot harder than I would have, say, five years ago, when I could have reeled off two or three and there wasn’t anyone else.

If you could lead your own data team, what would it be like?

I’ve never actually thought about that; it probably speaks to a lack of ambition. Personally, I like the idea of a team of specialist-generalists: people whose skills are both technical and subject-based, with interests in all areas. Kind of like it is at the FT now – one week we are working on climate change, the next it’s Boko Haram terror attacks, then maybe it’s something about the global oil trade, and then something on tennis. You want everyone to have a base level of technical experience, but it’s always nice when people are pulling in their own directions to a certain extent. For me the team would be two-thirds coming up with ideas generated internally, and the other third doing amazing collaborations with other parts of the newsroom. Very roughly speaking, that’s what I’d look for.


This interview has been edited for brevity and clarity.

Nicolas Kayser-Bril interview: ‘It’s important to institutionalise data practices’

In 2008, Nicolas Kayser-Bril, a young graduate in media economics, fell into data journalism by chance because he could code simple stuff. He began his career by publishing stories with Le Monde and Le Post (the precursor to the Huffington Post in France). In 2010, he was part of the team at OWNI, a French digital think tank, that analysed the Afghanistan war logs. He is now CEO of the data-driven agency Journalism++. The highly accomplished data mastermind talked to Cristina Matamoros about the state of data journalism in France.

How did data journalism come to be in France?

There was a story of major importance that was run by The New York Times, The Guardian, and Der Spiegel, and a lot of media outlets realised they were left out of the story because they didn’t have people who could read SQL files. At OWNI, there were people who could do that, so I wrote the French version of the story with Pierre Romera and that’s pretty much when people realised that data journalism was a thing. 

Which company pioneered the usage of data journalism in France?

In France, it was definitely OWNI. I don’t think any newspaper or news organisation in France has made much progress in data journalism. Lots of things have been tried, like Les Décodeurs at Le Monde; they’re a fantastic team. At Libération you have a new team, and at Le Parisien they have something as well. You have great things going on everywhere, but I don’t see any real data journalism team, in the sense that you don’t have developers or designers as official teams as you see in Switzerland, Germany and pretty much everywhere else in Europe. You don’t have that in France.

Why isn’t that the case in France compared to the UK?

In the UK, it’s the same situation as in France, in my opinion, in the sense that you don’t have news organisations driven by profit – the Guardian and the BBC are different – but other news organisations don’t see a return in investing in research and development. And this explains why you don’t have teams in France like you might have in Germany. This being said, there are many more interesting things in London than in Paris. One reason for that is because people who studied humanities drive journalism in France. So you couldn’t find a statistician in a French newsroom. So it’s much harder for French media.

If you were to direct the editorial team at Le Monde, what steps would you take to develop a data journalism team?

I wouldn’t, because the owner of Le Monde is not interested in profit. That said, creating a data journalism team is pretty easy; you just need a project manager, a journalist, and a designer, and have them work together. So it’s not that hard – it’s just that the French managers haven’t done it yet.

If you look at the ownership of local newspapers in France, you realise huge corporations mostly own them. And they have no interest at all in innovating journalism. What they really want is for the newspapers to do as little investigation as possible.

What is the advantage of doing local data journalism?

Nothing specific – it’s the same as doing data journalism at the local or national level. It allows better and more efficient reporting.

You have a lot of brilliant people in France, so you just need to find them and provide them with an environment where they can try things out.

And managers need to understand the need for investment in promising fields. But as long as these two conditions aren’t there, nothing is going to change.

Malcolm Coles interview: ‘news organisations don’t experiment enough’

The eyes of the British media are watching Malcolm Coles, the Telegraph’s Director of Digital Media. He is at the heart of the huge disruption taking hold of the newspaper industry and the battle to make journalism sustainable. Coles launched Trinity Mirror’s nimble data project Ampp3d in December 2013 to widespread acclaim, and now he is tasked with transforming The Telegraph into “a digital-first media newsroom”. Serene yet steely, he spoke to Peter Yeung in a colourful modernist corner of the newspaper’s Victoria headquarters.


Could you explain your role?

One is improving the standard of our digital publishing. So, trying to make how we write about things more suitable for the digital age in terms of interactivity, visuals, and background explainers. For instance, there’s an editorial development team that reports to me, who work on exciting projects. There’s a new formats team, who are tasked with new ways of displaying things – a whole new ecosystem of explainer cards and timelines and responsive infographic grids are here.

For the other half of it, I manage the teams that focus on audience. 90 million monthly unique users is the new normal for us every month, and we’re heading for 40 million UK unique users. Like a tree falling in a forest, if there’s no one around to hear it, does it make a noise? Likewise, journalism is only really good journalism if it has an impact.

Did you always envision a career in journalism?

No I didn’t. In fact, I decided that I didn’t want to go into news journalism when I left university because I couldn’t quite bear the idea of knocking on people’s doors when their offspring had died. For all that it is – a very important part of local journalism – it still wasn’t really for me. After working at a number of other places, such as [consumer advice charity] Which? and [website consultancy] Digital Sparkle, and writing about newspapers on my blog, which involved the Daily Mirror trying to sue me for libel, they rang me up and said: “You’re so clever, why don’t you come and work here instead?” So, I did.

Telegraph Media Group’s newsroom in London (Photo by Lucas Schifres)

Where is The Telegraph going? Will the metered paywall be around forever?

The Daily Telegraph has never had a bigger reach than it has right now. More people read it than ever before in history. But we all know that digitally it’s hard to make money. There are lots of people competing for that money, including new platforms like Facebook and Google. Martin Sorrell, CEO of advertising giant WPP, has come out and said he thinks paywalls are the way for publishers to go. On the other hand, you have The Sun coming out from behind its paywall because it couldn’t make it work.

The Guardian gives its content away for free because it doesn’t think it can get people to pay for it. Read into that what you will. The Telegraph, in some ways, has a cap on engagement, but despite that we’re still growing. I think we’re up about 20% year on year in terms of unique users and I think we had record traffic numbers this year, but May 2015 was a bit of an anomaly, with the election taking us past 100 million. The Guardian has twice as many journalists and loses lots of money every year; The Telegraph is profitable. I think we probably punch above our weight in respect of all that.

October’s ABCs show that The Guardian are now at 8,370,243 uniques (+11.28%) and The Telegraph at 4,419,480 (+0.11%)

The Telegraph are currently hiring a new data editor – was this your decision and do you think data is a requirement of the modern journalist?

I think I probably filled out the document. The core of journalism is still itself the same as it ever was, but how you find some of those things is a bit different and how you display them is very different these days. There have been a number of new hires this year. As I say, we set up a development team on the editorial floor, who are busy building reusable format stuff and one-off interactives. But yes, data journalism is a different way of uncovering stories and thinking in different ways to visualise them. I’m sure data will increase at The Telegraph, but it is just one facet of what we do.

I still think you will get specialists in many areas; on the other hand, general journalism is a bit more about being able to do everything these days. People are expected to self-publish, to think about SEO and social, and the home page. They’re expected to find all the different bits like galleries and videos and assemble them. So there’s a lot more thinking about how to best display your story online than there was 50 years ago with the old workflows.

Why did Ampp3d, UsVsTh3m, and Row Zed not work out?

You’d have to ask The Mirror that. There are obviously expensive ways – as the Mirror said at the time – of doing things. On the other hand, it would be a shame in 2016 if we don’t have new ways of getting journalism across. News organisations need to invest and experiment with different ways of doing things, especially as devices change. One fly in that ointment is things like Facebook Instant Articles and Google AMP, where exact control of how the web pages work in those environments is ceded to the platform, which might limit the sorts of things you can do. The latter limits you from having “arbitrary” Javascript, as they call it, which is an impediment to things like interactives.

A graphic made by the now-defunct Ampp3d

 

Is the lack of popular, tabloid data journalism an issue?

There are lots of stories buried in the data. It’s important that data journalism happens, because otherwise you don’t find out these things. But I think there’s more data journalism now than there ever has been. It depends what you mean by data journalism really. If you open up a tabloid, there is usually some sort of infographic in there with numbers. Trinity Mirror still has a data unit though, and they are still working on data journalism all the time.

SEO and social now drive up to 70% of traffic. Will they always be so important?

Parse.ly did announce that for their network of publishers social overtook search. But The Telegraph still has a very strong line of traffic to its home page because people want to know what our view on the world is. They’ve been significant drivers of growth for most publishers over the years, and they’re not going to go away. Obviously, Google and Facebook have expressed dissatisfaction with how web pages render in their environments. But both those brands know that people go to their platforms in order to find things out — there’s a reason people follow news brands on Facebook. It’s not in either of their interests to stop news being findable to the scale it is today, but we can also bicker about the share of the advertising pie we get.

Where will The Telegraph be in 10 years?

At the start of this year, I assumed a responsive website was a terribly important thing. But now with mobile web pages we are heading onto Facebook Instant Articles and Google AMP. Ten years is a very long way to look ahead. I imagine there’s a bunch of people at school now for whom, by the time they get to the end of secondary school, virtual reality will be second nature. For me, it will always be a weird, alien thing. I’m sure there will be virtual reality ways of accessing digital journalism and I’m sure we’ll spend a lot of money working out how to do it right and then no doubt some virtual reality platform will come and attempt to aggregate us all. Let’s hope we’ve all learnt our lessons from the mobile web for that. There’s no way I’d have predicted the end of 2015 at the beginning of the year, so I’ve no bloody idea what’s going to happen in 2025. But I’m sure cat GIFs will still be important.

How to write a data story with bad data

The General Election is the most important media event of the year. It’s a chance to earn your stripes as a journalist and get some page leads in your portfolio.

This was the thought I had last week when I was on work experience with the Times’ Redbox supplement. If you’ve heard of Redbox, you’re probably also aware of its strong emphasis on data-driven journalism. Redbox receives exclusive polling data from YouGov to keep it ahead of the curve.

I wanted to prove that I could write strong data stories. I had already been working on a feature about young candidates in the election, and thought it could work as a data story. The only problem is that the data on parliamentary candidates themselves is inconsistent as hell.

I used a website called Your Next MP, which had a spreadsheet of every candidate running in the election this year.

The data was pretty bad. There were huge chunks of information missing, no guarantee on the accuracy of the data, and another journalist mentioned “crowdsourcing” when the website came up in conversation.

What should you do in this position? Do you give up after days of research and interviewing? Do you try and find a different angle that doesn’t need data?

It might not look exactly how you thought it would from the beginning, but a bad or incomplete dataset doesn’t have to mean the story is dead. There are lots of ways you can tell a strong, accurate data story that don’t involve perfect data.

 

Clean it

My battle with the data

First things first: you can’t do anything with a messy dataset. You should clean what information you do have so you at least know the extent of the problem. To see how much data you have to talk about, right-click on the column you’re looking at (in my case this was “birth date”), select “filter” and choose the values you want to keep. Although this is not a definitive list, it gave me a large cross-section of young candidates to research and talk about further in my story.

You might find that the data you’re left with is incomplete but that it still paints an interesting picture and backs up what you already know. Equally, you could find that you simply do not have enough information to make your story data-centric. Either way, you need to know. Make a note of the change at each stage of the cleaning process so you know how inconsistent your data actually is. Then you can make an informed decision on how important a role the stats will be able to play. You can find out more on how to clean data from this handy guide, or in this video.
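If you would rather run the same check in code than in a spreadsheet, here is a minimal pandas sketch. The candidates.csv export and the name, party and birth_date columns are hypothetical stand-ins for whatever Your Next MP actually gives you.

import pandas as pd

# Hypothetical export of every candidate standing in the election.
candidates = pd.read_csv("candidates.csv")

# Coerce bad or missing dates to NaT so they can be counted and filtered out.
candidates["birth_date"] = pd.to_datetime(candidates["birth_date"], errors="coerce")

# First, measure the extent of the problem: how much of the column is unusable?
missing_share = candidates["birth_date"].isna().mean()
print(f"{missing_share:.0%} of candidates have no usable birth date")

# Then keep only the rows you can actually talk about: here, anyone
# under 25 on polling day (7 May 2015).
young = candidates[candidates["birth_date"] > pd.Timestamp("1990-05-07")]
print(young[["name", "party", "birth_date"]].sort_values("birth_date"))

The missing-share figure is exactly the note you want to make at each cleaning stage: it tells you how central the numbers can safely be to the final piece.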

 

Cite your sources

Crowd-sourcing site Your Next MP

You should be doing this anyway, but it’s even more important if you’re worried about the accuracy of your data. Take yournextmp.com as an example. I was fully prepared to analyse information on incumbents on this site for a data story on same-sex marriage. That is until I spoke to Roger Smith at the Press Association:

“The information is gathered through crowdsourcing, which makes it really rather unreliable. There may well be quite a few last-minute withdrawals and so the data’s accuracy can’t be guaranteed.”

This looks like a death sentence for your feature, but it’s actually something you can work with. As long as you know and acknowledge that the dataset you have doesn’t tell the whole story, you still have the basis for an insightful piece that is enhanced by data.


Don’t analyse too deeply


Do not be tempted to over-alter the dataset to find an angle. The chances are that anything you do to the set beyond just cleaning will create further inaccuracies. Abandon any grand ideas you had of merging it with polling data or finding average ages. It’s not going to work. Here’s an example of a dataset I had to work with which had the same issues.

Remember that the data will not tell the whole story, but you can look at it and analyse it to get some interesting statistics to illustrate your bigger point.

 

 

 

Avoid misleading visualisations

Here’s one I made earlier…

For the same reason that you shouldn’t be analysing the data too deeply, you shouldn’t be putting the information you do have into a graph. Graphs and maps assume that the data is gospel. If you can’t guarantee that then any visualisation is misleading and uninformative.

 

Focus on people, places and personalities

Using candidate data to make contacts

Your data is not going to be the hook of a ground-breaking discovery – but then it’s actually very rare for data to make front-page news. Instead, you should be using your data as a starting point to explore different areas, people and trends. Say your story is about candidates under 20 running in the election, and you can only find 8 people who fit the bill, even though you know there are more. Use the candidates you have found as a contact list rather than the story, and before you know it you have some interesting insights into the political careers of teenagers.

 

So there you have it. Use this guide any time you have a dataset you feel very uncomfortable using as the basis of a story, or even if you’re new to data journalism and don’t know what to analyse. You don’t have to be a statistician to create great data stories.

 

Sociotope brings your online identity to life

A mass of multi-coloured tentacles against a grey-blue backdrop

While browsing data visualisations on Pinterest the other day, I came across an interesting-looking tool: Sociotope, a social media experiment which takes the data people leave behind in social networks and turns it into an interactive data visualisation.

The free-to-use web app works with Twitter, Facebook and soon Google Plus. It uses your data to build a “virus”-like creature with one tentacle for every post you’ve made, or post that someone else has involved you in, up to a maximum of 150 (though you can choose to load more). The colour scheme is taken from your profile picture, and the length of each tentacle varies depending on the length of the post. The more the tentacles move around, the more people have interacted with that post – providing a slightly bizarre but effective overview of your social media presence.

 A screen capture of me exploring Sociotope and using it to visualise my Twitter activity

Sociotope is functional, but also fun and interesting – you can use your cursor to spin it around in the three-dimensional space, and watch as the tentacles flop about. You can click on each one to see details about that post, although with so many tentacles in the way, it can be difficult to hit the exact one you’re aiming for.

Sociotope also provides a few options for analysing your social media presence, including sorting posts by time and by author. Its design is geared more towards visual impact than straightforward analysis; but it’s effective as a visualisation and fun to play with, and could serve as an entry point for more casual users into analysing their social media presence, rather than only appealing to professionals, like most analytical tools.

A visual metaphor

Stefan Wagner, the designer who created Sociotope, says he wanted people to gain an understanding of what they leave behind online:

If you browse websites, data is collected about you – lots of data. I think the average user doesn’t ever glimpse how much data is actually collected … these kinds of exceptional visualisations, they gain people’s interest, and they will be interested in viewing this data and what lies behind it.

Stefan describes Sociotope as a “metaphor” that represents people’s social media activity and their social relevance. “I always liked connecting data visualisation to some sort of metaphor – I like working with metaphors to convey information about something. The idea was created to make a data visualisation of social media and put it in some sort of other form, to shape it differently, so that the viewer would learn something else from it.

“I really hope that people are using it to analyse their own presence and maybe the identity of others. Because social networks, they’re all about social interaction, and I think it’s really important for people to realise how they use this kind of social media, how they interact with their friends, and how deep this interaction goes.”

Does he think that this is a role that data should be playing – in helping people realise these things about themselves? “For me, it’s the only way data should be used. Of course big data is used to do advertisements and stuff but for me, the interesting thing lies in analysing behaviour and getting into how people use this kind of media.”

A colourful Sociotope visualisation with a few tendrils extending out towards the words "tweet", "reply", "tweet with media" and "link"
Sociotope can break your online presence down by type of post and whether or not it contains media | Stefan Wagner / andsynchrony

Sociotope also provides an unexpected insight into how the internet has developed over time and how users’ social media presences have changed with it. By loading posts back far enough, you can play them as a time-lapse which shows the evolution of a person’s social media presence over the years.

“When I started to build the project,” says Stefan, “I saw that in 2009 or 2010, people were writing a lot more text, but now they restrict themselves to posting photos or one-liners – just a few words. People tend to not write so many things any more; they more tend to post photos or videos.

“You can read it out of the visualisation. [Similarly], when you look at websites, how they are structured and how they try to gain attention, photographs or images get a lot more space these days than they did two or three years ago.”

Generating Utopia

Sociotope isn’t Stefan’s only project which uses data visualisation to give insight into how people use social media. In 2013, he created ‘Generating Utopia’, a real-time visualisation of social location data using the social platform Foursquare.

It takes a map of an existing city and alters the topography based on a person’s Foursquare check-ins, elevating the areas where a person checks in the most, to emphasise their importance. The locations are connected by a web of neon lines in primary colours: red for work, blue for recreation and yellow for transport. The overall effect is a dramatic, futuristic cityscape.

“People like to represent themselves from their best side, in social networks,” Stefan explains. “So when they check in somewhere, it’s not like the doctor’s office or something; it’s some awesome place. So people will build up a utopic vision for themselves, and I wanted to build utopian landscapes from the data.”

A bird's-eye view of a cityscape with several buildings perched on top of high mountains, with lines of red, blue and yellow light winding their way around the topography
A still from Generating Utopia | Stefan Wagner / andsynchrony

“I really love provoking people by showing them data in a different way. I like using metaphors and images, strong images, which provoke people’s imagination to make them build up a sensibility towards what data means and how much data they produce. I think it’s really important.”

Stefan says that he would like to see more people creating images and ideas from the data that lies behind a person’s online presence. “Every image which is created helps shape this future idea of how data should be, or how social networks should work. I can only motivate people to try to visualise data.”

Interview: Hera Hussain on the need for open data

Hera Hussain is Communities and Partnerships Manager at OpenCorporates, the world’s largest open companies database. When she isn’t busy organising hackathons and liberating corporate data from across the internet, she works with the social entrepreneurship movement MakeSense and empowers women to achieve independence through Chayn. She spoke to Interhacktives about her experiences with open data, its importance, and the role that journalists should play in making it more accessible.

A basic right

“I initially misunderstood it,” Hera says, of her first encounter with open data as an organiser of WikiMania, an annual event focused on wikis and open content. “Like many other people, I could only see some applications of open data. For example, I thought it would be really useful if government posts statistics on crime. What I didn’t realise is that the aggregated statistics aren’t important. Anybody can come up with those numbers; the important thing is the underlying data. It’s not just about how many knife crimes have happened, it’s more about when they happened, where they happened – the little details.”

“Data should be a basic right,” she goes on. “And that wasn’t very clear to me until I started working for OpenCorporates.”

What does being Communities and Partnerships Manager for OpenCorporates entail? “It’s my job to make sure that the data held by OpenCorporates is used for social good – by journalists, by NGOs, by citizens, by other open data organisations. My job is to make sure that happens and also make it easy for people to contribute open data.”

One of the ways people can get involved in contributing open data is through taking part in #FlashHacks, monthly hackathons where anyone can come along to liberate and map corporate data or write bots that will convert the data into accessible formats.

Hera Hussain wearing a red T-shirt with the word "FlashHacks" across the front, against a crowded backdrop of hackathon attendees wearing similar T-shirts and sitting at tables
Hera Hussain at a FlashHacks hackathon

 

The importance of open data

Believe it or not, the UK is one of the world leaders in open data, alongside New Zealand. “Especially company information,” says Hera. “Our Companies House is really open to suggestions from the NGO community and the open data community, and they’ve done great work in opening up the database. The government has a really pro-open data stance which makes it possible for this all to happen.”

What is the most important thing about having open data? “I think it’s the fact that it exists,” Hera says. “People always say open data is very elitist. Only people who can work with data can use it. But because I think it’s a right, the fact that it exists is really important, because there will be somebody who can use it. We can leverage their knowledge to make things better.

“There’s always somebody out there who can apply it, and while there’s a big gap in terms of understanding data, I think eventually that will be filled. You can say the same thing about engineering, you know – engineering’s really elitist, because not everybody can understand how machines work or how buildings or materials work. But those who know how to make it work make it work for everybody.”

Making an impact

Ideally, she says, more people should be educating themselves about data and what it can do; but it might take a different approach from the data community to generate more interest in open data. “The problem is that things that make an impact on people are stories. I think we need more of that and I think the whole open data community is realising that, is trying to create a storyline of how it can be applied and how it is being applied.”

Is this a role that journalists should be playing? “I think it’s a responsibility. I think you become a journalist because you want to report on something that’s true, or you want to investigate something that you don’t know about. In both cases I think preferring open data over proprietary data is really important.”

Of course, the right data isn’t always available for journalists to tell the stories they want to, but Hera is optimistic that this will improve as the open data movement and data-driven journalism gain momentum. “So many times I’m contacted by journalists who want to work with open data and have a very strong hypothesis that they want the data to prove or disprove, but the data’s not available, so there’s no way to do it,” she says. “I think that can be quite frustrating. But I think the new data-driven tide in journalism is interesting, and I think these things are going to be much easier to do in the future. As we liberate more data, there’s more pressure on governments to release data, more pressure on companies to release data in the right formats, so I think the future is promising. It’s just that there’s a long way to go before it becomes easier for journalists.”

Change is coming

What does she think is currently the biggest obstacle to making data more open? “Two things, from OpenCorporates’ perspective: one is that we need so many more bodies of volunteers to actually scrape the data sets … And we need to actually find them as well. Finding data in itself is a big, big problem. Some people say that it’s almost like a self-fulfilling prophecy, because as governments and companies are realising that people are making use of this data to do things that they might not like, they start closing them down. So many corporate registers have closed down in the last year.

“There’s not enough incentive for them to release information, so we need to ramp up the pressure on them. But at the same time, there’s something there which they don’t want to get out, which is why it’s not happening.”

“I am glad that there is a conversation happening, and journalists are a big part of it. They put pressure on governments and companies to be more transparent.”

But things in data aren’t all doom and gloom. As previously mentioned, the UK as a whole has a positive approach to making data open, and next year this will improve even further with the launch of a central register of beneficial ownership for UK companies. It will mean that companies have to disclose information on anyone who controls more than 25% of the company’s shares and voting rights, starting in April 2016.

“I think we will definitely see a difference [in the amount of open data] starting from next year,” says Hera, “because the beneficial ownership information will be open in the UK. Other countries have said they will open it as well. For the next two or three years, I think we’re definitely going to see some change.”

Billy Ehrenberg on data journalism’s future and the skills you need

Billy Ehrenberg, ex-Interhacktive and data journalist, has spent the last year working on new data-based projects with City A.M.’s expanding online team.

I caught up with him to ask what his role involves, and what he sees as the future of data journalism.

On an average day, he admits, he doesn’t do as much data work as he’d like.

“There is a common misconception that graphs in stories mean that it’s data – but I try to get at least one data piece done a day.

“Some of what I do is trying to find a story in the numbers, but often the story is quite obvious or easy to tease out, and I need to use visuals or explanations to make it accessible and interesting. To do this I use a few different tools.”

“Excel, Google Sheets, QGIS, CartoDB, HighCharts, Quartz Chartbuilder, Outwit Hub, Illustrator – each one has their advantages”

Billy has several different favourite data tools depending on the job at hand. For example, he says he usually prefers Excel for cleaning datasets.

“I’ve used Open Refine a bit, and that’s certainly worth getting into. Excel and Google Sheets have a bunch of functions that let you pull data apart and whip it into shape – so how useful Excel is depends mostly on if you’re boring enough to have fiddled with functions for days on end.”

data journalism at city am

“Fake data”

On what he sees as the future of data journalism, Billy reckons that “it will naturally divide between real data and fake data. You see some people who do things like not adjusting historic financial data (even film revenues) for inflation because they are in a rush or just don’t realise they should. That’s a dangerous thing: people can see a graph or chart and think that what it shows is fact, when it’s as easily manipulated or screwed up as words are.”

“That’s a dangerous thing: people can see a graph or chart and think that what it shows is fact, when it’s as easily manipulated or screwed up as words are.”

“I think you’ll get two sets of people: those who do not do a lot else, with big skillsets like coding, stats, cartography and programming, and those who have to rush out faux data for hits.”

The next ‘hot topic’

Billy told me he’s not sure what the next hot topic is, but he thinks it’ll be related to coding – “maybe it’s a cop out, as it’s nothing new.

“People wonder if it’s worth coding if you’re a journalist, and even if you are a journalist if you code. I’m obviously pro-learning.”
data journalism at city am

Data principles

“It’s really important to try not to mislead people. Graphics are easy to use to manipulate people. The more complex they are, the more likely you are to mess up and the less likely it is anyone will notice, even if it changes something.”

“Visualising ethically is important too: even the colours on a map or the extents of an axis can make a change look hugely dramatic”

“I try to let the data tell the story as much as I can and if I don’t like what it’s saying I won’t change the message.”

When asked what data-related skill he wishes he could master, Billy said: “it’s got to be D3. It’s so difficult that I get a real buzz out of solving something in it, even if it’s taken hours.

“Probably learning JavaScript is the best way to crack that nut. It’s a work in progress.”

I asked YouGov’s Stephan Shakespeare about Miliband’s non-dom promise and whether this is the most data-driven election yet

YouGov founder and CEO Stephan Shakespeare

When you get the chance to speak to the CEO of YouGov on the day Ed Miliband announced his promise to abolish non-dom tax status four weeks before a General Election, you just have to ask the question.

Could this be a turning point for Labour?

Will Miliband's non-dom promise be a turning point for Labour? Emojified.

The answer is no, not really. And Stephan’s been following election campaigns since before I was born, so I’ll side with him on this one. However, Miliband’s non-dom prom is “the biggest hit of the campaign so far,” he says. The policy is:

“…symbolic of messages that are out there already. Labour’s strongest card is ‘We care about people like you, the other guys care about the rich.’

“And that has a sort of factual basis – I don’t mean in terms of actual caring, I’ve got no idea if Labour cares or if the Conservatives care about this or that. But what I mean is that it is the Conservative belief that you have to make the economy work and you have to make capitalism work in order to do that.

“And therefore they [the Conservatives] are always going to support policies that do not hurt what they would regard as the wealth-makers. To those who don’t buy that analysis of the economy, and of society, they will look as if they care about the rich.”

I like YouGov. I read Stephan’s column in the Times Red Box bulletin, about the findings of exclusive polling. But I was interested in what YouGov’s unique selling point is.

YouGov conducts polls with greater reliability

YouGov has been collecting data from its panellists since its launch in 2000, and this throws up a valuable backlog of information that lets us analyse trends in people’s opinions.

“The huge value of our methodology is that nearly every other pollster only gets the information that they’ve asked for in that poll, whereas we have this huge background of information, including for lots of them, what they voted at the last election. So we can get very exact in what is real change.”

With the rise of the stand-alone data-driven election website (Red Box and May2015 are a couple of examples), it feels like we’re witnessing Britain’s first truly data-driven election. Now that could be made into a nice headline but when compared with our cousins across the Atlantic, we’re not quite there yet. And here’s why:

Money emoji

Money. The biggest reason we haven’t seen a data-driven election in the UK is that there’s much more spending in the US.

When you’re spending a lot of money on political advertising, it’s hardly surprising you’d want to know how best to spend it. That’s what’s really behind a data-driven election, says Stephan.

With the Obama campaigns, we saw “really high quality predictions of what to expect from each household, what are the likely issues that matter to them, which are the ones worth knocking on, which are the ones worth ignoring, which are the ones that might give you money, and therefore be approached by mail or whatever, who are the ones that you can get to vote on the day — that’s a data-driven election.

“You’re using whatever data you have available, which in the early days is quite generalised, but as time goes on becomes more and more granular, to allow you to treat each individual voter with some foreknowledge.

“You’ll never know exactly what they’re thinking but you’ll have some foreknowledge which will therefore help you – and this is the critical point – to use your resources in a smart way. That’s a data-driven election and that’s not happened yet.”

As campaigning becomes more atomised in its approach, spin-doctors will be losing a lot of sleep over how representative the polls really are, and whether their outcomes will hold true come election day.

Do we place too much trust in polls as predictors?

“You’ve always got to fight to win, right? So you should never give up if the polls tell you that it’s hopeless – or that it’s already in the bag. But you should always use the best evidence you’ve got and take that as your starting point.”

Winning ≠ forming a government

Winning an election is entirely separate to forming a government afterwards. Communicated in emoji.

Winning and forming a government no longer go hand in hand. People are going to have to vote tactically by considering which collective of parties are most likely to be able to form a government post-election.

The chief driving factor in this election for Stephan is whether people will work out what they need to do in order to get what they want, in time for the election.

“Most Ukippers prefer a Conservative government to a Labour government. If they work out in the next four weeks that they are harming the chances of a Conservative-led government and helping the chances of a Labour-SNP government then they may change their minds and vote to get their realistic aim.”

Is immigration the driving force behind everyone’s vote?

No. Even among Ukip supporters, says Stephan, immigration isn’t the most important issue. YouGov asks the question in two forms:

What do you think are the most important issues facing the country?

And:

What are the issues that matter to you personally?

For the first question, immigration comes very high on people’s agenda, as well as public services and the economy. But for the second question, immigration drops way back and it’s the economy or public services that people care most about.

“My explanation for that is that immigration is a real annoyance to people who are unhappy about their economic situation or whatever it is, their lives, where they live, and they put it in those terms.

“But the real thing they care about is the actual nature and quality of their lives, which are being driven by these other factors much more. And they end up feeling very anti-establishment, and immigration is one of those issues the establishment, in their eyes, can’t fix.”

Featured image credit: Policy Exchange/Flickr
Emoji provided by emoji.ink

Register to vote. The deadline is soon

The deadline to register to vote is 20 April. It’s really quick and easy. So, what are you waiting for?

Interview: YouGov’s Joe Twyman on polling 2015’s unpredictable election

Picture of Joe Twyman on the One Show.

When I met YouGov’s founding director and head of political and social research for Europe, the Middle East and Africa last month, general election preparation was in full swing. Joe Twyman was a very busy man.

“I was on three broadcast pieces yesterday and five the day before,” he says. “That means telling the world about what people think of particular issues. Earlier in this week it was living standards, then UKIP’s policy on immigration, and yesterday it was the debates.”

Joe’s input is highly sought after, reflecting how the market research company has taken centre stage as a very uncertain day in British politics approaches. All eyes are on its election opinion polls as everyone tries to guess who might be running the country after May 7 – and that’s mainly because nobody really knows, even after the heated leaders’ debate last week.

“What makes this election so interesting is that for the first time in some time we don’t know who is going to be the largest party in terms of votes,” Joe says.

“Even if we did know who would be the largest party, we don’t know how that is going to translate in number of seats.”

“What makes this election so interesting is that for the first time in some time we don’t know who is going to be the largest party in terms of votes.”

The polls they produce have become a valuable source for journalists in general, and for data journalists in particular, as they try to report on a continually shifting climate in British politics. The Times and The Sunday Times have an exclusivity deal with YouGov and its polling data – its digital political platform Red Box in fact relies on them for most of its data journalism – while other publishers “piggy-back” off the results that the organisation is required to publish by the British Polling Council.

The leading party in the polls changes constantly. Only two weeks ago Labour was leading by 1 per cent, but the latest YouGov polls now put the Conservative Party ahead by 2 per cent. Does it make it harder to make predictions when voting intention is so variable?

“It’s interesting and more difficult. We’ve moved away from a two-party system, and at the same time the electoral system has not changed at all. It’s that contradiction between the political system and the electoral system that makes it so hard to predict,” Joe says. “It has really attracted noise and attention.”

“It’s that contradiction between the political system and the electoral system that makes it so hard to predict.”

Comparing to past elections

The 2015 election is very different to the first covered by YouGov in 2001, a year after the organisation’s foundation. Instead of a three-floor office near Old Street with a slide to the basement, the small team had set themselves up in a small shed in a garden near Westminster.

The winner, importantly, was also far more certain – Joe says there “wasn’t any doubt” that Tony Blair was heading for a second term.

Yet it was a period not without its difficulties for the online pollster, which was starting up when the internet was still a new and somewhat mistrusted tool. Joe describes some of the hostility – or “abuse”, to use his term – that they encountered.

“We just got attacked all the time,” he says. “People said that what we were doing was ‘voodoo polling’ and you can’t do things right on the internet. We were told that if you do things on the internet, you might as well throw out all the books written on survey research methods.”

People said that what we were doing was “voodoo polling”

Yet eventually the benefits of online polling became hard to ignore. “Once you can demonstrate that you can do things as well as the traditional methods, then the other advantages of polling online start coming out.” He now teaches survey research methods at four universities across the UK.

Joe Twyman from YouGov

 

It is also much cheaper – around one tenth of the cost of traditional surveying – and faster. While it once took 10 days to survey 1,000 people with traditional phone and pen-and-paper methods, online it now takes only 24 hours to collect 2,000 responses.

Joe claims that they were battling for “five or six years” against people who claimed they were frauds before “things got better”.

Accuracy and neutrality

YouGov’s authority on public opinion, though, has mainly grown through its accuracy: its polls have often got it right where others have not.

“In the European elections everyone overestimated Ukip except us,” Joe relates. “It’s stuff like that that gives people the certainty that what they’re seeing is a fair representation of public opinion.”

“It’s stuff like that that gives people the certainty that what they’re seeing is a fair representation of public opinion.”

It is also about fairness. The majority of YouGov’s polling work is actually in the consumer sector – 95 per cent in fact – covering everything from shampoo to dog food, and it is apparently common for companies to ring up and ask for questions to be swung in their favour or to get specific responses out of people.

Joe says that they always refuse these requests. “We are contacted by them two or three times a week, and we just turn round and say that we are not working with them. Nine times out of ten they then say they’re not working with us, and then usually around five to ten minutes later they call back and decide to go with our questions.”

Moreover, the abuse that the organisation received at the beginning has not ended with the rise of YouGov’s reputation. Joe reveals that its unwavering stance of neutrality has in fact been met with criticism from some clients, particularly during the Scottish referendum.

“As a company and myself personally, we were getting attacked on both sides. One side would be saying that they were not ahead so our questions must be biased, and the other side saying that they were not far enough ahead so we must be against them!”

“As a company and myself personally, we were getting attacked on both sides during the Scottish referendum.”

From Iraq to sex and politics

Joe’s career has taken him in multiple interesting directions, including an involvement in the first post-Saddam survey in Iraq. When the regime fell in 2003, he found himself among a handful of YouGov colleagues polling people in Iraq, after Boris Johnson, then working for The Spectator, suggested it would be a golden opportunity.

“How we seem to see it now is that Saddam fell and everything went to shit, but that’s not what happened,” he says. “There was a pause when everyone outside looking in went ‘what happens now’.”

“How we seem to see it now is that Saddam fell and everything went to shit, but that’s not what happened.”

But did he have to change YouGov’s methods for a war-torn country where online polling might not be possible?

Surprisingly not. “All the methods and techniques we have learnt from Britain about internet polling were copied into Iraq. We knew we had to be smart about things and examine the social cleavages in society – what is it that unites and divides people – and then we had to sample it. It is those techniques that we replicated in the face to face interviews that we did in Iraq.”

He returned to Iraq only last year, but says the idea of doing that kind of polling would seem “ridiculous now”.

It is not just the serious issues that interest Joe. He has recently written about the relationship between the ballot box and sexual preferences for the book Sex, Lies and the Ballot Box by Philip Cowley and Robert Ford.

Could he then answer the final question that is on everyone’s minds – does a taste for BDSM really affect how people vote?

“Sadly not,” he admits. “You can’t identify voting intention by examining sexual fantasies.

“You can’t identify voting intention by examining sexual fantasies.”

“But there are some statistically significant differences between different groups of people in their sexual fantasies once you control for other factors such as age and gender.”

There you have it.

How to use statistical functions in Excel

Lies, damned lies, and statistics. At least that’s how the saying goes, and it is how much of the wider public feels: for some reason, people tend to distrust something with numbers backing it more than something without them. That blanket distrust is misplaced – but the problem is that numbers can tell two different stories from the same data.

Statistics might drive people insane, scare them, or not seem relevant.

But in this post I’ll try and explain how to use Excel for some basic statistical analysis and what it can tell us.

Disclaimer: There will be outcomes I don’t explain as they are more advanced, but they may be covered in a later post. I will also use very simple datasets for ease of explanation.

Why might a data journalist want to use statistical tools?

Journalists have a – let’s be honest, earned – reputation for being scared of numbers and, frankly, for being awful with them. But data journalists and those interested in data are a different breed.

We should be interested in statistics because of what they tell us about our data: they are a tool to spot patterns, check reliability and ask whether all is as it seems. For a basic story this is probably going a bit far, but when handling complex datasets, especially financial ones, statistics tell us a lot.

If you perform a regression analysis, for example, and the results seem odd, there is a lead to explore that you would never have found except by chance or by really understanding the subject area. In essence, statistics are a tool that lets you tell richer stories and find exciting new leads.

So let’s get started.

Firstly you need to make sure you have the right tools. For Mac this is:

  1. Download StatPlus:mac LE for free from AnalystSoft, and then use StatPlus:mac LE with Excel 2011.
  2. You can use StatPlus:mac LE to perform many of the functions that were previously available in the Analysis ToolPak, such as regressions, histograms, analysis of variance (ANOVA), and t-tests.
  3. Visit the AnalystSoft website, and then follow the instructions on the download page.
  4. After you have downloaded and installed StatPlus:mac LE, open the workbook that contains the data that you want to analyze.
  5. Open StatPlus:mac LE. The functions are located on the StatPlus:mac LE menus.

Statistical Functions

To start with, here is a list of the majority of the statistical functions within Excel. We won’t be covering anywhere near all of these, but an explanation is provided for each.

[Screenshots: Excel’s statistical functions, with a short description of each]

Learn about your data

One nice thing about the Data Analysis tool is that it can do several things at once. If you want a quick overview of your data, it will give you a list of descriptives that explain your data. That information can be helpful for other types of analyses.

We shall use the data below. It shall also be used for other topics in the post.

[Screenshot: the example dataset used in this post – quantity sold, price and advertising]

If we want a quick overview of the variables, we can use the descriptive statistics tool. Go to the basic statistics tab in StatPlus and click on descriptive statistics, then highlight the column containing the data; if you have ticked column 1 as labels, make sure to include it. The output looks like a lot, but some of these variables can be very helpful. This is useful for journalists because it helps us test the validity of our data and means we don’t go too far in trying to find a story before realising it isn’t worth our while.

[Screenshot: descriptive statistics output for the example dataset]

Looking at Variable #1 (quantity sold): if you plan to do a regression, you want the Mean (average) and Median (middle value) to be relatively close together, and ideally the standard deviation to be smaller than the mean. In the table above, our Mean and Median are close together. The standard deviation is about 1296 – which means that, if the data is roughly normal, around 68 per cent of the quantities sold fall within 1296 either side of the mean. Not too scary so far, right?
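For anyone working outside Excel, the same overview takes a few lines of Python. The figures below are invented purely to show the calculation, not taken from the workbook above.

```python
# A minimal sketch of the same descriptive statistics in Python.
# The numbers here are made up for illustration only.
import numpy as np

quantity_sold = np.array([8500, 4700, 5800, 7400, 6200, 7900, 6000])

mean = quantity_sold.mean()
median = np.median(quantity_sold)
std = quantity_sold.std(ddof=1)  # sample standard deviation

print(f"mean: {mean:.0f}, median: {median:.0f}, standard deviation: {std:.0f}")

# For roughly normal data, about 68% of values fall within one
# standard deviation either side of the mean.
print(f"~68% of values between {mean - std:.0f} and {mean + std:.0f}")
```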

Correlation

Another good overview of your data is what is called a correlation matrix, which shows you which variables tend to go up and down together and in what direction they move. For example, say you were looking at data showing how something changed over time: you could use the matrix to see whether the relationships are what you’d expect. This might find you a great story.

It is useful as a first look at what your data is telling you before potentially delving into regression, and it also helps you work out whether the data behind a story is reliable. Correlation is measured by a statistic called Pearson’s r, which ranges from -1 (a perfect negative relationship) through 0 (no linear relationship) to 1 (a perfect positive relationship).

Go to the data table and the data analysis tool and choose correlations. Choose the range of all the columns (less headers) that you want to compare. Then you get a table that matches each variable to all other variables. Below you see that the correlation between Column 2 and Column 3 is 0.02156. It is the same between Column 3 and Column 2, since the matrix is symmetric.
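The equivalent in Python is essentially a one-liner with pandas. The CSV file and column names below are assumptions standing in for the workbook columns above.

```python
# A hedged sketch of the correlation matrix in pandas.
# "sales.csv" and the column names are assumed for illustration.
import pandas as pd

df = pd.read_csv("sales.csv")

# Pearson's r for every pair of columns, returned as a symmetric matrix.
print(df[["quantity_sold", "price", "advertising"]].corr(method="pearson"))
```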

Correlation provides a general indicator of what is called the linear relationship between two variables, but crucially you cannot use it to make predictions. To do that, you need what is called linear regression – covered later in this post.

[Screenshot: the correlation matrix output]

What we can use it for, however, is checking that the outcomes are logical and within a margin of error – and if not, asking why. If the dataset you’re working on suddenly changes, ask questions and see if there isn’t a story. It is a tool that allows you to go beyond the obvious and find interesting stories within your data.

Some characteristics help predict others. For example, people growing up in a lower-income family are more likely to score lower on standardized tests than those from higher-income families.

Regression helps us see that connection and even say roughly how much one characteristic affects another.

Trend Analysis

Trend analysis is a mathematical technique that uses historical results to predict future outcomes, by tracking how the values vary over time.

There are three ways to do trend analysis here: the equation, the forecast function or the trend function. I will go through these three methods using the simple dataset above. One important term to understand here is R-squared, as it gives an indication of how well the trend fits your data. But what is R-squared?

R-squared is a statistical measure of how close the data are to the fitted regression line.

Put as a formula:

R-squared = Explained variation / Total variation
R-squared is always between 0 and 100%:

0% indicates that the model explains none of the variability of the response data around its mean.
100% indicates that the model explains all the variability of the response data around its mean.

It is important to remember that R-squared cannot determine whether the coefficient estimates and predictions are biased, which is why it is critical to assess the residual plots as well. R-squared also does not tell you whether a regression model is adequate beyond doubt: it can be low for a good model or high for a bad one.
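As a rough illustration of how a fitted trend and its R-squared relate, here is a small Python sketch using invented monthly figures; it mirrors the idea behind the forecast and trend functions and the R-squared formula above.

```python
# A sketch of a simple linear trend fit and its R-squared.
# The monthly figures are invented purely for illustration.
import numpy as np

months = np.arange(1, 13)
values = np.array([102, 108, 111, 118, 121, 130, 128, 137, 141, 150, 149, 158])

# Fit a straight line through the data (the same idea as a trend/forecast).
slope, intercept = np.polyfit(months, values, 1)
predicted = slope * months + intercept

# R-squared = explained variation / total variation.
ss_res = ((values - predicted) ** 2).sum()
ss_tot = ((values - values.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot

print(f"trend: y = {slope:.2f} * month + {intercept:.2f}")
print(f"R-squared: {r_squared:.3f}")
```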

The equation

[Screenshot: fitting a trend using the equation method]

Forecast function

[Screenshot: fitting a trend using the forecast function]

Trend function

[Screenshot: fitting a trend using the trend function]

Why might this be useful?

Well, if you are getting large outliers, or your R-squared value is way out, that can be an indicator of an unreliable dataset. For a jobs data story, this could suggest that the government’s claims of a smooth system are not true.

Or if you were doing a story on incidents of piracy, it could lead you to explore avenues about reporting and hotspots, or to identify key periods for further investigation. Paradoxically, by going deeper into the numbers you can go further beyond them and ask the really tough questions.

Statistics keeno klaxon

Here are other types of standard trends, which may be touched on in a future article:

  • Polynomial – Approximating a Polynomial function to a power
  • Power – Approximating a power function
  • Logarithmic – Approximating a Logarithmic line
  • Exponential – Approximating an Exponential line

[Screenshot: examples of polynomial, power, logarithmic and exponential trend lines]

Regression Analysis

Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modelling and analysing variables, where the focus is on the relationship between a dependent variable and one or more independent variables. Not obviously related to journalism? Think about a story on the relationship between stress levels for a certain group and other factors such as wages, housing or frankly anything else. If you can identify that relationship, you can then track those changes over time and tell a much deeper data story; it gives weight to sometimes seemingly obvious answers.

So let’s get started:

  1. In StatPlus click on the statistics tab.
  2. Select linear regression and click OK.
  3. Select the Y Range (A1:A8). This is the response variable (also called the dependent variable).
  4. Select the X Range(B1:C8). These are the explanatory variables (also called independent variables).
  5. These columns must be adjacent to each other.
  6. Check Labels.
  7. Select an Output Range.
  8. Check Residuals.
  9. Click OK.

Excel produces the following Summary Output:

[Screenshot: the regression Summary Output]

R Square

R Square tells you how much of the change in your dependent variable can be explained by your independent variables. Here R Square equals 0.962, which is a very good fit: 96% of the variation in Quantity Sold is explained by the independent variables Price and Advertising. The closer to 1, the better the regression line (read on) fits the data.

Significance F and P-values

To check if your results are reliable (statistically significant), look at Significance F (0.001). If this value is less than 0.05, your data looks good. If Significance F is greater than 0.05, it’s probably better to stop using this set of independent variables: drop a variable and rerun the regression until Significance F falls below 0.05. Of course, this is no guarantee of success.
Most or all P-values should be below 0.05.

Coefficients

The regression line is: y = Quantity Sold = 8536.214 - 835.722 * Price + 0.592 * Advertising. In other words, for each unit increase in price, Quantity Sold decreases by 835.722 units, and for each unit increase in Advertising, Quantity Sold increases by 0.592 units. This is valuable information.
You can also use these coefficients to do a forecast. For example, if price equals £4 and Advertising equals £3000, you might be able to achieve a Quantity Sold of 8536.214 -835.722 * 4 + 0.592 * 3000 = 6970.
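If you would rather reproduce this outside StatPlus, here is a hedged sketch using Python’s statsmodels library. It assumes the workbook has been saved as a CSV with hypothetical column names; the coefficients it prints will only match the ones above if the underlying data does.

```python
# A sketch of the same multiple regression using statsmodels.
# "sales.csv" and its column names are assumptions for illustration.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("sales.csv")

X = sm.add_constant(df[["price", "advertising"]])  # adds the intercept term
model = sm.OLS(df["quantity_sold"], X).fit()

# The summary reports R-squared, the F-statistic's p-value ("Significance F")
# and a p-value for each coefficient, much like the Summary Output above.
print(model.summary())

# A forecast with the fitted coefficients: price = 4, advertising = 3000.
b0, b_price, b_adv = model.params  # intercept, price, advertising
print(b0 + b_price * 4 + b_adv * 3000)
```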

Residuals

The residuals show you how far away the actual data points are from the predicted data points (using the equation). For example, the first data point equals 8500. Using the equation, the predicted data point equals 8536.214 -835.722 * 2 + 0.592 * 2800 = 8523.009, giving a residual of 8500 – 8523.009 = -23.009.

Why might this be useful?

See my explanation of regression analysis above! This is probably the most advanced statistics covered in this post, but I would say it is potentially the most useful, as it can be applied to so many types of data and to datasets you have created yourself.

Conclusion:

I hope this has been a useful introductory overview of statistics in Excel. I’ll do another post soon, and will update this one when I get round to it.

Video: How to use regex to scrape HTML pages

Regex how to

Ever wanted to scrape something with OutWit Hub but the data you want is tied up in ugly HTML tags that change with each new piece of information?

Regular expressions – known as regex – are often an easy way of getting around them. These are sequences of symbols and characters which express a pattern to be searched for in a piece of text.

This video will show you how to use regex sequences to scrape with OutWit Hub. It covers two individual examples for you to run through.

These will give you the basic components which you can then build on to use your own regex to help you scrape.
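If you want to see the principle outside OutWit Hub, here is a tiny Python sketch. The HTML snippet and class names are invented for the example; the point is that the regex matches the pattern of the tags rather than any one exact tag.

```python
# A minimal illustration of pulling data out of shifting HTML tags with regex.
# The HTML and class names below are invented purely for this example.
import re

html = """
<td class="name-17">Jane Smith</td><td class="votes-17">12,402</td>
<td class="name-18">John Jones</td><td class="votes-18">9,871</td>
"""

# The class names change with each row (name-17, name-18, ...), so match
# the pattern rather than an exact tag.
pattern = re.compile(
    r'<td class="name-\d+">(.*?)</td><td class="votes-\d+">([\d,]+)</td>'
)

for name, votes in pattern.findall(html):
    print(name, votes)
```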

Interview: Ben Kreimer on drone journalism

DJI Phantom with camera

When most people think of drones, they’ll probably think of the flying machines that rain indiscriminate death down across the Middle East. But not Ben Kreimer. For him they are a way of seeing the world in a new and exciting light, and occasionally for doing a bit of accidental journalism along the way.

Even at a young age, Ben was doing things differently. He made his own toys from wood and metal, so when he was told he’d have to build his own brackets to hold cameras under drones, he leapt at the chance.

“When the drone thing came around I was getting a degree in journalism. Not because I wanted to become a reporter but because I’m curious, I like exploring the world and seeing things and experiencing things.”

And he has indeed explored the world with his drones, from filming urban crocodiles in India to chasing endangered species in Tanzania and mapping landfills in Kenya. But of them all, his favourite is the Drone Safari.

“I’d never been on a safari before so being able to see the animals, to be able to film them, was really exciting.

“Doing it was fun and then people’s reactions to it afterwards. Most people have seen pictures and video of these animals, but when people see it from this perspective, flying around a giraffe’s head? They get a kick out of it. It’s the same story, but from a new perspective. That’s what I like about it.”

Drone Safari with giraffes, credit: Ben Kreimer
Drone Safari with giraffes, credit: Ben Kreimer

The challenges of drone journalism

Drone journalism is not without its own unique challenges. In 2014 the FAA (Federal Aviation Administration) told the University of Nebraska it didn’t have permission to fly drones and had to apply for it – and that was just the beginning for Ben.

“In the past year I’ve spent more time in India and Kenya than the US, and now both countries have explicitly said that civilians can’t use drones without permission from the defence branch. So that makes it hard, as I don’t want to break those regulations as a foreigner. I think the issue is foreigners coming in and flying around for fun.”

The landfill

Whilst in Kenya he used a drone to map a landfill, but that could be just the beginning. The air around the landfill is full of pollutants, enough to cause respiratory problems if you’re around it long enough.

“I was thinking of building an air pollution sensor. You could fly that around the dump, and around the area around the dump and see what we’re breathing. How’s the pollution travelling out, and can we visualise that data and show a three dimensional plume of bad air that emanates from the dump? And can you do that elsewhere?”

Ben remains adamant that the laws currently causing him so many problems won’t be around for very long, and in the meantime he already has ambitious plans to work with UNESCO (the United Nations Educational, Scientific and Cultural Organisation) and make 3D models of historical sites around the world. But, as always, his reasoning is refreshingly plain.

“It’s 2015 and why can’t we look at a three dimensional model of the Angkor Wat? I think it’s time for that. I get interested in things when I realise they’re possible. I like travelling just to go to a new place and see how things are there.

“Now’s a good time to get into the journalism part too. As far as I know there are only two universities in the US that are looking into that. But you have to go do something with it.”

6 tools for measuring social media success

Lies, damned lies and statistics – everyone knows this famously pithy quote often attributed to 19th century British prime minister Benjamin Disraeli. In the social media world though, it’s statistics that are king.

There are lots of free tools you can use to measure social networking success. Here is a list of some of the best out there to help you get a handle on just how other people and organisations are doing on Facebook and Twitter.

Twitter

TWBirthday

A free online service that will tell you when a Twitter profile was started. It is useful for analysing early Twitter activity and what events corresponded with your subject’s early Twitter use.

Twitter Birthday - twbirthday.com

 

Simply Measured

A paid-for service (though a free trial demo is available) that allows users to measure the follower levels, interests and influence of a Twitter account that you do not control.

Simply Measured graph data

 

Twitter Counter

The free version of this service allows you to compare two different Twitter accounts, providing valuable insights into an account’s activity relative to other similar accounts.

Twitter Counter graphs data social media

 

Facebook

Fanpage Karma

Fanpage Karma is a powerful analytical tool that allows you to measure a number of key benchmarks for how effective a fan page is, including the number of likes on a fan page measured against growth levels and ranked profile performance.

When combined with data that is available on fan pages themselves via Facebook Insights, it and the other tools on this list become fairly powerful for analysing page activities.

 

Fanpage Karma social media analytics

 

Simply Measured

Its free service allows you to compare one fan page to another, which can be very useful when comparing competitors on Facebook.

Simply Measured free social media analytics.

LikeAlyzer

This free tool lets you input a Facebook page URL and gives it a rating out of 100 based on a comparison with other pages. It also gives a series of suggested improvements that can aid in-depth analysis of the social media strategy that the page is operating.


likealyzer social media analytics facebook

 

This is not a definitive list by any stretch, so if you have any more tools you use and would recommend, please share.