6 sites that show why data is beta

New to data journalism and keen to learn, but unsure about the kind of stories you could uncover with numbers? Well, worry not, because the Interhacktives have collected examples of experts in action so you don’t have to.

Here’s a roundup in no particular order of the best news sites that use data journalism and data visualisation in the UK.

 

Guardian Datablog Screen Shot 2014-11-17 at 13.46.07

 

Guardian Data Blog

Data journalism is by no means a new trend. The Guardian is cited as the first major publication to bring data journalism into the digital era, with Simon Rogers launching the Datablog in 2009.

The blog covers everything from topics currently on the news agenda to general interest.

This week saw a report on the record levels of opium harvested in Afghanistan and a visualisation about the lives and reigns of Game of Thrones Targaryen kings.

The Guardian’s Datablog is good for beginners as there tends to be a link to the source of their data on each article, enabling you to access the data and to use it for your own stories.

Ampp3d graph - We're eating more chocolate than there is in the world, "Predicted world chocolate deficit"

Ampp3d

This arm of The Mirror is what its creator Martin Belam calls “socially shareable data journalism”, the successor to his Buzzfeed-esque site UsVsTh3m. Launched last Christmas, after only eight weeks of building, Ampp3d is the tabloid perspective of data journalism.

Stories this week included what makes the perfect Downton Abbey episode and the British city where people are most likely to have affairs.

Most important, perhaps, is that it’s a site specifically designed for viewing and sharing on a mobile device. As Belam writes on his blog, 80+ per cent of traffic at peak commuting times comes from mobile, and the project aims to capitalise on this attention.

i100 "The list" Screen Shot 2014-11-17 at 14.30.30

i100

i100 is The Independent’s venture into shareable data journalism. It takes stories from The Independent and transforms them into visual, interactive pieces, often of data journalism. It also incorporates an upvote system to put the reader in charge of the site’s top stories.

The articles are easily shareable since social media integration is a core part of the reader’s experience.

To upvote an article, you have to log in with one of your social networks (currently Facebook, Twitter, Google Plus, LinkedIn, Instagram or Yahoo).

Bureau of Investigative Journalism homepage

Bureau of Investigative Journalism

Championing journalism of a philanthropic kind, the data journalism of the Bureau of Investigative Journalism differs from most of the other publications on this list.

Based at City University London, its focus is not on the visual presentation of data but on producing “indepth journalism” and investigations that aim to “educate the public about the abuses of power and the undermining of democratic processes as a result of failures by those in power”. As a result, there is little visualisation and mostly straight reporting.

For data journalists, though, its ‘Get the Data’ pieces are indispensable resources as they allow you to download the relevant Google spreadsheets that you could then turn into data visualisations.

FT Datawatch: the world's stateless people screenshot

The FT

The Financial Times’ Data blog is one of the leading international news sources for data journalism and one of the UK’s leading innovators in data visualisation. It creates pieces of interactive and data-driven journalism based on issues and stories around the world, which include everything from an interactive map showing Isis’ advances in Iraq to UK armed forces’ deaths since World War II.

It describes itself as a “collaborative effort” by journalists inside the FT, and occasionally accepts guest blogs.

Bloomberg screenshot of homepage

Bloomberg

Bloomberg has perhaps some of the most impressive-looking data visualisations of all the news sources mentioned. The emphasis on aesthetics is immediately apparent: on the homepage, a zoomed-in version of each visualisation draws the reader in, in place of the traditional headline/photo set-up.

Interactivity is the defining feature of Bloomberg’s data journalism. Many of its pieces rely on the reader clicking on parts of the visualisation to reveal specific data. For example, its World Cup Predictions and Results article requires the reader to select a game in order to see statistics and information about it.

What’s on Reddit’s front page?

Reddit is an online super-community with hundreds of millions of users, and has become in recent years an arbiter of what’s cool and what’s not on the web. If something makes it to the front page of reddit, where it is most visible, it will inevitably receive millions of views.

Reddit stats

The way the site works is that users post content – pictures, article links, conversation starters etc – and the success of that content is determined by whether the reddit community likes it (upvotes), talks about it (comments) or just clicks on it.

Submissions are made to the relevant subreddit – a subject specific community – and should they prove popular, can rise to the front page. This is the reddit mainstream. And I scraped it.

Digg vs Reddit via Quantcast

Three times a day, for two weeks in March and April of this year, I scraped the data from the front page of r/all to see what is popular on reddit, and what that means.

Reddit is growing. It’s the 58th most visited site on the net (up 6 places from last quarter), and the 21st most popular in the US. Since it defeated Digg at the turn of the decade, reddit has established itself as really the only aggregator site in town – and with that comes power.

If reddit helps shape the internet conversation, what does the data say about reddit?

These are the top 25 subreddits over that fortnight of scrapeage – the front page of subs, if you will.

Perhaps predictably, r/funny is at the top. It appeared the most on the front page, received the most upvotes, and the second most comments because, naturally, it has the most subscribers (over 6 million).

Other predictably popular subs include memes (#2), cute animal pics (#4) and video games (#5).

Interestingly, a few of the more stereotypically reddit subs barely made the front page, or didn’t make it at all. The site is known for its militant atheism, and yet that subreddit only made it to #25, while the sub catering to the site’s marijuana predilection could only reach #26 – no place among the best of the best.

Only two of the top-25 are substantially NSFW (Not Safe For Work). The sub r/WTF – wherein people post strange and disturbing things – is about a third NSFW whereas r/gonewild, the site’s most popular porn sub, is exclusively not for the workplace (unless you work from home).

The rankings largely stay the same when using comments instead of upvotes as the key parameter, except there is a notable rise of interaction-led subs like r/askreddit and r/IAmA. Askreddit, in particular, skyrockets to the top of the front page despite only appearing 9 times over the two weeks to r/funny’s 226.

As for the average scores and comments for front-page posts, r/pics and r/askreddit are respectively the top dogs. While r/funny rules in front page appearances and accumulated points, it doesn’t even reach the top 10 in either average. That suggests that reddit’s biggest sub is more quantity than quality.
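For anyone wanting to try a similar comparison, here is a minimal sketch of how rankings by appearances, accumulated points and averages can diverge. It is not the script used for this piece: the rows in `posts` are made-up sample values, and the function names `summarise` and `rank` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows, one per front-page post: (subreddit, score, comments).
# These numbers are invented for illustration and are not the scraped data.
posts = [
    ("funny", 3000, 400),
    ("funny", 2500, 300),
    ("pics", 4000, 500),
    ("AskReddit", 2000, 9000),
]


def summarise(posts):
    """Accumulate appearances, total score and total comments per subreddit."""
    totals = defaultdict(lambda: {"appearances": 0, "score": 0, "comments": 0})
    for sub, score, comments in posts:
        entry = totals[sub]
        entry["appearances"] += 1
        entry["score"] += score
        entry["comments"] += comments
    # Averages are per front-page appearance.
    for entry in totals.values():
        entry["avg_score"] = entry["score"] / entry["appearances"]
        entry["avg_comments"] = entry["comments"] / entry["appearances"]
    return dict(totals)


def rank(summary, key):
    """Return subreddit names sorted from highest to lowest on the given key."""
    return sorted(summary, key=lambda sub: summary[sub][key], reverse=True)


summary = summarise(posts)
by_appearances = rank(summary, "appearances")
by_total_score = rank(summary, "score")
by_avg_score = rank(summary, "avg_score")
by_avg_comments = rank(summary, "avg_comments")
```

Even with four invented rows, a sub can lead on appearances and total points while a different one leads on each average, which is the same pattern described above.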

There is an obvious outlier amongst these broad and mainstream subs and that is r/leagueoflegends.

It’s a community dedicated to an exceedingly popular 2009 PC game. With almost 500,000 subscribers, it is the 41st largest subreddit but its community activity exceeds even that.

Stats for r/leagueoflegends

One of the moderators of r/leagueoflegends, arya, said: “This subreddit is the largest unofficial community for LoL. We get between 500-1000 new subscribers per day I’d estimate. Big events do show an influx of new users and higher activities. I remember during Worlds when the stream shut down due to technical errors, the thread about it reached the top of r/all within minutes.”

KingKrapp, another mod, said: “From what we’ve experienced, a lot of our users only come here and don’t really interact with the rest of reddit. We’re a very specific community compared to other big subs.”

It’s the success of niche-y subs like r/leagueoflegends that prompted reddit to introduce trending subreddits at the top of the front page in April.

Umbrae, mod for trendingsubreddits, said: “The thinking behind trending was essentially that there’s a lot of diversity to reddit, but that many of the visitors to the homepage don’t see or understand that. This gives a good hint to the breadth of reddit, while at the same time giving deeply engaged folks a new source of interesting communities.”

The initiative has so far been a success, with Umbrae reporting: “A lot of smaller subs have definitely gotten exposure.”

 


 

Only 20% of the top subreddits are not, and have never been, defaults for new subscribers. Default subreddits have more subscribers (naturally) and more interaction, but they consequently have less community.

At the beginning of May, r/mildlyinteresting became a default sub. Its popularity, according to mod RedSquaree, is because “all the content is original, and chances are that nobody has seen anything posted here before. It also doesn’t aim to be amazing content, so expectations are low and people are happy.”

Stats from r/mildlyinteresting

Of its new status, RedSquaree said: “Our growth was very steady until the recent increase as a result of being a default. [It has led to] more removals and a deteriorating comments section.”

It seems that a sizeable sub comes at the expense of a close community. Karmanaut, mod of r/IAmA, said: “Unfortunately, there isn’t a very strong r/IAmA community. I think one of the main reasons behind this is that there is no core of submitters, because there are very few people with multiple submissions. Unlike most other subreddits, all of r/IAmA is original content and has to be done by the original person. And each person has a limited involvement. In its infancy, there was a smaller group of individuals who were very involved in the subreddit but since growing to its larger size, those individuals are no longer necessary to recruit AMA subjects.”

So those are the communities, but what do the actual posts say?

Wordcloud

These are the most frequently used words in that two-week period. You can see where the interests of the site lie – there’s an inordinate number of mentions of Oculus, the VR company Facebook bought, compared to the MH370 drama.
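A word cloud like this is built on a simple frequency count. The sketch below shows one way to produce such a count with Python's standard library. It is an illustration and not the method used for this piece: the titles are invented examples, and the stopword list is a short hypothetical one.

```python
import re
from collections import Counter

# A short, hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "for", "what", "and", "to", "in", "is"}


def word_counts(titles, stopwords=STOPWORDS):
    """Count how often each non-stopword appears across a list of post titles."""
    counts = Counter()
    for title in titles:
        # Lower-case the title and split it into runs of letters, digits and apostrophes.
        words = re.findall(r"[a-z0-9']+", title.lower())
        counts.update(word for word in words if word not in stopwords)
    return counts


# Invented example titles, not taken from the scraped data.
titles = [
    "Facebook buys Oculus",
    "What Oculus means for VR",
    "The search for MH370",
]

counts = word_counts(titles)
most_common_word = counts.most_common(1)
```

The resulting `Counter` can be passed to any word cloud tool that accepts word frequencies.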

Here’s the most popular post of that entire period. It may have only ended up at 4,003 karma but this post received more than 56,000 upvotes.


Conclusion

Perhaps it is what it always was, or what it was always going to be, but reddit is largely a chill place. People go on the front page for a joke, a pretty picture, to learn a weird fact, or take part in an amusing straw poll. It’s a nice place to hang out; it isn’t challenging. Its major contribution to the internet conversation is jokes, memes and silly things that will crop up on Buzzfeed a few hours later.

With trendingsubreddits, the site is attempting to change that in a way. Not so much the pleasant interactions, but the homogenized output. Perhaps by promoting the nichier subs, the front page will change.

Because, just as Katy Perry is not an accurate reflection of modern music, neither is r/funny representative of reddit and its many weird and wonderful subs.

Interview with Abraham Thomas, co-founder and head of data at Quandl

Abraham Thomas

What is Quandl and why is it so useful for data journalists?

Quandl, at its core, is a search engine for numerical time series data.  The data we have is heavily influenced by what our users want, and as such we tend to have datasets on important or trending topics.  For example right now we just created a number of datasets encompassing all the inequality data included in Thomas Piketty’s new book “Capital in the 21st Century”.  We also have a huge number of datasets on standard reference topics: economics, financial markets, society, demography and so on.  All these datasets are easily accessible in applications for analysis or for export to graphs.  Best of all it’s all free.

Is it easy to use?

Quandl’s mantra is to make data easy to find and easy to use. We try to do this in a number of ways.

The first step is helping users find the data they need. Having millions of datasets is no good unless you can find what you’re searching for. Most current search engines don’t do a very good job at pure numerical data searches. So we built our own custom search algorithm that is optimized for numerical data. You can filter by data source, filter by data frequency, perform advanced search using Boolean queries and so on. Of course there’s still a long way to go, and we’re constantly improving our backend algorithm to give you the data you were looking for.

Another mechanism we use to help users find the data they need is “browsing” our data collections. Collections are hand-selected, curated groups of high-quality datasets on specific topics. So instead of searching for specific datasets, users can explore in a more free-form manner via this method.

The next step is actually working with the data you’ve found. We offer various options for downloading and graphing the data through the website. Perhaps, though, our real strength is our API; lots of users have written their own apps and programs that use Quandl data delivered via this API. We’ve also written (with generous contributions from our users) a number of libraries that help you get Quandl data directly into the analysis tool of your choice (R, Python, Excel, Matlab, you name it) without visiting the website or invoking the API.

Does the site provide data in a form that is easy to manipulate?

The important thing about making data easy to manipulate is understanding that different users have different needs, and we need to be able to facilitate that.  That’s why we offer all our data in multiple formats (JSON, CSV, XML), irrespective of what format the data was originally published in.   That’s also why we’ve built our API and all the tools and libraries that interface with it.  We want to make the process of taking our data and getting it into whatever tool you choose to use as frictionless as possible. 
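To illustrate why offering the same series in several formats matters, here is a minimal sketch that reads one small time series from CSV and from JSON into the same Python structure. The payloads are invented, and their layout (a `Date` and `Value` column, a JSON object with `columns` and `data` keys) is an assumption made for the example, not Quandl's documented response format.

```python
import csv
import io
import json

# Hypothetical payloads holding the same two observations. The layout and the
# values are invented for illustration only.
CSV_TEXT = "Date,Value\n2012-12-31,1.0\n2013-12-31,2.0\n"
JSON_TEXT = (
    '{"columns": ["Date", "Value"], '
    '"data": [["2012-12-31", 1.0], ["2013-12-31", 2.0]]}'
)


def from_csv(text):
    """Parse CSV text into a list of (date, value) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Date"], float(row["Value"])) for row in reader]


def from_json(text):
    """Parse the JSON layout assumed above into a list of (date, value) tuples."""
    payload = json.loads(text)
    date_index = payload["columns"].index("Date")
    value_index = payload["columns"].index("Value")
    return [(row[date_index], float(row[value_index])) for row in payload["data"]]


series_from_csv = from_csv(CSV_TEXT)
series_from_json = from_json(JSON_TEXT)
```

Whichever format a tool prefers, the analysis that follows can work on the same list of date and value pairs.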

How did you first come up with the idea? Did you spot a gap in the industry that needed to be filled?

The idea came from our founder Tammer Kamel.  Tammer was having a difficult time finding the data he required for his personal consulting business, without paying thousands of dollars to firms like Bloomberg or Reuters.  And it turns out that there are many people in similar situations.  As it currently stands (without Quandl) if you are not working for a large company with a large data budget, it is surprisingly difficult to get even simple public statistics, like the GDP of China over time, into your workflow. 

Last year you were described by journalism.co.uk as being the “YouTube of data” – do you think this is a fitting description?

It very much describes our aspiration. We would like to get to a point where some users are contributing massive amounts of data that other users are consuming. We’re currently building the tools to enable this in a frictionless, functional manner. (See the answer on user uploads below.)

How do you source the data you host, and how do you ensure that it is always up to date?

We source data from all over the internet and sometimes physical media as well. We have multiple scheduling and freshness checks in place to make sure everything is updating properly. 

Last year you mentioned that you are hoping to allow users to upload their own data – what are the latest developments here? What is the thinking behind this? And does this not make it difficult to ensure that all data is accurate?

Right now we are still in the testing phase of this project internally. We’ve also slowly started inviting a few alpha-testers to try it. We feel we have created a fairly frictionless experience getting data from Quandl, and we want to provide that same frictionless experience putting data on Quandl as well.

There are two reasons for moving in this direction. First, as a team there is only so much data we can add ourselves. Secondly, we cannot pretend to be experts at everything. Here at Quandl we have a very talented group of people with varying skills and domains of expertise. However, the wealth of data out there, and knowledge of it, is so vast we could never dream of understanding it all. Luckily our users as a whole do have this knowledge. Right now, every dataset that is being added to Quandl has been specifically asked for by a user, and it has been this way now for months. We are very confident that with the right tools, our users will be able to create high quality, usable datasets. These datasets will be associated with their creator, and other users can choose to trust or distrust these creators just like they’ve chosen with Quandl as a whole.

Is there anything else similar in the field at the moment?

Yes and each has its strengths depending on what a journalist might need.  Zanran.com has crawled a huge number of PDF documents on the internet for tabular data; they have some really esoteric stuff.  Datamarket.com has great visualization tools.  Datahub.io also looks interesting to us as an open-data platform.  Exversion.com offers access control and version control for datasets which are both interesting features.  WolframAlpha.com doesn’t offer much raw data but their natural-language query system is very impressive.  So there’s lots of activity in this field right now.

Simon Rogers interview: ‘Who cares if I’m still a journalist?’

A veritable giant of data journalism, Simon Rogers launched the Guardian’s Datablog in 2009 before moving over to Twitter where he now manages the site’s vast quantities of data. We asked him about the perils of data journalism’s popularity and where it’s all headed.

Twitter has an unbelievable amount of data – what do you do with it all?

It’s a lot of data — around 500 m Tweets a day. What we try to do is tell stories with it, much of which entails making it smaller and more manageable, to filter out the noise that we don’t need. People Tweet how they think and how they behave — the data can show you amazing patterns in the way we respond as humans to events as they happen. When a story breaks somewhere, or a goal is scored or a song is performed, you can discern these ripples across Twitter. It’s getting those ripples out of the data that is the challenge.

What’s the day-to-day like as data editor at Twitter?

It is such a mix and each day brings its own surprises and challenges. At one end of the spectrum I use free tools such as Datawrapper or CartoDB to make maps and charts that respond to breaking news stories or events, such as this one on the spread of Beyonce’s new album or the discussion around events in the Ukraine or the conversation around #Sochi2014. At the other end of the spectrum, I get to work with the data scientists on Twitter’s visual insights team to produce things like this interactive guide to the State of the Union speech or this photogrid of the Oscars, which is essentially a treemap with pictures. Right now we’re thinking ahead to things like the World Cup and the US Midterm Elections to answer the question: how can we use Twitter data to help tell the stories that matter?

Simon rogers twitter

Are you still a journalist?

I’ve wanted to be a journalist since the age of eight and it’s completely in my DNA. Over that time the idea of what was or wasn’t a journalist has completely changed. When I started the Datablog at the Guardian, people asked if data journalism was really journalism at all to which my response was: who cares? My feeling is that you just get on with it and let someone else worry about the definitions. My job is to tell stories and make information more accessible to people. I take Adrian Holovaty’s approach to this:

1. Who cares?

2. I hope my competitors waste their time arguing about this as long as possible.

What do you think about the Guardian’s Datablog since you left?

The Datablog was my baby and always will be special to me but I have to let it go and not interfere, so that’s what I’m going to do.

 guardian datablog

What drove you to found Datablog?

We had a lot of data that we’d collected to help the graphics team and we also saw there was a growing group of open data enthusiasts out there who were hungry for the raw information. So that’s how it started: as a way to get the data out there to the world and make it accessible.

Have you found any difference in the attitudes towards or ideas about data journalism in the US and UK?

The differences in data journalism mirror the differences in reporting I would say. It’s a huge generalisation but I would say US data journalism tends to be about long investigations while a lot of the British reporting is aimed at shorter pieces answering questions. But there are exceptions on both sides. They come from different places: US data journalism is based in the investigative reporting of giants such as Philip Meyer; modern British data journalism was born out of the open data movement and owed at least as much to a desire to free up public information as to big investigations.

Is data journalism ‘having a moment’ or are we in the midst of a very real paradigm shift?

It’s becoming mainstream and, just as in other areas of reporting, it is developing different strands and approaches. Partly because there are just so many stories in data now — and to get those stories journalists need skills and approaches they didn’t use before.

Facts are Sacred

Some have said that data journalism is intellectually elitist, perhaps even already out of touch. How would you respond?

I think we are really at an interesting stage. The last few months have seen a lot of reporting resources put into data journalism, certainly in the US. I think what’s happening is that it is developing different strains — in the same way as you have features and news reporting in traditional journalism. You have the ‘curious questions’ type of data journalism which focuses on asking about oddities; then there is the open data type of data journalism which is all about freeing up information. I’m not convinced that we have as a group got the balance correct between showing off how clever we are and making the data accessible and open. That last part is what I’m interested in. I don’t need to see anyone showing off.

Journalists are no longer just writers, they are designers. How important are pictures, diagrams and infographics?

I speak as someone who has just worked on this range of infographic books for children. We have visual minds and telling a story effectively with images will always have a greater impact than words on a page. Some of the most detailed journalistic work I have ever done has resulted in images and graphics as opposed to long articles.

Have you seen any recent data journalism that has particularly caught your eye? And what is it that you look for in a good article/webpage?

I love the work of the WNYC data journalism team, and La Nacion’s commitment to spreading data journalism and openness in South America is amazing and really powerful.

I love maps but there are just so many of them these days. Is data journalism becoming over-saturated?

There are a lot of maps around but it’s just one visual tool. Maybe we don’t ask enough questions about which type of visualisation is most powerful and important to complement a story or feature and a map is often easiest. But also that reflects the lack of decent tools for us to use. If I want to visualise a Twitter conversation off the shelf, that often means a map or a line chart because that is what I can do easily and quickly on my own. Part of my job is to think about new ways for us to do this in future.

Do you think data journalism runs the risk of looking at the big picture at the expense of the small one?

Not being able to see the wood for the trees? The best data journalism complements the big data picture with the individual stories and story telling that brings those numbers to life. I’ve been fortunate enough to work with amazing reporters who tell very human tales and the numbers just gain so much power from joining those two elements together.

Do you have any favourite data tools – scraping, cleaning, visualising?

My visual tools of the moment are: CartoDB, Datawrapper, Illustrator and newly I love Raw (just discovered it).

Do you have any core principles when deciding how to express data?

I normally start off with some idea of what I’m trying to ask, otherwise the data is just too big to be manageable. Love that moment when you do the grunt work to clean up the data and it starts to tell you something meaningful.

Do you have any tips for aspiring data journalists?

The days when you could get a job in a newsroom just by knowing Excel have probably gone or are going. Increasingly the data journalists who succeed will also be able to tell a story. The other piece of advice? Find something that needs doing in the newsroom, that no-one else wants to do, and be the very best in the world at doing it.

 

Carl Bialik interview: ‘Any data set has eureka potential’

Carl Bialik is a writer for Nate Silver’s new website FiveThirtyEight, having recently moved from the Wall Street Journal where he started The Numbers Guy column. I asked him about the ups, downs and difficulties of being a data journalist, as well as what he thinks are the most important traits for being successful in the field.

You recently moved to FiveThirtyEight from the WSJ: do you think the two publications differ in their approach to data analysis?

With The Numbers Guy at the WSJ, my role was more about looking at other people’s data analyses, taking them apart and finding the weaknesses in them. I’m going to be doing some of that at FiveThirtyEight but will be more focussed on doing original data analysis.

When you first started at WSJ, were you a data journalist? Or was this more of an organic development?

When I started at the WSJ I don’t think I had even heard the term “data journalism”, and I wasn’t a data journalist for most of my first years there. The more specialised role came later when I started writing The Numbers Guy column. Then, when the WSJ expanded its sport coverage, I started to write much more about sports from a data point of view.

Which is your favourite sport to write about?

My favourite sport to follow is tennis, which is in some ways both my favourite and least favourite sport to write about. It’s my favourite because it’s largely untapped territory in terms of data analysis, but it’s also one of my least favourites because of the way that the data has been archived, making it one of the most difficult to get accurate data for. It’s a pretty fertile area, though, and although it’s not big in the USA, there’s always going to be a focus around major events.

What steps do you take to make sure that the data you are analysing is accurate?

There are some built-in error checks with analysis, which can help determine the reliability of the data. These include checking whether the data you are running the analysis on makes sense, and looking whether different analyses produce similar results. Another important question to ask yourself is whether there is some important factor that you are not controlling for.

At FiveThirtyEight we also have a quantitative editor who reviews your work and points things out for you, such as confounding variables and sources of error. Readers are really vital for this, too: the feedback we have already received from readers who tell us when they think we have made mistakes has been extremely useful.
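The two checks described here, whether the data makes sense and whether different analyses agree, can be sketched in a few lines. This is only an illustration under stated assumptions: the serve percentages are invented, the function names are hypothetical, and comparing the mean with the median is just one simple stand-in for "different analyses".

```python
from statistics import mean, median


def out_of_range(values, low, high):
    """Return the values that fall outside the plausible range [low, high]."""
    return [v for v in values if not (low <= v <= high)]


def estimates_agree(values, tolerance=0.1):
    """Check whether mean and median agree to within a fraction of the median."""
    m, md = mean(values), median(values)
    return abs(m - md) <= tolerance * abs(md)


# Invented first-serve percentages; a percentage must lie between 0 and 100,
# so the last value stands in for a data-entry error.
serve_pct = [61.0, 58.5, 64.2, 59.9, 620.0]

suspect = out_of_range(serve_pct, 0, 100)
agree_before = estimates_agree(serve_pct)

# Re-run the comparison with the suspect values set aside.
cleaned = [v for v in serve_pct if v not in suspect]
agree_after = estimates_agree(cleaned)
```

When two summaries of the same data disagree sharply, as the mean and median do before the suspect value is set aside, that is a prompt to look at the underlying rows before drawing any conclusion.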

What do you think are the most important traits for being a good data journalist?

The first is having a good statistical foundation, which includes being comfortable with coding and using various types of software. The others are the same as for all types of journalist: being a collaborator, fair, open-minded, ethical, and responsive to both readers and sources.

Which data journalists do you particularly admire?

I’ve admired the work of many data journalists, including my current colleagues, and my former colleagues at the Wall Street Journal. Certainly Nate Silver at FiveThirtyEight: he is a large part of the reason that I wanted to work with FiveThirtyEight in the first place. Also my colleague Mona Chalabi because she has a great eye for finding stories with interesting data behind them.

What’s the best part of being a data journalist?

Compared to most journalism, I think there is more potential to have an “aha” [eureka] moment for any given story, since it can sometimes be a slog if you’re trying to get that just from interviews or other sources. Any data set has the potential to give you a couple of these moments if you’re spending just a few hours looking at it.

And the most difficult part?

I think number one is when you can’t get hold of the data for something: occasionally a topic can be very hard to measure, and you would love to write about it but just don’t have a way in. This is often the case with sport in particular, where there can be measurement problems, issues with the quality of the data, or even a complete scarcity of it. So issues with data quality and access are the most difficult parts.

 

The 2014 budget highlighted data journalism’s mobile device woes

Data Coverage of the Budget 2014 - Telegraph Chart
Data Coverage of the Budget 2014 - George Osborne
Image: 38 Degrees

Data journalism is in vogue these days so what better time to draw up a graph than at budget time, when communicating lots of numbers efficiently is the top priority? The 2014 Budget saw some great data coverage across the board, but it also showed that one of data journalism’s biggest challenges was finding a format that works well on mobile devices. In this post I’ll take you through some of the stuff that worked really well on mobile and other stuff that didn’t translate from desktop.

Why is it important that data journalism works on mobile?

At the Digital Media Strategies 2014 conference earlier this month, Douglas McCabe of research firm Enders Analysis said that the time people spend on the internet on mobile devices will overtake the time they spend online on a desktop by next year.

If you have a blog, you only need to take a look at your analytics to see how much of your traffic comes from mobile devices. If you haven’t checked yet, expect it to be a lot. It is, therefore, pretty important that your content works well on mobile and that carefully crafted visualisations, designed to make visitors invest some time on your site, don’t leave your readers putting down their phones in frustration.

Try viewing this on mobile if you want to experience what I mean.

2014 budget coverage – the Telegraph

I’m kicking off with the Telegraph’s coverage because it was probably one of the best for working on mobile devices. (All the screenshots in this article were taken from my iPhone 5, so you would expect that it would be able to handle most things.)

The Telegraph’s data coverage of the 2014 budget with their chart-builder

 

Rather than attempt to embed their charts in the body of their article, the Telegraph programmed this chart viewer using their in-house chart building system and then linked to it from the body of their article. As you can see, it works really well. You can easily have the chart and the accompanying text side by side whilst being able to comfortably read both. It is also interactive and gives you the option of clicking onto the next chart.

This is all very well, but what if you don’t have the time, resources or inclination to build your own in-house chart system? 

The Guardian used Datawrapper to mixed effect on mobile

The Guardian’s data blog is a hotbed of interesting visualisations, but for budget day they decided to keep it simple. They used what look like customised Datawrapper charts to display Osborne’s budget. Datawrapper is really responsive and should, in theory, work really well on mobile. So on a day when a lot more people than normal are likely to be reading the data blog, it makes sense to keep things simple rather than going for a more detailed graphic.

Data Coverage of the Budget 2014 - Guardian Unclear Line Chart
Budget coverage on the Guardian’s data blog

In reality, though, there was a slight problem. This is what one of the line charts looked like:

The line of the graph itself showed up fine, but the axes didn’t show up when I held my phone in portrait because they were too wide to fit on the screen. Viewed this way, the chart isn’t very informative.

The problem was solved by turning the phone to landscape, so this may seem like a pedantic point to highlight. However, the Guardian were relying on people realising that they needed to tilt their phones when reading the article, and could well have confused those who didn’t. Why alienate a part of your audience, however small, when the chart could be accessible to them all?

When the Guardian’s charts worked well, however, they were probably the most interesting in terms of the story that they were telling. This bar chart showing that since 2010, Osborne’s budgets haven’t been particularly harsh or eventful was something that hadn’t been visualised anywhere else.

Data Coverage of the Budget 2014 - Guardian Good Bar
Bar chart from the Guardian’s data blog

The Daily Mail tried hard with a 3D pie chart

The Daily Mail obviously tried to take all this into account by playing it pretty safe with their data coverage. Although not extensive, it did extend to this non-interactive gem of a pie chart:

Data Coverage of the Budget 2014 - Mail 3D Pie chart
The Mail commit a cardinal sin with a 3D pie chart

For the purposes of this article, the Mail’s chart succeeded because it could be read well on mobile. However, in terms of being an effective visualisation it fails miserably, committing a cardinal sin of data journalism. 3D pie charts may look flashy, but the very nature of that third dimension skews how big the segments look to the naked eye. In this case the national insurance segment is actually smaller than the ‘other’ segment, but it would be difficult to tell this by looking at the graph.
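To put a rough number on that skew, here is a minimal Python sketch. It is not based on the Mail’s actual chart: it simply assumes a pie that has been tilted so its height is squashed to half its width (the `tilt` value and the `apparent_angle` function are my own illustrative choices) and works out how wide a slice looks at the centre of the pie once it has been squashed.

```python
import math

def apparent_angle(start_deg, end_deg, tilt=0.5):
    """Angle a slice appears to cover at the pie's centre once the
    circle has been squashed vertically by `tilt` (1.0 = flat 2D pie).
    The tilt value is an assumption for illustration only."""
    def squash(deg):
        rad = math.radians(deg)
        # A tilted pie keeps its width but loses height on screen.
        return math.cos(rad), tilt * math.sin(rad)

    x1, y1 = squash(start_deg)
    x2, y2 = squash(end_deg)
    on_screen = math.atan2(y2, x2) - math.atan2(y1, x1)
    return math.degrees(on_screen % (2 * math.pi))

# The same 25% slice (90 degrees of the real pie) in two positions:
front = apparent_angle(225, 315)  # slice facing the reader
side = apparent_angle(-45, 45)    # slice at the right-hand edge

print(f"Front slice looks like {front:.1f} degrees")
print(f"Side slice looks like {side:.1f} degrees")
```

Under these assumptions, identical quarter slices appear as wedges of roughly 127 and 53 degrees depending on where they sit, and that is before the raised edge of a 3D pie adds even more visual weight to the slices at the front.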

Ampp3d’s 2014 budget coverage was designed for mobile

Data Coverage of the Budget 2014 - Ampp3d Bar 2
Ampp3d’s data coverage works really well on mobile

Ampp3d is a relatively new website set up by Trinity Mirror with the remit to create socially shareable data journalism. They run their site on Tumblr and as such it is really responsive to different screen sizes. Ampp3d was basically set up to compare favourably in a piece such as the one I am writing. And it does.

They, like the Guardian, used Datawrapper to communicate different aspects of the budget. However, because Tumblr is more responsive than the Guardian’s site, the charts’ axes were still visible when the phone was held in portrait mode. This meant that whichever way you looked at it, it was easy for a reader to read the bar chart and subsequently understand the story.

Visualisations will adapt to mobile but we have to adapt as well

None of the visualisations discussed in this post were terrible. There were no attempts at the type of elaborate map that is impossible to read on mobile. Some were really good and most had only minor flaws. But when trying to persuade somebody to spend time on your site, those minor flaws can be the difference between them staying or bouncing.

Visualisation software will no doubt improve in the future and render many of these problems irrelevant. Until that happens, however, data journalists have to take the limitations of mobile into account, even if it means sacrificing an impressive Tableau for a simple table.

The News Hub: a bridge between social media and journalism?

The News Hub logo

With the grand ambition of “making news better” in a digital age, The News Hub aims to right some of the wrongs of the online publishing world. Delivering a powerful, democratic and free market news platform, as its founder William Stolerman hopes to do, sounds easy enough. However, in an industry wrestling with oversaturation, paywalls and revenue models, can this new project make it? We talk to William Stolerman about his plans for The News Hub, expected to launch in the next couple of weeks.

Like many of his professional counterparts, William Stolerman has looked on as his industry has struggled to come to terms with change.

Continue reading “The News Hub: a bridge between social media and journalism?”

Interview with David Dubas-Fisher, Sports Data Journalist at Trinity Mirror

sports data journalism

David Dubas-Fisher is a Sports Data Journalist at Trinity Mirror. He talks to Interhacktives about why, in sports journalism, it’s particularly important to develop data stories that are not overflowing with statistical analysis, as too much of it distracts the reader from the bigger story.

Continue reading “Interview with David Dubas-Fisher, Sports Data Journalist at Trinity Mirror”

Two hours of Interhacktivity: #hackshangout

Interhacktivity tutorial #hackshangout

Our tutorial on data journalism will start at 6pm today (Monday 24th March) – click here for the link.

To see last week’s tutorial on social media verification, click here.

The time for our first hour of Interhacktivity is almost upon us.

The tutorials will be held via a Google Hangout on Air. The exact link to the Hangouts will be posted at the top of this article when they go live, on Monday and Thursday (at 6pm).

Throughout the tutorials, non-presenting Interhacktives will be monitoring Twitter. We’re hoping to keep matters as informal as possible, so if you have any questions during the event, please tweet using the hashtag #hackshangout. And, of course, if you have any questions, suggestions or comments before or after the event, please do the same, or tweet us directly.

Two hours of Interhacktivity #hackshangout

After taking into account the results of our poll, the topics were decided as follows:

Data tutorial (Monday 24th March, 6pm)

– Data cleaning and mapping with Daniele Palumbo

– Data visualisations (Datawrapper and Raw) with Laura Cantadori

Social media tutorial (Thursday 20th March, 6pm)

– Social media verification with Rachel Banning-Lover and Chris Sutcliffe

If you are curious and feel in need of some guidance on how to fit into a modern newsroom, join us on Monday and Thursday.

For more details of the thinking behind the event, click here.

How to extract data from a PDF

We live in a world where PDF is king. Perhaps we could even go as far as to call it the tyranny of the PDF.

Developed in the early 90s as a way to share documents among computers running incompatible software, the Portable Document Format (PDF) offers a consistent appearance on all devices, ensuring content control and making it difficult for others to copy the information contained within.

Continue reading “How to extract data from a PDF”