How to make a choropleth map with Google Fusion Tables

Choropleth map of cycling in Hackney wards

Data mapping is becoming an increasingly popular way of visualising information. It’s quite straightforward to make a choropleth map and in this post I’m going to show you how to make one using Google Fusion Tables.

A fortnight ago I made a choropleth map to go alongside this story for the Hackney Post.

Choropleth map of cycling in Hackney wards
Choropleth map of cycling in Hackney wards

The map was designed to show the increase in the number of Hackney residents commuting to work by cycling over the last decade. I thought I’d use a map as it is more visual and quickly tells the story. Stating that “Dalson had a rise of 10.9%, whereas Hoxton’s increase was only 5.3%, unlike Clissold…” would undeniably bore the reader.

I’m briefly going to show you how to create your own choropleth map.

The data that I used is from the Greater London Authority’s website and was published in October.

First of all I created a Google Docs spreadsheet showing the percentage increase of every Hackney ward:

Percentage point increase for cycling in Hackney by ward

Once I had manually entered these data, I put them on to a map using Google Fusion Tables. Once you load up Fusion Tables, all you need to do is import your Google Docs spreadsheet:

Importing a spreadsheet to Google Fusion Tables
Importing a spreadsheet to Google Fusion Tables

Now you need to find the map data to visualise the cycling data on a map. What you’re looking for here is a “KMZ Shapefile”. Thankfully, Ændrew Rininsland (former City Uni hack, now News Developer at The Times) uploaded a shapefile of all Hackney ward map data. If you download any shapefile of the area that you’re trying to map, that should work fine. Next, upload the .kmz file to your Google Drive.

Now, go back to the Google Fusion Table and click:

Merging in Google Fusion Tables

Select the correct .kmz file and merge it based on the field name “ward”, as this will be the same in each table.

This will show you something like this:

merged map in Google Fusion Tables

Now, to shade it in different colours dependent on intensity, go to feature map > change feature styles > buckets:

Colour shading on Google Fusion Tables

This will end up with something like this:

Unmarked choropleth map of Hackney Wards

Then all I did was add the ward names using Adobe Photoshop.

Once you get used to using Google Fusion Tables, they are a fast and easy way to visualise your data.

Top Ten Tools for Data Journalism

Data journalism is ever-evolving and so are the tools of the trade. Aside from Microsoft Excel (which I’ve blogged about here and here), these are the tools which are, in my opinion, the most useful for data journalism at the moment.

Google Fusion Tables
The gateway drug for most data journalists, Google Fusion Tables is a user-friendly mapping tool which allows users to upload their data to the application, select the columns of data they would like to map and simply create the map. It also allows users to pinpoint various areas on the map which can be interacted with to show, for example, the name of and data for that particular street when exploring the map.

Treemap
TreeMap provides an easy, yet extremely powerful means of creating beautiful treemaps for analytical and presentation purpose. Importing data from a wide variety of file formats (including Excel), as well as connecting to databases (such as MySQL and SQL Server) it’s user friendly and scales to big data.

BatchGeo
BacthGeo is free and user-friendly data-mapping software, which allows users to copy and paste data from their spreadsheets in order to map it. You can also visualise patterns in the data by location and store your data within the site with BatchGeo pro.

Tiki-Toki
Slightly more obscure than some other data journalism tools, Tiki-Toki is a good example of the timeline visualisation software out there. It comes with loads of features including date formatting options, built-in search and filtering, spacing modes and four different view types – and the finished products are visually stunning.

Datawrapper
Datawrapper is a free software tool which enables users to create charts and graphs and embed them into blog posts or elsewhere online. It can be used on their free hosted service or self-hosted and the finished charts and graphs are fully customisable in terms of fonts, colours and design. Datawrapper is easy to use, simple and the finished charts have a slick look and feel.

PopIt
PopIt is an opensource website created by MySociety, which aims to become a website of ‘components’ for open data. Taking inspiration from their previous websites such as TheyWorkForYou, PopIt’s creators state that contributors to the site will not need to know code. While they provide data from various organisations, they aim for data contributions to the site including bills, legislation and attendance records to create a fully functioning transparent-data website.

Tableau
Tableau is another, slightly more advanced data mapping tool using geocoding technology, so there is no need for latitude or longitude data. As well as built-in geocoding, it also has custom geocoding for advanced ‘geospatial analysis’, and can add added layers of data, such as demographic information, and integration with other specialised maps.

Colour Scheme Designer
When creating your data visualisations, it’s important to remember that some people are colour blind so the entire effect will be lost on them if you select the wrong colours. (The most common types of colour blindness are green/red and blue/yellow). Tools such as Colour Scheme Designer allow you to tailor your visualisations by exporting colour pallets via html, URL, CSV, text or XML and you can select specific colour pallets even according to the varying types of colour blindness.

Many Eyes
Many Eyes is a data visualisation tool which allows the advanced creation of graphs, pie charts, word clouds, treemaps and so on, but it is an opensource platform so once you’ve created something, it is archived within the site. This does mean, however, that you can explore data sets and visualisations, create a topic centre and comment on other users’ work.

Google refine
For the slightly more advanced data journalist, Google Refine is a power tool for cleaning up messy data before you can go on to visualise it. It works by being able to transform between formats, extending to other web services and can link to other databases.

 

Mapped: England’s cancer prevalence by PCT (2011)

I decided to experiment with Google Fusion Tables after my past Google Heatmap attempts, and create a map which was much more interactive.

I submitted Freedom of Information requests to the Department of Health and other governmental organisations for information on cancer rates by primary care trust (PCT) or other suitable boundary, either as a whole or per 100,000 people.

The Health and Social Care Information Centre (HSCIC) directed me to their cancer data within their data portal, where it was possible to find a breakdown of all cancers by PCT (the latest data being 2011). I uploaded the data onto a Google spreadsheet and created a Fusion Table by selecting the option from Google’s applications menu. I then selected the columns of data I wanted to map and created the map. Under Map of Col 2 > Change map styles, I was able to fiddle with the ‘buckets’ or decimal boundaries which decide the colour of the icons according to their value. I custom-selected the bucket values as the lowest number is 0.6 and the highest 2.8, so they weren’t ascending in whole numbers.

Within the map it is possible to click on the map icons and see a summary of which PCT is located, and what the percentage of cancer prevalence per PCT is. The largest icons represent the highest percentage of cancer prevalence per PCT, and then the coloured icons denote a decreased percentage of cancer prevalence per PCT by blue, red, pink and green respectively.

England cancer prevalence rates by Primary Care Trust, 2011
England cancer prevalence rates by Primary Care Trust, 2011

Link to interactive

From the map we can see that the highest cancer prevalence PCTs are East Riding of Yorkshire PCT (2.23%), East Sussex Downs and Weald PCT (2.08%), Torbay care trust (2.07%), Herefordshire (2.07%), Great Yarmouth and Waveney PCT (2.03%) and Northumberland care trust (2.01%).

The lowest cancer prevalence PCTs are eight London PCTs including Southwark (0.89%) and  Lambeth PCT (0.98%), as well as Luton PCT (1.07%), Manchester PCT (1.08%)  and Milton Keynes (1.17%).

There could be many reasons why cancer seems so prevalent in some areas and not so in others. The highest prevalence PCTs could be situated on or near industrial or waste plants, although this does not account for the higher pollution-level PCTs of big-city Manchester and London having a lower prevalence. It could be the case that those in lower-prevalence PCTs are encouraged to come forward and are diagnosed sooner, while those in higher-prevalence PCTs are not, and overall the cancer care could simply be better in London, Manchester and Luton (although reports have stated that London’s cancer care is the worst in the country). The London boroughs could also have been gearing up for the 2012 Olympics and increasing sport and health awareness.

Further research into these issues could progress this story further, but even on its own the interactive serves as an insight into cancer prevalence in England.

Who did it best? Data coverage of the 2013 local elections

Yesterday the final results were in for Thursday’s 2013 council elections, and true to form some of the biggest media publications were not content with simply stating by numbers which councils had won or lost seats (à la the less than inspiring Independent and Spectator’s election results run-downs).

Ignoring paywalled websites and those who simply don’t “do” data journalism, the three biggest sites with noteworthy election data coverage were, unsurprisingly, the Guardian, the Telegraph and the BBC. But who came out on top this time around? Here’s my breakdown…

The Guardian
Ever keen, on the day of voting the Guardian posted a graphic highlighting which councils were holding elections, which parties had control in which counties and what the estimated voting intentions were. Good to get the ball rolling, Guardian…

As soon as the election results were in the Guardian published them in table form, including which councils had which seats and which had overall control. In accordance with their open data ethos, all the data was available to download.

ss22

After the results were in, the data team used Google Fusion Tables to create an interactive map in which users can click on different county councils and see which party has overall control of which, and which councils have gained and lost seats. Users can also compare with an interactive map of the last council elections in 2009.

ss33

They also produced their own analysis of how many seats UKIP indirectly cost the Conservatives, on top of those they’d outright won.

If I’m honest, with the Guardian being the data stalwarts that they are, I was slightly disappointed that this is all they had come up with visualisation-wise . During the 2012 Presidential elections the Guardian had a field day and published some excellent data stories including a hugely detailed interactive map of presidential votes which was clickable by state, and an interactive donut graph with breakdowns of individual state seat votes for the House of Representatives.

ss44

Knowing the Guardian, however, and knowing that some data visualisations take longer to produce than others, they may well be cooking something up to release in the next couple of days…

The Telegraph
The Telegraph is rapidly carving a reputation as a serious player in the data journalism field, and I’d go as far as to say it trumps the Guardian on this occasion in its 2013 local election representation. In many respects its final results map is identical to the Guardian’s, and it looks like they’ve used the same Google Fusion tools to create it.

ss55

What I like about the Telegraph’s visualisation, however, is that I can see straight away who won what, and I like being able to see the overall country-wide winners and losers, as shown in the right hand panel.

While the scroll bar is slightly aesthetically crude, it’s useful to see in one glance which council won what, and whether its majority had changed, and individual readers may want to scroll quickly to their own council. The ‘time’ column which shows when the results came in is a nice touch and adds to the real-time reporting and live-blogging the Telegraph has become so fond of.

ss66

As with the Guardian, I’d like to see slightly more innovative data coverage of the local elections; an interesting one would be an interactive map of where the UKIP voters have come from and in which councils they have taken seats from the Conservatives.

The BBC

As I anticipated, the BBC’s election data coverage is nonpareil. I was very excited about their 2010 election coverage and knew there would be no half-hearted attempts for the 2013 local elections.

In the run-up to the elections they’d produced an exhaustive table of councils, types, boundary changes and seats, with full details below as to the differences between unitary and county councils, and guides to abbreviations.

ss77

Within the Vote 2013 subsection of the site there is a simple but effective overall results table with added detail of all individual parties.

ss88

There is an accompanying map, again with suspected use of Google Fusion Tables, but incorporating the best of the Guardian and of the Telegraph’s versions, the BBC has built the interactive so users can decide if they would like a map view or a scrollable table view of the full council listings.

ss99

Within the ‘Find a Council’ drop down menu users can select the council of their choice to examine in more detail, with graphics, tables and bar charts visualising that council’s seats and history. The map segment on the right hand side is clickable so users can ‘move’ around the country by clicking on the county next door.

ss1010

The BBC also situated their results data within their overall Vote 2013 coverage, so the data and interactive directly linked with the top stories and live blog, creating a very cohesive overall package.

ss11111

I’m admittedly biased as the Telegraph and Guardian aren’t broadcasters, but the BBC’s broadcast election coverage, especially their use of televised 3D graphics, is particularly innovative and successful in conveying the results effectively. I would suggest the BBC’s televised election results are perhaps the first uses of ‘broadcast data journalism’.

ss13

As Jeremy Vine explains in the amazing behind the scenes video How do the BBC’s on-screen graphics work?, the team uses green screen technology and special cameras to create a graphic which moves with the camera and presenter, and which the presenter can move both in front of and behind.

“Crucially”, Vine says, “it hopefully allows us to explain the politics of the night. We’re just trying to work out: ‘What are the numbers saying to us? How can we analyse them to make it clear for you?’”

Result

While its rivals usually pale in comparison, the Guardian’s data coverage of the 2013 local elections hasn’t been enormously inspiring, but, like the Telegraph, it’s done a good enough job of using maps and an interactive table to tell the story of the election results. The BBC’s data coverage, however, was slick, comprehensive and accessible.

The ballots are in and the votes have been counted: the BBC still retain their ‘overall control’ of UK politics data coverage.

[EDIT: Par about the Guardian’s UKIP analysis added at 22.04 05/05/13]

Google Fusion Tables for Dummies

When I say “dummies”, I really mean myself. Just so we’re all aware.

Anyway, for my final project I plan to make a fusion table of NIMBY requests in the UK, but before I do that, I figured that I had to practice. I’ve tried making them before with rather disastrous results, so this is my last-ditch effort to make something worthwhile before I call one of the Henrys sobbing at 4 am trying to make a GFT.

For the purposes of this post, I’m going to try mapping the places where the EDL demonstrated in 2012. This list isn’t exhaustive (let’s be real, I’m using Wikipedia for this practice run) but it gives me enough geographic diversity and a common theme to let me experiment.

I first took a table of data from Wikipedia and, after fact-checking it, put it into Google Spreadsheets. I then created a Fusion Table using said spreadsheet, and because the data was clean enough, it mapped itself fairly easily. I lacked specific postcodes for the locations of the demonstrations, but Google Fusion Tables recognised the locations well enough, and it geocoded accurately and quickly.

Screenshot from Google Fusion Tables
Screenshot from Google Fusion Tables

I initially had four bits of information for every location, including, well, location, number of demonstrators, date, and results. Because the results (arrests, etc) were not available for every demonstration, I decided to take them off the map. That was done by clicking on “map of location”

Screen Shot 2013-04-24 at 3.51.16 PM

 

and then “change info window layout”. I then simply deselected “results” and the data was a lot cleaner!

I know for a fact that professional GFTs are a lot more complex, but I’m quite happy with this one. Here’s hoping I’ll learn how to use different colours on GFTs before the final project is due!

Dog sh*t and data – playing with Google Fusion tables

I’ve heard a lot about Google Fusion Tables over the last few weeks. Simon Rogers, editor of the Guardian Datablog, must have mentioned them 10 times when he came to talk at City last month. Some of our other lecturers, including Paul Bradshaw and Gary Moskowitz have also mentioned them in class or in passing.

As a fledgling interactive journalist, they sound pretty important. Especially when I don’t really know what they are.

Being the industrious student that I am, I thought I would ignore all of the immediate work I should have been doing over the weekend, and instead have a play around.

I thought I would keep things simple and create an interactive map – the type of thing you see on the Guardian all the time.

I knew I would need some data so I decided to head over to whatdotheyknow.com to see if anyone had kindly submitted a single freedom of information request to all of London’s borough councils.

Luckily for me, a man called Hugh Roberts had recently used whatdotheyknow to FOI 165 councils across the country about dog faeces.

Success!

Hugh had asked each of the 165 councils (including the London borough ones):

    • how many complaints have you received about dog fouling from 2005 to 2012?
    • how many fixed penalty notices for dog fouling did you issue during the same period?
    • what were the dates of any dog fouling campaigns the council participated in?

I decided to discard the third question for the moment and focus on the first two.

I created a Delicious bundle consisting of every whatdotheyknow response page for each of London’s 33 boroughs.

Unfortunately, Hugh probably didn’t attend Heather Brooke’s excellent talk on the FOI act at City University last month, so he hadn’t known to ask the councils to provide the data in Excel format. This meant laboriously trawling through every response letter, deciphering the myriad of different ways that each council had chosen to display its information.

The irony that I was doing the digital equivalent of sifting through dog shit wasn’t lost on me.

I then totted up the figures and put them into a spreadsheet:

Not every council had successfully responded to Hugh, so I only had data for 26 of the 33 London boroughs. Once I had finished with the spreadsheet, I exported it as a CSV (comma separated values) file.

Next, I went to my Google Drive and clicked on Create –> More –> Fusion Table (experimental). I uploaded my CSV file, gave the table a name and a description, and clicked Finish.

The data was then imported into a Google table:

Then I went to research.google.com/tables and searched for ‘London boroughs’. I clicked on a result that looked like it would contain the borough outlines and copied the URL.

Back on my original table, I clicked on File –> Merge, and pasted the URL into the box that appeared. After clicking Next, I made sure that the ‘Borough’ categories matched up from both tables, clicked Next again, made sure every tick box was ticked, and finally clicked Merge.

Google then fuses (see where the fusion aspect comes into it? Clever stuff!) the two tables together. In the resulting window, there’s a tab called ‘Map of Outline’. Click it, and hey presto, your dog shit data should now be lovingly housed by beautiful, red borough boundaries.

Go to Tools –> Map styles to have a play with the colour and border settings.

If you can’t be bothered to make one yourself, have a look at my end result by clicking on the screenshot below (unfortunately there’s no way to embed the final product on WordPress blogs). I set up a gradient colour system so you can quickly glean a bit more information from it before interrogating the figures further.

As with all data journalism, the data is useless unless you use it to tell a story. My main aim was to learn a bit about the Google Fusion software and how to transform data from a spreadsheet into something more interesting, but I’ll follow this post up with some ideas on where I might head next if I was writing a story.