So, you want to be a data journalist? All well and good, but it’s important to remember to learn to crawl before you can walk, and luckily we had the Guardian’s James Ball to guide us every baby-step of the way in the form of the essential pre-guide to data journalism: Microsoft Excel.
Excel is crucial to data journalism because, guess what, you’ll be obtaining a lot of data and need spreadsheets to process it, clean it, make sense of it and then tell the story with it. Other similar tools do the same job, but because Excel is so ubiquitous I’m going to be using that over anything else.
I myself am a complete novice to Excel so this post will be of help to anyone like me who wants to start from the very beginning. Some may scoff at how basic these tips are, but I found them incredibly useful when they were explained to me, so hopefully they will be of use to somebody else.
Very, very basic Excel tips
– With your cursor, the big white ‘plus’ is select; you can hold and drag for multiple cells
– Holding down ‘ctrl’ and tapping ‘.’ [full stop] will take you to the four corners of your data
– On the edge of the cell with the cross-type cursor symbol you’ll be able to drag cells and their contents around, but be careful you don’t delete/undo what’s the in old one
– ‘ctrl’ and ‘z’ is undo; ‘ctrl’ and ‘y’ is redo; ‘ctrl’ and ‘a’ will highlight all data
– If you hit enter after inputting a cell the cell box will move down to the next one
– Double clicking between the two cells (e.g. the line between C and D column headers) will automatically fit the column to the biggest word
– If you see ‘#####’ appear in place of a figure in your cell, it means the cell isn’t wide enough to view it
– A red flag on cell can highlight a caveat or provide an ‘annotation’. To red flag, right click and then put insert comment
– You can ‘freeze’ the header columns or first row so it’s easy to scroll down and view the data with its headers. To do this go to ‘View’ and then ‘Freeze Panes’ and select ‘Freeze Top Row’ or select the column/row you’d like to freeze
On to the numbers…
– Hover over the bottom right of a cell. The tiny black cursor icon will appear, and using this you can drag it down several columns and it will copy the amount of the first cell into the others
– It will copy words if the cell contains words, but if the cell contains dates and you drag the icon down, it will input the correct dates in order
– If you were to put the 1st of the month in the first cell and 8th of the month in the second, it will drag down and automatically input weekly dates in order
– It can also do this for numbers in order, prime numbers etc. (Very cool!)
– Format the cells properly by highlighting the numbers, right clicking, clicking ‘format cells’ and then format as currency, decimal etc.
– If you click any column in Excel, at the bottom of the screen it tells you the average, count and sum of that row. But watch out for things like it adding up the total i.e. doubling your sum
Filtering and Formulae
– You can ‘filter’ your data by highlighting it all, going to ‘Data’ at the top then ‘filter’
– This puts little arrows in all columns and allows you to sort data e.g. highest to lowest, or view one country only
– This is an invaluable tool because already you will be able to start seeing ‘stories’ from the data e.g. by filtering the ‘total compensation’ column from largest to smallest, I can see that the Northern Ireland police service had by far the most amount (nearly £2m more than the Met), even though the Met had nearly four times as many claims as them
-Pressing the ‘clear’ will remove the filters
– You can also make the data work for you by asking questions of it through formulae
– So to work out how many claims per officer, you put in the G column =c2/e2 (= always means ‘sum’ i.e. an equation)
– Another invaluable formula is the ‘if’ formula. If I wanted to ask the data to tell me ‘yes’ or ‘no’ for whether each police force received more than £400,000 compensation, I’d go to a new column and type in =if(D2>400000,’yes’,’no’) and press enter
‘=if’ is the equation I’m asking the data, ‘D2’ is the column where the compensation is, ‘>’ means greater than and obviously the ‘yes’/’no’ is whether the police force did or not
– The word in the formula doesn’t have to be ‘yes or no’, it could be anything within the quotation marks
– Curled quote marks won’t work- make sure you write the formula in Excel