Pimp your Google Map

Today my first visualization is being published in Helsingin Sanomat: a map of all the discharges in Helsinki during one night (21-22.12), showing reported crimes and accidents. Check it out:

Go to HS.fi.

It’s a pretty basic visualization. I got a bunch of addresses that I geocoded using Yahoo’s PlaceFinder API and projected on Google Maps. I’m a huge fan of a lot of the Google tools out there, most notably the Docs platform, which step by step is phasing out my dependency on MS Office (at least MS Word).

However, I have been quite sceptical towards Google Maps, mostly because of its, for lack of a better word, ugliness. We’ve seen enough of that pale blue-green-yellow layout.

The good news is that Google has made it possible to easily style your maps using a simple online interface. In this example I’ve just inverted the lightness, added a pink hue and reduced the saturation of the water.
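Under the hood, a styled map is described by a list of style rules, each pairing an optional feature selector with a list of "stylers". As a rough sketch, the three tweaks mentioned above could be generated like this. The hue and saturation values are assumptions picked for illustration, not the values used on the published map.

```ruby
require 'json'

# Style rules in the format used by styled Google Maps: each rule has an
# optional "featureType" selector and a list of "stylers".
# The default hue and saturation are placeholders (assumptions).
def map_styles(hue: '#ff0099', water_saturation: -60)
  [
    # No selector: applies to every feature. Flip light and dark, then tint.
    { 'stylers' => [{ 'invert_lightness' => true }, { 'hue' => hue }] },
    # Water only: tone the colour down.
    { 'featureType' => 'water', 'stylers' => [{ 'saturation' => water_saturation }] }
  ]
end

STYLES_JSON = JSON.pretty_generate(map_styles)
puts STYLES_JSON
```

The printed JSON array is what you would hand to the map as its list of styles.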

The options are endless. You can easily spend hours playing around with the different settings.

Not too many developers and designers seem to have found this tool yet, but my prediction for 2012 is that there will be a lot more styled Google Maps. And why not a portal with open map skins for anyone to use? I can’t find anything like that at the moment.

Interactive: The 100 richest people in Finland

November is the big gossip fest in Finland. Every year at the beginning of the month the tax records from the previous year are published. In other words: you get to know who made the most money.

Every year the Finnish media outlets do a very conventional presentation of this material. Page after page of lists of top-earners. Rarely does anyone do anything more creative with the data.

I gave it a shot. This is what came out:

Open the interactive visualization in new window.


This is my first visualization in Raphael.js. Previously I have been working with D3 and Protovis, but the weak browser support of these two libraries is becoming a growing concern, especially when one tries to sell the work. However, I have found Raphael to be very useful and somehow more intuitive than D3.

The idea for this presentation came from the super-visualization The Sexperience by British Channel 4, a survey about the sex life of ordinary Brits (don’t worry, you can open it at work as well). I think the genius behind this setup is that you can follow the respondents in the quiz from question to question, which gives the user the possibility to explore the relation between different questions instead of just looking at one question at a time. What, for example, are the sexual preferences of the people who lost their virginity late?

To some extent my presentation of the 100 top earners lets you do the same thing. You can select the persons you are interested in and follow them through the presentation. This is a potential of the modern web that I think we will see much more of in the future.

Tutorial: How to extract street coordinates from Open Street Map geodata

I’ve spent almost a year learning about data-driven journalism and tools for analysis and visualization of data. I have now become confident enough to think that I might even be able to teach someone else something. So here it goes: my first tutorial.

The task

Earlier this fall Helsingin Sanomat published a huge dump of price data from Oikotie, a Finnish market place for apartments. I had an idea to build a kind of heat map where every street would be colored based on the average price of the apartments.
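The number behind each colored street is just a per-street average. A minimal sketch of that step, assuming each listing has been reduced to a street name and a price per square metre (the field names and sample rows below are placeholders, not the Oikotie data):

```ruby
# Hypothetical listing rows: street name and price per square metre.
LISTINGS = [
  { street: 'Mannerheimintie', price_per_m2: 5200.0 },
  { street: 'Mannerheimintie', price_per_m2: 4800.0 },
  { street: 'Bulevardi',       price_per_m2: 6100.0 }
].freeze

# Group the listings by street and average the price within each group.
def average_price_by_street(listings)
  listings.group_by { |l| l[:street] }.map do |street, rows|
    [street, rows.sum { |r| r[:price_per_m2] } / rows.size]
  end.to_h
end

AVERAGES = average_price_by_street(LISTINGS)
AVERAGES.each { |street, avg| puts "#{street}: #{avg.round}" }
```

Keying the result by street name makes it easy to look up a color for each street once the geodata is in place.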

With the JavaScript library Polymaps you can easily make stylish web maps. The problem is that you need an overlay GeoJSON layer with the colored streets. Finnish authorities do not – yet! – provide open street-level geodata. Fortunately Open Street Map does.

From .shp to .geojson

The raw data from Open Street Map is downloadable in shapefile format. So in my case I downloaded the shapefile package of Finland and opened it in Quantum GIS (Layer > Add vector layer). This is what the finland_highway.shp file looks like.

This is A LOT of geodata, but in this case I’m only interested in the Helsinki region. So I zoom in on Helsinki and select, roughly, the streets that I’m interested in using the lasso tool (select object tool).

To export the selected part of the map to the GeoJSON format that Polymaps can read, choose Layer > Save Selection as vector file and GeoJSON as your format. Save! Done!

Filtering the GeoJSON file

We’ve got our GeoJSON file. Now there is just one problem: it is huge, 18 MB! But there are a lot of streets here that we don’t need, so we want to filter them out. This will require some programming skills. I turn to Ruby.

This is the structure of an object in the GeoJSON file:

{ "type": "Feature", "properties": { "TYPE": "cycleway", "NAME": "", "ONEWAY": "", "LANES": 0.000000 }, "geometry": { "type": "LineString", "coordinates": [ [ 24.773350, 60.203288 ], [ 24.774540, 60.203008 ], [ 24.777840, 60.202300 ], [ 24.781013, 60.201565 ], [ 24.781098, 60.201546 ], [ 24.782735, 60.201199 ], [ 24.784300, 60.201045 ], [ 24.785846, 60.201085 ], [ 24.787381, 60.201133 ], [ 24.787812, 60.201169 ], [ 24.788101, 60.201207 ], [ 24.797454, 60.201623 ], [ 24.797636, 60.201620 ], [ 24.799625, 60.201405 ], [ 24.801848, 60.201089 ] ] } }

This street apparently does not have a name, but the others do, which means I can extract the streets that I’m interested in based on their name.

In another array I list the streets that I want to be included in the visualization. Like this:

$streets = [
# and so on...
]

I now want to tell the computer to iterate through the GeoJSON file and extract the streets that are included in the streets array. Or in practice I approach it the other way around: I check which streets in the GeoJSON file are not included in the array and remove them.

This is the code:

require 'json'

def process(data)
  json = JSON.parse(data)

  #-- STEP 1: Go through the geojson file and add the index numbers ("i") of the street names that are not found in the array "$streets" to a new array ("del")
  i = 0
  del = []

  json["features"].each do |a|
    unless $streets.include? a["properties"]["NAME"]
      del << i
    end
    i += 1
  end

  #-- STEP 2: Iterate through the del array from the back and remove the streets with the corresponding index numbers in the geojson data ---
  del.reverse.each do |d|
    json["features"].delete_at(d)
  end

  #-- Open a new json file and save the filtered geojson ('w' overwrites, so a rerun does not append a second copy) ---
  File.open("hki.json", 'w') { |f| f.write(JSON.generate(json)) }
end

In this case data is the GeoJSON file and $streets the array of the selected streets. And voilà: you got yourself a new GeoJSON file. In my case I managed to shrink it down to 1.6 MB.
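For comparison, the same filtering can be written more compactly by keeping the wanted features instead of deleting the unwanted ones. This is only a sketch of an alternative, not the script I actually ran. The function name keep_streets and the two-feature sample are made up for illustration.

```ruby
require 'json'
require 'set'

# Return a copy of the GeoJSON document containing only the features whose
# NAME property is in `wanted`. A Set keeps each lookup cheap.
def keep_streets(geojson, wanted)
  wanted = wanted.to_set
  kept = geojson['features'].select { |f| wanted.include?(f['properties']['NAME']) }
  geojson.merge('features' => kept)
end

# Tiny hand-made sample (placeholder names and coordinates).
SAMPLE = JSON.parse(<<~GEOJSON)
  { "type": "FeatureCollection", "features": [
    { "type": "Feature", "properties": { "NAME": "Bulevardi" },
      "geometry": { "type": "LineString", "coordinates": [[24.93, 60.16], [24.94, 60.17]] } },
    { "type": "Feature", "properties": { "NAME": "" },
      "geometry": { "type": "LineString", "coordinates": [[24.77, 60.20], [24.78, 60.20]] } }
  ] }
GEOJSON

FILTERED = keep_streets(SAMPLE, ['Bulevardi'])
puts JSON.generate(FILTERED)
```

Because select builds a new array, there is no need to track index numbers or delete from the back, and the original data stays untouched.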

The visualization

I have now got what I wanted in the beginning: the geographical coordinates for the streets that I want to plot, which means I’m halfway to making my visualization.

I won’t go into detail on how the actual visualization was put together. The short version is that I used this pavement quality example as a base script and made some small modifications. The price data is then picked from a separate file. This is the result, the housing prices in Helsinki, street by street:

Open the full map in new window.

Not too shabby, right? I managed to sell this visualization to Hufvudstadsbladet which now runs it on their website.



One month Wall Street occupation mapped

For a month now we have been getting news about the Occupy movement that started on Wall Street in the beginning of October. There has been some arguing about the size of this movement. The Guardian has made an interesting attempt to answer the question using crowdsourcing. I took a different approach.

The protests are coordinated at the site meetup.com. Here you find a complete list of the 2 506 occupy communities. I wrote a Ruby scraper that goes through this list and gathers information about all the meetups that have been arranged so far (more than 4 000 in a month).

I used the D3.js library to visualize the list of meetups. This is the result (opens in new window):

The movement clearly peaked on October 15th with meetups in around 600 different locations around the world. Protestors have continued to rally on Saturdays, but not with the same intensity.
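Finding the peak boils down to counting distinct locations per day. A minimal sketch, assuming the scraper output has been reduced to one row per meetup with a date and a city (the rows below are invented placeholders, not the scraped data):

```ruby
require 'date'

# Hypothetical scraped rows: one per meetup.
MEETUPS = [
  { date: Date.new(2011, 10, 8),  city: 'New York' },
  { date: Date.new(2011, 10, 15), city: 'New York' },
  { date: Date.new(2011, 10, 15), city: 'London' },
  { date: Date.new(2011, 10, 15), city: 'London' }, # second meetup, same city
  { date: Date.new(2011, 10, 22), city: 'Helsinki' }
].freeze

# Count distinct locations per day.
def locations_per_day(meetups)
  meetups.group_by { |m| m[:date] }
         .map { |date, rows| [date, rows.map { |r| r[:city] }.uniq.size] }
         .to_h
end

PER_DAY = locations_per_day(MEETUPS)
# The [date, count] pair with the highest number of locations.
PEAK = PER_DAY.max_by { |_date, count| count }
puts "Peak: #{PEAK[0]} with #{PEAK[1]} locations"
```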

Note that a number of protests are missing here. I had some technical difficulties geocoding special characters (using the Yahoo Place Finder API), but that should not distort the picture of how the movement has developed. I didn’t have time to resolve the problem at the moment, but if someone knows how to get the API to understand odd characters such as ä, é and ü I’d appreciate the assistance.
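One thing worth trying, assuming the failures come from place names being put into the request URL as raw characters, is to percent-encode the query as UTF-8 before sending it. The sketch below only builds the URL. The endpoint and the parameter name "q" are hypothetical stand-ins, not the real Place Finder ones.

```ruby
require 'uri'

# Hypothetical endpoint and parameter name, used only for illustration.
# Check the geocoder's documentation for the real ones.
GEOCODER_URL = 'https://geocoder.example.com/geocode'

# Build a request URL with the place name percent-encoded as UTF-8.
def geocode_url(place)
  query = URI.encode_www_form('q' => place.encode('UTF-8'))
  "#{GEOCODER_URL}?#{query}"
end

puts geocode_url('Jyväskylä')
```

URI.encode_www_form turns every non-ASCII character into its UTF-8 bytes written as %XX escapes, so the request line itself stays plain ASCII.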

SVT launches Guardian-inspired data blog

On Thursday the Swedish public broadcaster SVT launched a new exciting platform called SVT Pejl. It describes itself as a news blog producing journalism based on stats, facts and numbers. “Our ambition is to explain current events and make numbers and facts available in an accessible way”, writes Kristofer Sjöholm who is the leader of the project.

The presentation of the blog features an interview with Simon Rogers of the Guardian’s Data blog. And this is clearly where the inspiration comes from. This is the Data blog of Sweden.

If you know some Swedish it is well worth taking a look at this introductory video explaining what data-driven journalism and SVT Pejl are.

For a person like me with one foot in Sweden and one in Finland it is interesting to follow (and be part of) the development in this field right now. Helsingin Sanomat has taken a lead role in Finland, publishing big open data sets and arranging several hacks and hackers style workshops. It feels like Sweden has been losing some ground to its little brother here, but maybe this new site will narrow the gap.

Animation: World terrorism 2004-2011

After the terror attacks of nine-eleven the USA set out to fight terrorism. It has been a successful quest in the sense that the Americans themselves have not been hit by terrorists since, but others have. According to statistics from the American Worldwide Incident Tracking System 37,798 lethal attacks have been carried out since 2004 killing 174,547. That’s a lot of nine-elevens.

Since the WITS provides such easily accessible data it would be a shame not to do something with it. So I did and this is what I ended up with (click to open in new window):

A few words about how I did this visualization.

The data

The basic data was really easy to gather here. I just filtered the attacks with ten or more casualties and downloaded the spreadsheet from WITS. The challenge was to geocode the places. I hadn’t done this before.

I wrote a Ruby script that called the Yahoo Place Finder API to transform the place names to longitudes and latitudes. For some reason a few locations got completely wrong coordinates (I started to wonder when the USA was suddenly hit by major attacks that I had never heard of). These were filtered away.
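One way to catch such misplaced locations automatically, rather than by eye, is to compare the country given in the incident data with the country the geocoder reports back. This is a sketch of that idea, not the filter I actually used. It assumes both values are available for each row, and the field names and sample rows are invented.

```ruby
# Hypothetical rows: the country listed in the source data next to the
# country returned by the geocoder for the same place name.
GEOCODED = [
  { place: 'Baghdad', source_country: 'Iraq',   geocoded_country: 'Iraq',          lat: 33.3, lon: 44.4 },
  { place: 'Paris',   source_country: 'France', geocoded_country: 'United States', lat: 33.7, lon: -95.6 }
].freeze

# Keep only rows where both sources agree on the country (ignoring case).
def plausible(rows)
  rows.select { |r| r[:source_country].casecmp?(r[:geocoded_country]) }
end

CLEAN = plausible(GEOCODED)
puts "Kept #{CLEAN.size} of #{GEOCODED.size} rows"
```

A check like this will not catch a wrong city within the right country, but it does flag the kind of continent-sized errors described above.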

The visualization

This job provided two new challenges. One, working with dates. Two, working with maps. Just as the last time I used the JavaScript library d3.js to put the visualization together.

For the map I used the provided Albers example as a base script. With some assistance from this thread on Google groups I managed to figure out how to make a map in d3 (my eureka moment was when I realized that you can modify d3.geo.js to center the world map wherever you want).

Getting a hold of the dates in JavaScript became much easier with the date.js library. Highly recommended.

Final thoughts

A lot could have been done to polish the animation. One could have added some sort of timeline with key events, graphs and so on. But I think this is a pretty neat base for visualizing, let’s say, earthquakes or other catastrophes. And you gotta like a viz on black.

Interactive: Athletics world record progression

The IAAF athletics world championships just came to an end with the one and only world record set by Jamaica in the short relay. This (the lack of world records) comes as no surprise. It is getting harder and harder to beat the old records, as the graph below shows.

Number of new world records per year.

More than 2 000 official IAAF world records have been set since the beginning of the 20th century. In other words: a very interesting set of data. Inspired by this visualization by The New York Times from 2008 I decided to do my own mashup with this data. This is the result (click to open in new window):

Interactive visualization: click to open in new window.

The data

There were two challenges with this visualization: getting the data and visualizing it. It was surprisingly difficult to find world record data in an accessible format. Wikipedia provides some help, but the data contains plenty of holes. Instead I had to turn to the only thing the IAAF has to offer: a 700 page pdf with all the athletics statistics you can think of. The open data gospel has apparently not reached IAAF quite yet.

On the other hand this was an opportunity to practice some Excel formatting skills. Copy-pasting the data into Excel was easy; transforming it into readable columns and rows took some time. But I did it and you’ll find the result in Google Docs. I didn’t figure out how to make Google Docs format seconds, tenths and hundredths correctly, but if you open the spreadsheet in Excel you should be able to get the correct times.

With the data in a pretty spreadsheet I indexed all the results with 1951 as a base year (or the first recorded record for new events) and manually added the newest records, such as the one set by the Jamaican relay team.
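The indexing itself is simple arithmetic: every record is divided by the base value and multiplied by 100. A minimal sketch of how I read that rule, where the base is the record standing in 1951, or the first recorded one for events that started later. The sample values are illustrative only and not copied from the spreadsheet.

```ruby
# Index a series of records against a base year. Each record is [year, value].
# Base value: the record standing in the base year or, for events with no
# record that early, the first recorded one.
def index_records(records, base_year: 1951)
  sorted = records.sort_by { |year, _value| year }
  standing = sorted.select { |year, _value| year <= base_year }.last
  base = (standing || sorted.first)[1]
  sorted.map { |year, value| [year, (value.to_f / base * 100).round(2)] }
end

# Illustrative values only (seconds).
SPRINT = [[1936, 10.2], [1968, 9.95], [2009, 9.58]].freeze
INDEXED = index_records(SPRINT)
INDEXED.each { |year, index| puts "#{year}: #{index}" }
```

For timed events a falling index means faster records. For jumps and throws the index rises instead, so the two groups have to be read in opposite directions.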

The visualization

For the first time I used the JavaScript library d3.js for a visualization. With my short Protovis background d3.js was a charm to work with. The main advantages with d3.js compared to Protovis are that d3.js provides much greater animation support and makes it easier to interact with other elements on the page (such as div-tags).

As a d3-n00b I used Jan Willem Tulp’s tutorial as a base script and built around that. The d3.js documentation is still not comprehensive, so for a beginner it takes some trial and error to progress, but undoubtedly this is a very powerful library for making handmade interactive visualizations.

All in all a very educative process and a result that I’m quite content with.

Post scriptum

Do you, by the way, know which is the sixth greatest athletics nation of all time (measured in number of world records)? FINLAND! A bit hard to believe in a year like this when none of our athletes made the top eight.

Country         Number of records
USA             367
Soviet Union    199
East Germany    109
Great Britain   55
Germany         51
Finland         49
Poland          47
Australia       41
West Germany    39
Russia          36