Combining D3 and Raphael to make a network graph

During the past week I have been working on a visualization for Sveriges Radio about Melodifestivalen, the Swedish qualification for the Eurovision Song Contest.

Every year there is a HUGE fuss about this show over here in Sweden. I wanted to explore the songwriters in the competition from a dataist perspective. Who are the guys behind the scenes?

If you follow Melodifestivalen for a few years you will notice how many names recur year after year. By linking every songwriter to the years in which they contributed I came up with this network graph.

In making this graph I managed to draw several quite interesting conclusions, for example that there are far more men than women among the songwriters, and that there is a small elite of songwriters that does particularly well in the competition almost every year.

But that is not what I wanted to blog about today. I wanted to write about the making of this visualization.

D3+Raphael=true

I have really come to like the Raphael.js library, but unfortunately it does not provide the same robust support for advanced data visualizations (for example network graphs) as its big brother D3.js. D3 on the other hand lacks Raphael’s broad browser compatibility, which is important when you are working with a public broadcaster like Sveriges Radio. So what if you could combine the two?

D3 has a really powerful library for making network graphs, or force-directed layouts. I used this library to make the foundation for the graph (take a look at the draft here). I won’t go into details about the code. The bulk is borrowed from this Stack Overflow thread.

The problem with force-directed layouts in D3 is that they quickly become very burdensome for the browser. The user has to wait for the graph to reach equilibrium, and that can take some time if you have 100+ nodes. But since I only needed a static layout in this case, I might as well have the computer do all those calculations in advance.
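To make the "calculate in advance" idea concrete, here is a minimal sketch of a force-directed layout that runs offline in Ruby until it settles. This is not what D3 does internally and not what I used for the graph: it is a plain spring-and-repulsion model in the spirit of Fruchterman and Reingold, and the method name, the canvas size and the node names are all made up for the illustration.

```ruby
# Minimal offline force-directed layout (assumption: a simple
# Fruchterman-Reingold-style model, NOT D3's own implementation).
def force_layout(node_ids, links, width: 600.0, height: 400.0, iterations: 300, seed: 42)
  rng = Random.new(seed) # fixed seed so the layout is reproducible
  pos = node_ids.to_h { |id| [id, [rng.rand * width, rng.rand * height]] }
  k = Math.sqrt(width * height / node_ids.size) # ideal distance between nodes
  temperature = width / 10.0                    # maximum step per iteration

  iterations.times do
    disp = node_ids.to_h { |id| [id, [0.0, 0.0]] }

    # Every pair of nodes pushes each other apart.
    node_ids.combination(2) do |a, b|
      dx = pos[a][0] - pos[b][0]
      dy = pos[a][1] - pos[b][1]
      dist = [Math.hypot(dx, dy), 0.01].max
      force = k * k / dist
      disp[a][0] += dx / dist * force
      disp[a][1] += dy / dist * force
      disp[b][0] -= dx / dist * force
      disp[b][1] -= dy / dist * force
    end

    # Linked nodes pull each other together.
    links.each do |a, b|
      dx = pos[a][0] - pos[b][0]
      dy = pos[a][1] - pos[b][1]
      dist = [Math.hypot(dx, dy), 0.01].max
      force = dist * dist / k
      disp[a][0] -= dx / dist * force
      disp[a][1] -= dy / dist * force
      disp[b][0] += dx / dist * force
      disp[b][1] += dy / dist * force
    end

    # Move each node, capped by the temperature, and keep it on the canvas.
    node_ids.each do |id|
      dx, dy = disp[id]
      len = [Math.hypot(dx, dy), 0.01].max
      step = [len, temperature].min
      pos[id][0] = (pos[id][0] + dx / len * step).clamp(0.0, width)
      pos[id][1] = (pos[id][1] + dy / len * step).clamp(0.0, height)
    end
    temperature *= 0.97 # cool down so the layout comes to rest
  end

  pos.transform_values { |(x, y)| { x: x.round(2), y: y.round(2) } }
end

# Hypothetical mini graph: songwriters linked to the years they took part.
nodes = %w[songwriter_a songwriter_b 2010 2011]
links = [%w[songwriter_a 2010], %w[songwriter_a 2011], %w[songwriter_b 2011]]
force_layout(nodes, links).each { |id, p| puts "#{id}: {x: #{p[:x]}, y: #{p[:y]}}," }
```

The point is only that the expensive loop runs once, on my machine, and the browser gets the finished coordinates.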

This is the idea: Raphael doesn’t have a built-in way to draw force-directed layouts, so instead I take the SVG output from D3 and continue building my visualization (interactivity etc.) on top of that in Raphael. In brief, this is how I went about it:

  • I started by copying the SVG code from Firebug (inspect the element and click Copy SVG), pasted it into an empty document and saved it as an XML file.
  • Iterated over the nodes (circles) in the file and extracted the coordinates (cx, cy). I did this in Ruby using the Hpricot gem.
  • Saved the coordinates and the radius as JavaScript objects: id: {x: 12.34, y: 43.21, r: 5}
  • Here is the simple piece of code:
    require "hpricot"
    # Parse the SVG that was copied from the D3 draft.
    doc = Hpricot(open("mf-graph.svg"))
    # Print one JavaScript object entry per node (circle).
    doc.search("//circle").each do |node|
      # Round the coordinates to two decimals to reduce the size of the file.
      x = node.attributes["cx"].to_f.round(2)
      y = node.attributes["cy"].to_f.round(2)
      r = node.attributes["r"].to_f.round(2)
      id = node.attributes["id"]
      puts "#{id}: {x: #{x}, y: #{y}, r: #{r} },"
    end
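Hpricot is no longer maintained, so if you want to repeat the steps above today, here is a rough alternative that only uses libraries shipped with Ruby (REXML and JSON). It is a sketch, not the script I ran: the method name and the sample SVG are made up, and it writes JSON instead of hand-built object lines so that ids with odd characters and the trailing comma stop being a problem.

```ruby
require "json"
require "rexml/document"

# Returns { "id" => { x:, y:, r: } } for every <circle> in the SVG source.
def circle_coordinates(svg_source)
  doc = REXML::Document.new(svg_source)
  coords = {}
  # Walk all descendant elements and pick out the circles.
  doc.root.each_recursive do |el|
    next unless el.name == "circle"
    coords[el.attributes["id"]] = {
      x: el.attributes["cx"].to_f.round(2), # two decimals keep the file small
      y: el.attributes["cy"].to_f.round(2),
      r: el.attributes["r"].to_f.round(2)
    }
  end
  coords
end

# Made-up sample; in practice this would be File.read("mf-graph.svg").
sample_svg = <<~SVG
  <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
    <g>
      <circle id="n1" cx="12.3449" cy="43.2101" r="5"/>
      <circle id="n2" cx="80.5" cy="20.126" r="3.5"/>
    </g>
  </svg>
SVG

puts JSON.pretty_generate(circle_coordinates(sample_svg))
```

Since JSON is valid JavaScript, the output can be pasted into the page as it is, or loaded with a request.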

With the coordinates of the nodes in hand it was easy to rebuild the graph in Raphael. This way I managed to vastly reduce the loading time and make it more cross-browser friendly. Here is the result once again:


One month Wall Street occupation mapped

For a month now we have been getting news about the Occupy movement that started on Wall Street in the beginning of October. There has been some arguing about the size of this movement. The Guardian has made an interesting attempt to answer the question using crowdsourcing. I took a different approach.

The protests are coordinated on the site meetup.com. There you find a complete list of the 2 506 Occupy communities. I wrote a Ruby scraper that goes through this list and gathers information about all the meetups that have been arranged so far (more than 4 000 in a month).

I used the D3.js library to visualize the list of meetups. This is the result (opens in new window):

The movement clearly peaked on October 15th with meetups in around 600 different locations around the world. Protestors have continued to rally on Saturdays, but not with the same intensity.
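For the curious, the count behind a statement like that is just the number of distinct locations per day. A minimal sketch, assuming each scraped meetup ends up as a record with a date and a location (the Meetup struct and the sample rows are made up, they are not my scraped data):

```ruby
require "date"
require "set"

# Hypothetical record shape: one entry per scraped meetup.
Meetup = Struct.new(:date, :location)

# Number of distinct locations with a meetup on each day.
def locations_per_day(meetups)
  per_day = Hash.new { |hash, day| hash[day] = Set.new }
  meetups.each { |m| per_day[m.date] << m.location }
  per_day.transform_values(&:size)
end

# The [day, count] pair with the most locations.
def peak_day(meetups)
  locations_per_day(meetups).max_by { |_day, count| count }
end

# Made-up rows, only to show the shape of the calculation.
sample = [
  Meetup.new(Date.new(2011, 10, 8), "Stockholm"),
  Meetup.new(Date.new(2011, 10, 15), "Stockholm"),
  Meetup.new(Date.new(2011, 10, 15), "Stockholm"), # same place twice, counted once
  Meetup.new(Date.new(2011, 10, 15), "Helsinki"),
  Meetup.new(Date.new(2011, 10, 22), "Oslo")
]

day, count = peak_day(sample)
puts "#{day}: #{count} locations"
```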

Note that a number of protests are missing here. I had some technical difficulties geocoding place names with special characters (using the Yahoo Place Finder API), but that should not distort the picture of how the movement has developed. I didn’t have time to resolve the problem, but if someone knows how to get the API to understand characters such as ä, é and ü I’d appreciate the assistance.
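One common cause of this kind of trouble is that the place name is put into the request URL without being percent-encoded as UTF-8. I can't say for sure that this was the problem here, but it is the first thing to check. A minimal sketch with Ruby's standard library, where the endpoint and the q parameter are placeholders and not the real Place Finder interface:

```ruby
require "uri"

# Hypothetical endpoint: stands in for whatever geocoding service is used.
GEOCODER_URL = "https://geocoder.example/geocode"

# Builds a request URL with the place name percent-encoded as UTF-8.
def geocode_url(place_name)
  query = URI.encode_www_form(q: place_name.encode("UTF-8"))
  "#{GEOCODER_URL}?#{query}"
end

["Malmö", "Orléans", "Zürich"].each { |place| puts geocode_url(place) }
```

With this, "Malmö" is sent as Malm%C3%B6, which is the form most web APIs expect.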


Animation: World terrorism 2004-2011

After the terror attacks of nine-eleven the USA set out to fight terrorism. It has been a successful quest in the sense that the Americans themselves have not been hit by terrorists since – but others have. According to statistics from the American Worldwide Incident Tracking System, 37,798 lethal attacks have been carried out since 2004, killing 174,547 people. That’s a lot of nine-elevens.

Since the WITS provides such easily accessible data it would be a shame not to do something with it. So I did and this is what I ended up with (click to open in new window):

A few words about how I did this visualization.

The data

The basic data was really easy to gather here. I just filtered the attacks with ten or more casualties and downloaded the spreadsheet from WITS. The challenge was to geocode the places. I hadn’t done this before.

I wrote a Ruby script that called the Yahoo Place Finder API to transform the place names to longitudes and latitudes. For some reason a few locations got completely wrong coordinates (I started to wonder when the USA was suddenly hit by major attacks that I had never heard of). These were filtered away.
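I filtered the bad coordinates by hand, but the same check can be automated. Here is a sketch of one way to do it, assuming every record carries the country name next to the geocoded latitude and longitude: compare the point against a rough bounding box for that country. The boxes below are approximations typed in for the example, not authoritative borders, and the method name is made up.

```ruby
# Rough, illustrative bounding boxes in degrees (approximations only).
BOUNDING_BOXES = {
  "Iraq"        => { south: 29.0, north: 37.5, west: 38.5, east: 49.0 },
  "Afghanistan" => { south: 29.0, north: 38.5, west: 60.0, east: 75.0 }
}.freeze

# True if the geocoded point lies inside the country's rough box.
def plausible?(country, lat, lon)
  box = BOUNDING_BOXES[country]
  return true if box.nil? # no box known: keep the record rather than guess
  lat.between?(box[:south], box[:north]) && lon.between?(box[:west], box[:east])
end

# Made-up records: the second one was "geocoded" to the middle of the USA.
attacks = [
  { place: "Baghdad", country: "Iraq", lat: 33.3, lon: 44.4 },
  { place: "Baghdad", country: "Iraq", lat: 39.0, lon: -95.0 }
]

kept = attacks.select { |a| plausible?(a[:country], a[:lat], a[:lon]) }
puts "kept #{kept.size} of #{attacks.size} records"
```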

The visualization

This job provided two new challenges. One, working with dates. Two, working with maps. Just as the last time I used the JavaScript library d3.js to put the visualization together.

For the map I used the provided Albers example as a base script. With some assistance from this thread on Google Groups I managed to figure out how to make a map in d3 (my eureka moment was when I realized that you can modify d3.geo.js to center the world map wherever you want).

Getting a hold of the dates in JavaScript became much easier with the date.js library. Highly recommended.

Final thoughts

A lot could have been done to polish the animation. One could have added some sort of timeline with key events, graphs and so on. But I think this is a pretty neat base for visualizing, let’s say, earthquakes or other catastrophes. And you gotta like a viz on black.


Interactive: Athletics world record progression

The IAAF athletics world championships just came to an end with the one and only world record set by Jamaica in the short relay. This (the lack of world records) comes as no surprise. It is getting harder and harder to beat the old records, as the graph below shows.

Number of new world records per year.

More than 2 000 official IAAF world records have been set since the beginning of the 20th century. In other words: a very interesting set of data. Inspired by this visualization by The New York Times from 2008 I decided to do my own mashup with this data. This is the result (click to open in new window):

Interactive visualization: click to open in new window.

The data

There were two challenges with this visualization: getting the data and visualizing it. It was surprisingly difficult to find world record data in an accessible format. Wikipedia provides some help, but the data contains plenty of holes. Instead I had to turn to the only thing the IAAF has to offer: a 700 page pdf with all the athletics statistics you can think of. The open data gospel has apparently not reached IAAF quite yet.

On the other hand this was an opportunity to practice some Excel formatting skills. Copy-pasting the data into Excel was easy, but transforming it into readable columns and rows took some time. But I did it and you’ll find the result in Google Docs. I didn’t figure out how to make Google Docs format seconds, tenths and hundredths correctly, but if you open the spreadsheet in Excel you should be able to get the correct times.
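If you would rather sidestep spreadsheet time formats altogether, one option is to store every mark as plain seconds and only format it for display. A small sketch of that idea in Ruby; the method names are made up, and I assume marks are written like 9.58 or 1:41.11:

```ruby
# Converts a mark such as "9.58", "1:41.11" or "2:03:59" into seconds.
def mark_to_seconds(mark)
  mark.split(":").reduce(0.0) { |total, part| total * 60 + part.to_f }
end

# Formats seconds as m:ss.hh (hours are not broken out in this sketch).
def seconds_to_mark(seconds)
  minutes, rest = seconds.divmod(60)
  minutes.zero? ? format("%.2f", rest) : format("%d:%05.2f", minutes, rest)
end

["9.58", "1:41.11", "2:03:59"].each do |mark|
  puts "#{mark} = #{mark_to_seconds(mark)} seconds"
end
```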

With the data in a pretty spreadsheet I indexed all the results with 1951 as a base year (or the first recorded record for new events) and manually added the newest records, such as the one set by the Jamaican relay team.
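The indexing itself is simple arithmetic: every mark is divided by the base mark and multiplied by 100. Below is a sketch of how I read my own rule, with the base taken as the record that stood in 1951, or the first record if the event is newer. I did this in the spreadsheet, so the method name and the numbers are made up for illustration and are not real records.

```ruby
# progression: array of [year, mark] pairs sorted by year, marks as Floats.
def index_progression(progression, base_year: 1951)
  # Base = the record standing in the base year,
  # or the first record if the event started later.
  standing = progression.select { |year, _mark| year <= base_year }.last
  base_mark = (standing || progression.first)[1]
  progression.map { |year, mark| [year, (mark / base_mark * 100).round(1)] }
end

# Made-up progression for an event that existed before 1951.
old_event = [[1936, 10.0], [1960, 9.8], [2009, 9.5]]
# Made-up progression for an event introduced after 1951.
new_event = [[1980, 50.0], [1990, 40.0]]

p index_progression(old_event)
p index_progression(new_event)
```

For timed events the index falls below 100 as records improve, for jumps and throws it rises above 100.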

The visualization

For the first time I used the JavaScript library d3.js for a visualization. With my short Protovis background d3.js was a charm to work with. The main advantages with d3.js compared to Protovis are that d3.js provides much greater animation support and makes it easier to interact with other elements on the page (such as div-tags).

As a d3-n00b I used Jan Willem Tulp’s tutorial as a base script and built around that. The d3.js documentation is still not complete, so for a beginner it takes some trial and error to progress, but undoubtedly this is a very powerful library for making handmade interactive visualizations.

All in all a very educational process and a result that I’m quite content with.

Post scriptum

Do you, by the way, know which is the sixth greatest athletics nation of all time (measured in number of world records)? FINLAND! A bit hard to believe in a year like this, when none of our athletes made the top eight.

Country         Number of records
USA             367
Soviet Union    199
East Germany    109
Great Britain   55
Germany         51
Finland         49
Poland          47
Australia       41
West Germany    39
Russia          36