Combining D3 and Raphael to make a network graph

During the past week I have been working on a visualization for Sveriges Radio about Melodifestivalen, the Swedish qualification for the Eurovision Song Contest.

Every year there is a HUGE fuss about this show over here in Sweden. I wanted to explore the songwriters in the competition from a dataist perspective. Who are the guys behind the scenes?

If you follow Melodifestivalen for a few years you will notice how many names recur year after year. By linking every songwriter to the years when they contributed I came up with this network graph.

In making this graph I managed to draw several quite interesting conclusions, for example that there are far more men than women among the songwriters, and that there is a small elite of songwriters that does particularly well in the competition almost every year.

That is not what I wanted to blog about today, though. This post is about the making of the visualization.


I have really come to like the Raphael.js library, but unfortunately it does not provide the same robust support for advanced data visualizations (for example network graphs) as its big brother D3.js. D3 on the other hand lacks Raphael’s broad browser compatibility, which is important when you are working with a public broadcaster like Sveriges Radio. So what if you could combine the two?

D3 has a really powerful library for making network graphs, or force-directed layouts. I used this library to make the foundation for the graph (take a look at the draft here). I won’t go into details about the code. The bulk is borrowed from this Stack Overflow thread.

The problem with force-directed layouts in D3 is that they quickly tend to become very burdensome for the browser. The user will have to wait for the graph to equilibrate, and that can take some time if you have 100+ nodes. But since in this case I only needed a static layout, I might as well have the computer do all those calculations in advance.

This is the idea: Raphael doesn’t have a built-in way to draw force-directed layouts, so instead I take the SVG output from D3 and continue building my visualization (interactivity etc.) on top of that in Raphael. In brief, this is how I went about it:

  • I started by copying the SVG code from Firebug (inspect the element and click Copy SVG), pasted it into an empty document and saved it as an XML file.
  • Iterated the nodes (circles) in the file and extracted the coordinates (cx,cy). I did this in Ruby using the Hpricot gem.
  • Saved the coordinates and the radius as JavaScript objects: id: {x: 12.34, y: 43.21, r: 5.0}
  • Here is the simple piece of code:
    require 'hpricot'

    doc = Hpricot(open("mf-graph.svg"))
    # Loop over every circle (node) in the SVG
    doc.search("//circle").each do |node|
       # I round the values to two decimals to reduce the size of the file.
       x = (node.attributes["cx"].to_f*100).round.to_f / 100
       y = (node.attributes["cy"].to_f*100).round.to_f / 100
       r = (node.attributes["r"].to_f*100).round.to_f / 100
       id = node.attributes["id"]
       puts "#{id}: {x: #{x}, y: #{y}, r: #{r} },"
    end
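The extraction step can also be sketched with nothing but the Ruby standard library. This is not the script I used, only a minimal illustration: the sample SVG, the regular expressions (which assume simple double-quoted attributes) and the variable name nodes are all assumptions made for the example.

```ruby
require 'json'

# Hypothetical sample standing in for the SVG copied from Firebug.
SAMPLE_SVG = <<~SVG
  <svg>
    <circle id="n1" cx="12.3456" cy="43.2109" r="5"></circle>
    <circle id="n2" cx="100.999" cy="7.5" r="8.25"></circle>
  </svg>
SVG

# Turn the attributes of one <circle ...> tag into a hash of strings.
def circle_attributes(tag)
  tag.scan(/([\w-]+)="([^"]*)"/).to_h
end

# Build { id => {x:, y:, r:} } with every value rounded to two decimals.
def extract_nodes(svg)
  svg.scan(/<circle\b[^>]*>/).each_with_object({}) do |tag, nodes|
    attrs = circle_attributes(tag)
    nodes[attrs["id"]] = {
      x: attrs["cx"].to_f.round(2),
      y: attrs["cy"].to_f.round(2),
      r: attrs["r"].to_f.round(2)
    }
  end
end

# Wrap the nodes in a JavaScript variable declaration
# (the variable name is an assumption, pick whatever your page expects).
def to_javascript(nodes, var_name = "nodes")
  "var #{var_name} = #{JSON.generate(nodes)};"
end

puts to_javascript(extract_nodes(SAMPLE_SVG))
```

The output is one line of JavaScript that can be saved to a file and loaded before the Raphael code, so the browser never has to run the force simulation.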

With the coordinates of the nodes in hand it was easy to rebuild the graph in Raphael. This way I managed to vastly reduce the loading time and make it more cross-browser friendly. Here is the result once again:

Campaign funding times two

These two interactive visualizations have been in the drawer all summer. I made them back in June and made a small effort to get them published, but when that didn’t happen they were sort of forgotten.

The starting point was the campaign funding data that was published after the parliamentary elections here in Finland. All MPs have to publicly declare all donations above 1500 euros. The data can be found here, or in a slightly refined form here (thanks Helsingin Sanomat!). Helsingin Sanomat has already provided their own visualization, check it out here.

The network

I started by approaching the data as a network using Protovis. This was the result:

Click to open in new window. Note that it takes a while to load.

I think the output is not too shabby, although the loading time here is really not acceptable. I couldn’t find a way to speed up the rendering. The JavaScript code could also have been better, but I learned a lot in the process of putting it all together and I think I would be able to write much smoother code today.

The explorer

The network approach above might be pretty, but it is not as informative as it could be. Again I used Protovis, this time to build an interface that quickly lets you browse through all the reports.

Click to open in new window. The explorer itself is in Finnish.

I think this visualization has a lot of strengths. It is “click-less” which means you can quickly browse the candidates. Life is too short to be clicking. The loading time is also much, much shorter than in the network visualization.

Any thoughts?

The Finnish “immigration critics” blog network

In my previous post I mapped the network of anti-jihadist bloggers mentioned in the manifesto of Anders Behring Breivik. This afternoon I stumbled upon a tweet by Martti Tulenheimo requesting something similar on the Finnish blogosphere.

I had actually tried to do something like that a few days ago, but didn’t manage to write the script I wanted. The plan was to use Yahoo’s inbound link API and have it build the network automatically. But I couldn’t figure out how to only include links from the main page of a site (as Analyze Backlinks lets you do). So instead I took a more manual approach.

The method

As in the previous post I used Analyze Backlinks to list incoming links. The backlink analysis was done on the following blogs:

Site Links 93 38 28 23 8 7 6 6 5 3 2 2 1

These are the blogs listed as “critical voices” on the blog of Jussi Halla-aho (or “the master” as he is referred to on the Homma discussion board) and they serve as a good starting point for this purpose.

I ran all of the blogs through Analyze Backlinks to get a network of 133 blogs. I have obviously not read all of these blogs to check if it is correct to label them as immigration critics. The 133 blogs included in this network are merely sites that link to at least one of the blogs mentioned above.

The results have to be read with some caution. I am not sure how reliable Analyze Backlinks is. Their own disclaimer warns that the results may not be accurate.

The results

Again I used Gephi to draw the network and this is what came out (click to open as pdf):

Click to open as pdf.

The size of each site is determined by the number of inlinks, that is the number of sites that link to the page (not only front page links are counted here). A large number of inlinks indicates that the site is popular. Hence the big dots should be seen as key nodes in the Finnish immigration critic blogosphere.

However, I am not quite sure about the quality of the inlink count that Analyze Backlinks provides. I had a quick look at the numbers that Yahoo’s backlink API returns and there seems to be a significant discrepancy.

So this analysis is far from perfect, but it’s a start and it gives you a decent idea of what the most important sites are in the blogosphere of Finnish immigration critics. If you have thoughts on how the methodology could be improved I would love to hear your comments.

Mapping the anti-jihadist blogosphere

It has been a truly sad weekend with the terrorist attack of Anders Behring Breivik in Oslo, Norway. An extremely important political discussion is now unfolding about the causes of this tragedy. What was for example the role of the expanding anti-jihadist blogosphere?

In his manifesto Breivik keeps referring to five anti-islamic blogs:

All of these bloggers have condemned Breivik and adopted a martyr position after severe criticism from all over the world. But as for example Bjørn Stærk points out, the dangerous potential of these bloggers has to be taken seriously.

I wanted to look closer at this network of bloggers. In an initial attempt to map the anti-jihadist blogosphere I gathered the backlinks (the incoming links) of the five blogs mentioned above (using this site). I only included links from the front page, which means that links from single blog posts were excluded. The result was a list of 749 sites (not only blogs).

I ran the list through Gephi and got the following network.

Click to open as pdf.

The size of each name is determined by the number of in-links. The sites in the middle are the ones that link to several of the five blogs.
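For anyone who wants to try the same thing, here is a rough sketch of how backlink lists could be turned into an edge list that Gephi can import as CSV. The domain names are made up, and the sketch assumes the backlinks have already been collected into a hash with one entry per seed blog.

```ruby
# Hypothetical backlink lists: each seed blog maps to the sites that link to it.
BACKLINKS = {
  "seed-a.example" => ["site-1.example", "site-2.example"],
  "seed-b.example" => ["site-2.example", "site-3.example"],
  "seed-c.example" => ["site-2.example", "seed-a.example"]
}

# One directed edge per (linking site, seed blog) pair.
def edges(backlinks)
  backlinks.flat_map do |seed, sources|
    sources.uniq.map { |source| [source, seed] }
  end
end

# Number of in-links per target, which is what decides the size in the graph.
def in_link_counts(edge_list)
  edge_list.each_with_object(Hash.new(0)) do |(_source, target), counts|
    counts[target] += 1
  end
end

# Number of seed blogs each site links to. Sites with a high number
# are the ones that end up in the middle of the network.
def seeds_linked(edge_list)
  edge_list.each_with_object(Hash.new(0)) do |(source, _target), counts|
    counts[source] += 1
  end
end

# Plain Source,Target edge list. Joining by hand is enough here
# because domain names never contain commas.
def edge_csv(edge_list)
  (["Source,Target"] + edge_list.map { |edge| edge.join(",") }).join("\n")
end

puts edge_csv(edges(BACKLINKS))
```

Gephi works out the in-degree on its own once the edge list is loaded, so the two counting helpers are mostly there to sanity check the data before importing it.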

Much more could be done to learn more about this network. One could for example expand the backlink dig to more blogs. Could that process somehow be automated? And could one use some sort of API to get more data on the sites that link to these five blogs? Geo data for example? Don’t hesitate to contact me if you want to contribute.

You’ll find the backlinks I’ve gathered so far on Google Docs.

Eurovision Song Contest voting data mapped

It is time for the European championship of neighbour voting again, that is the Eurovision Song Contest. I came across a great dataset last weekend with all the entries since 1998 including voting data. I wrote a Ruby script that reshaped the list of entries into nodes and links, which made it possible to construct a network analysis. With a bit of Excel magic I managed to put together an interactive Protovis visualization (opens in new window):

Click to open the interactive visualization.

A few things to note:

  • Displayed here are the links between countries that, on average, give each other the highest points.
  • I have filtered links with a count of two or less. In other words: a country must have gotten points from another country at least three times to get a link. That means you won’t find a country like Cyprus in the network.
  • You need an updated browser to view the visualization.
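The reshaping and filtering described above could look roughly like this. It is a simplified sketch and not my original script: the field names (year, from, to, points) and the sample rows are assumptions, since the real dataset has its own layout.

```ruby
# Hypothetical voting records, one row per year and pair of countries.
VOTES = [
  { year: 2008, from: "Sweden", to: "Norway", points: 12 },
  { year: 2009, from: "Sweden", to: "Norway", points: 10 },
  { year: 2010, from: "Sweden", to: "Norway", points: 8 },
  { year: 2008, from: "Norway", to: "Sweden", points: 10 },
  { year: 2009, from: "Norway", to: "Sweden", points: 10 },
  { year: 2010, from: "Norway", to: "Sweden", points: 7 },
  { year: 2008, from: "Cyprus", to: "Greece", points: 12 },
  { year: 2009, from: "Cyprus", to: "Greece", points: 12 }
]

# Reshape the rows into nodes and links. A link is kept only if one
# country has given points to the other at least min_count times.
def build_network(votes, min_count: 3)
  links = votes.group_by { |v| [v[:from], v[:to]] }.map do |(from, to), rows|
    total = rows.sum { |row| row[:points] }
    { source: from, target: to, count: rows.size, average: total.to_f / rows.size }
  end
  links = links.select { |link| link[:count] >= min_count }

  # Only countries that still have a link end up as nodes.
  names = links.flat_map { |link| [link[:source], link[:target]] }.uniq.sort
  { nodes: names.map { |name| { name: name } }, links: links }
end

network = build_network(VOTES)
network[:links].each do |link|
  puts "#{link[:source]} -> #{link[:target]}: #{link[:average]} on average (#{link[:count]} times)"
end
```

In the sample data Cyprus has only voted for Greece twice, so that link is dropped and neither country shows up among the nodes, which is the same effect as in the real visualization.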

Want to build your own visualization?

Mapping Ratata: Who’s Hot?

I wanted to play around in Gephi a bit more after my previous post about visualizing my social network on Facebook. So for my second project I turned my eyes to Ratata, a Swedish blog community in Finland with just over 1200 bloggers. A friend of mine, Poppe (also on Ratata), has been talking about analyzing the Swedish blogosphere. I hope he doesn’t mind me “borrowing” the idea.

I have almost no prior programming experience, but for some time now I have been trying to learn more about screen scraping. Guided mostly by Dan Nguyen’s brilliant tutorial on coding for journalists I have started to know my way around Ruby. ScraperWiki also provides good guidance for those of us who still mostly do copy-paste programming.

After two days of trial and error I managed to put together a script that extracts all the links to fellow Ratata blogs from all the 1207 blogs. That gave me a data set of almost 2000 connections (due to some technical issues I had to exclude a couple of blogs). I obviously wanted to find out who is most popular. That is, who gets the most in-links? This is the result (click for full scale pdf):

The size depends on the number of in-links. Karin, one of the founders of the blog community, is maybe not too surprisingly number one with 70 other Ratata bloggers linking to her, followed by Mysfabon (43) and Kisimyran (37).
You’ll also notice that the gap between the haves and the have-nots is big when it comes to links. The core of the map is surrounded by a cloud of unconnected blogs (and shattered dreams of blogger fame perhaps?).
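The counting part of such a script is quite simple once the pages have been downloaded. Below is a minimal sketch of that step only. The HTML snippets are made up, and the pattern for what a Ratata blog address looks like (name.ratata.fi) is an assumption for the sake of the example.

```ruby
# Hypothetical front pages: blog name => downloaded HTML.
PAGES = {
  "karin"     => '<a href="http://mysfabon.ratata.fi/">Mysfabon</a>',
  "mysfabon"  => '<a href="http://karin.ratata.fi/">Karin</a> <a href="http://karin.ratata.fi/about">again</a>',
  "kisimyran" => '<a href="http://karin.ratata.fi/">Karin</a> <a href="http://example.com/">other</a>',
  "lonely"    => '<p>No links here.</p>'
}

# Assumed URL pattern for a blog in the community.
BLOG_LINK = %r{href="https?://([a-z0-9-]+)\.ratata\.fi}i

# Names of the fellow blogs a page links to. Each blog is counted once
# per page, and links to the blog itself or to unknown blogs are skipped.
def outgoing_links(name, html, known_blogs)
  html.scan(BLOG_LINK).flatten.map(&:downcase).uniq.select do |target|
    target != name && known_blogs.include?(target)
  end
end

# In-links per blog, most linked first.
def in_link_counts(pages)
  counts = Hash.new(0)
  pages.each do |name, html|
    outgoing_links(name, html, pages.keys).each { |target| counts[target] += 1 }
  end
  counts.sort_by { |blog, count| [-count, blog] }
end

in_link_counts(PAGES).each { |blog, count| puts "#{blog}: #{count}" }
```

Blogs that nobody links to simply never appear in the result, which corresponds to the cloud of unconnected blogs around the core of the map.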

I’ve uploaded the Gephi file if you want to take a closer look at the dataset yourself.

Here is the complete top ten:

Blog Links 70 43 37 33 33 32 31 30 28 27

Project One: Visualizing Friendship

I looked around for tools for visualizing social networks yesterday and found two great things:

  • An application called Gephi.
  • A tutorial explaining how to get started with Gephi.

In the tutorial Tony Hirst shows how easy it can be to visualize Facebook friendship. With a small Facebook app called netvizz you can easily download information about who knows who in your social network. You’ll get the data in a neat .gdf file that you can turn into a nice graph in a couple of seconds.
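If you are curious about what is inside such a file before opening it in Gephi, a few lines of Ruby are enough to peek at it. This sketch assumes the basic GDF layout with a nodedef> section followed by an edgedef> section, and it assumes that no value contains a comma. The sample data is invented and the columns in a real netvizz export may differ.

```ruby
# Hypothetical miniature .gdf file.
SAMPLE_GDF = <<~GDF
  nodedef>name VARCHAR,label VARCHAR
  1,Alice
  2,Bob
  3,Carol
  edgedef>node1 VARCHAR,node2 VARCHAR
  1,2
  1,3
GDF

# Split the file into a { id => label } hash and a list of [id, id] edges.
def parse_gdf(text)
  nodes = {}
  edges = []
  section = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?("nodedef>")
      section = :nodes
    elsif line.start_with?("edgedef>")
      section = :edges
    elsif section == :nodes
      id, label = line.split(",", 3)
      nodes[id] = label
    elsif section == :edges
      first, second = line.split(",", 3)
      edges << [first, second]
    end
  end
  [nodes, edges]
end

# Number of friendships per person, most connected first.
def friend_counts(nodes, edges)
  counts = Hash.new(0)
  edges.each do |first, second|
    counts[first] += 1
    counts[second] += 1
  end
  counts.map { |id, count| [nodes[id], count] }.sort_by { |label, count| [-count, label] }
end

nodes, edges = parse_gdf(SAMPLE_GDF)
friend_counts(nodes, edges).each { |label, count| puts "#{label}: #{count}" }
```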

With a little bit of color my Facebook network, visualized, looks like this (click to open full scale pdf):

The public interest in this visualization might not be enormous. However, it does say quite a bit about my life and different stages of it:

  • The big yellow bulb represents my Helsinki network, mostly friends from university.
  • The blue network are friends from my hometown, old school friends that is.
  • On the right, between the yellow and the blue network you see friends from my student nation, Vasa nation. That is friends that study in Helsinki (yellow connections), but also know people from around my home town (blue connections).
  • The green ones are people I know from sports (close to blue as that was something I did when I was younger).
  • The red network is for people I know from Åland.
  • The purple network for my Erasmus pals.

Considering that I didn’t know anything about network visualization 24 hours ago I am quite pleased with this result.