How open data improved election coverage in Finland

This is a guest post I wrote for the Open Knowledge Foundation Blog.

Parliamentary elections in Finland are usually rather dull. Rarely does the rest of the world bother to pay any attention. But this year was different. The elections in April were the most exciting ones in decades with the incredible rise (from 5 to 19 percent) of the populist party True Finns as the main attraction. But the intricate political puzzle that followed the success of the True Finns was not the only source of excitement in the elections. Especially not if you happen to be an open data enthusiast.

Since the mid-90s so-called voter advice applications have played an increasingly important role in the Finnish elections. Voter advice applications are questionnaires about political issues put together by media outlets and NGOs for candidates to answer. Voters can then see which candidates match their own views best.

The by-product of these applications is a very interesting set of data. Here you have all the opinions of (almost) all the candidates gathered in an easily accessible format. I don’t know about you, but this surely gets me going.

Up until now this data has been a completely wasted resource. The newsrooms have kept it to themselves and not managed to take the analysis past the level of “let’s see what the candidates think about nuclear power”. But this year things changed. The leading newspaper Helsingin Sanomat decided to publish their data openly a week before the elections. And within a couple of days The Crowd (bloggers, programmers etc.) managed to do more with the data than journalists had done in fifteen years:

  • Kansan Muisti (“the memory of the people”), a site resembling It’s Your Parliament, used the data to investigate if the MPs had voted in accordance with the promises made in the voter advice applications before the elections.

A few weeks after the election the public broadcaster YLE followed the example of Helsingin Sanomat and published their data as well.

Journalism is changing. The immediate reaction of a traditional journalist is often resistance when someone asks a newsroom to publicly share data. “Someone might steal a story that we haven’t yet done!!!” argues the Journalist 1.0.

But it is time to realize that journalistic output does not have to consist only of 700-word stories, neatly structured with headings, preamble and text. Journalism can also be publishing a set of data that will be refined and possibly developed into new stories by readers. As the case of Finland shows, there is still a great amount of unused journalistic potential in The Crowd.

The opinions of our new parliament – in 30 seconds

Yesterday’s elections turned out to be even more exciting than everyone had expected. True Finns shocked everyone with their third position. Trying to get a government together now is not easy.

But what opinions does the new Finnish parliament represent? I’ve put together an interactive tool that lets you explore the opinions of the new MPs – in about 30 seconds. It is based on the answers given in Yle’s vaalikone. I had to put it on a different server because this WordPress blog doesn’t support custom JavaScript. Open it here.

Application opens in new window. Requires an updated web browser.

This was my first real visualization in the JavaScript-based Protovis. I strongly recommend it so far. I have hardly any prior JavaScript experience, but with assistance from the tutorials at Knight Digital Media Center I have been able to figure it out quite quickly. I especially appreciate the possibilities to create interactivity. It made me realize what a waste of time mouse-clicking is (compared to mouseover interaction).

  • Get the data. (Note that the answers might be slightly outdated: I noticed in the reporting from Yle that a few new candidates seem to have participated since I did the scrape. However, the big picture should be accurate.)

Yle vaalikone data offline for now

I’ve looked a bit closer at the Finnish copyright laws and come to the conclusion that I might have been a bit overenthusiastic in publishing the raw data from the vaalikone of Yle. I scraped and published this data thinking it would merely be a mashup of public data and therefore a legal thing to do. However, I did not consider § 49 of the copyright law, which states that the producer of a catalogue or database including a “great amount of information” owns the exclusive right to distribute the catalogue (or an “essential” part of it). I have not run this by any legal expert, but my own conclusion is that:

  • Gathering the data should not be a problem. In other words, anyone can download my Ruby script (or this slightly more professional version by Anon) and scrape the data to one’s own computer.
  • Publishing this data without the permission of Yle might be a legal problem. Or could one argue that the vaalikone data is actually not a catalogue or database, but rather a large number of “quotes” from candidates? After all, the answers given by the candidates are not visually published as databases.
  • Publishing a mashup (a visualization for example) should not be a problem as it does not mean that the user gets a hold of the raw data.

I have contacted Yle to ask for their permission to publish the data again. Their response was that they will consider it within a couple of weeks.

If anyone holds any expertise in these questions I would love to hear your input.

Does age matter in politics?

I did a small analysis on the Yle vaalikone data tonight. I wanted to find out if young and old politicians differ in their opinions. Can I, as a fairly young voter, expect to find my political soul mate among my peers?

Well, that seems to depend on what questions are important to you. By calculating the correlation (I just used the CORREL function in Excel) between the age of the candidate and the answers given in the voter advice application you get an idea of how closely political opinions and age are linked. The bigger the correlation, the wider the gap between young and old:

Correlation between age and political opinions
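For readers who would rather script the calculation than use a spreadsheet, here is a minimal Python sketch of the same Pearson correlation that Excel's CORREL function computes. The function name `pearson` and the toy `ages` and `answers` lists are made up for illustration; they are not taken from the Yle data, and the answer coding (1 to 5) is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: candidate ages and their answers to one question,
# coded 1 (fully agree) to 5 (fully disagree). Not real vaalikone answers.
ages = [25, 34, 41, 52, 63, 70]
answers = [1, 2, 2, 4, 3, 5]

r = pearson(ages, answers)
print(round(r, 2))
```

Running this once per question column, always against the age column, would give one correlation value per question, which is the kind of figure the chart above is built from.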

The strongest correlations here are between 0.2 and 0.3. What does this mean in practice? Let’s look at one of the questions more closely. The biggest difference between old and young candidates appears on the question about whether registered homosexual couples should have the same rights as heterosexual couples. I did a visualization in Many Eyes to illustrate the difference: (click to open interactive full size version in Many Eyes)

Should homosexual couples have same rights as hetero couples?

So, do you think gay rights are important? Start looking among the young candidates for your pick. Are NATO, the length of the work day or the right to strike the most burning questions? In that case age is really just a number.

Yle vaalikone data updated

I did a new scrape of the Yle vaalikone data yesterday. About 300 new candidates have answered, making it 1801 people altogether.

  • The answers of all candidates in Yle vaalikone (Google Docs) (offline for now)

This time I combined it with candidate data from (thank you Google Fusion Tables). Note that questions 31-33 are specific to each district and not comparable.

//EDIT: I found some errors in the previous dataset (or rather Verkostoanatomia). I did the scrape again, but skipped the fusion part. I hope this is correct now.

If you want to see how the scrape was done I’ll also leave you with the Ruby file. Don’t hesitate to comment on my coding if you have any suggestions about improvements (I am sure there are many).

//UPDATE: The dataset has been taken offline. This is why. Hope to be able to put it back soon.

Vaalikone of Yle scraped and ready to download

The public broadcaster Yle published its voting advice application (vaalikone) last week (in Finnish and Swedish only, which is quite a shame for a public broadcaster – should we not encourage new citizens to take part in politics?). I took the chance to practice some screen scraping skills. You’ll find the result here:

1585 candidates answered the 35 questions, which means you get a pretty interesting set of data. A first analysis and visualization on one of the questions is coming up shortly.

A few remarks:

  • Questions 31-33 have been left out, because they were different in every district and therefore not comparable.
  • Question 34 is multiple choice and therefore listed in several columns.
  • Questions and answers are listed in the second sheet of the spreadsheet in Google Docs.
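As a rough illustration of the remarks above, this is one way a multiple-choice answer such as question 34 can be spread over several columns, one per option, with 1 for a selected option and 0 otherwise. It is a Python sketch, not the Ruby scraper itself; the option labels, the `q34_` column prefix and the candidate names are all hypothetical.

```python
def expand_multiple_choice(selected, options, prefix="q34_"):
    """Return one 0/1 column per option, in the order given by `options`."""
    unknown = set(selected) - set(options)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    return {f"{prefix}{opt}": int(opt in selected) for opt in options}


# Hypothetical option labels and answers; the real question 34 options differ.
OPTIONS = ["a", "b", "c", "d"]
candidates = {
    "Candidate 1": ["a", "c"],
    "Candidate 2": ["d"],
    "Candidate 3": [],  # answered nothing on this question
}

# Build one flat row per candidate, ready to be written out as a CSV line.
rows = [
    {"candidate": name, **expand_multiple_choice(picked, OPTIONS)}
    for name, picked in candidates.items()
]

for row in rows:
    print(row)
```

Keeping one column per option means each candidate stays on a single row, so the multiple-choice question can sit in the same sheet as the single-answer questions.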



Edit: The dataset has been updated with a new scrape from 24.3.2011.