How open data improved election coverage in Finland

This is a guest post I’ve written for the Open Knowledge Foundation Blog.

Parliamentary elections in Finland are usually rather dull. Rarely does the rest of the world bother to pay any attention. But this year was different. The elections in April were the most exciting ones in decades with the incredible rise (from 5 to 19 percent) of the populist party True Finns as the main attraction. But the intricate political puzzle that followed the success of the True Finns was not the only source of excitement in the elections. Especially not if you happen to be an open data enthusiast.

Since the mid-90s so-called voter advice applications have played an increasingly important role in the Finnish elections. Voter advice applications are questionnaires about political issues put together by media outlets and NGOs for candidates to answer. Voters can then see which candidates match their own views best.

The by-product of these applications is a very interesting set of data. Here you have the opinions of (almost) all the candidates gathered in an easily accessible format. I don’t know about you, but this surely gets me going.

Up until now this data has been a completely wasted resource. The newsrooms have kept it to themselves and not managed to take the analysis past the level of “let’s see what the candidates think about nuclear power”. But this year things changed. The leading newspaper Helsingin Sanomat decided to publish their data openly a week before the elections. And within a couple of days The Crowd (bloggers, programmers etc.) managed to do more with the data than journalists had done in fifteen years:

  • Kansan Muisti (“the memory of the people”), a site resembling It’s Your Parliament, used the data to investigate if the MPs had voted in accordance with the promises made in the voter advice applications before the elections.

A few weeks after the election the public broadcaster YLE followed the example of Helsingin Sanomat and published their data as well.

Journalism is changing. The immediate reaction of a traditional journalist is often resistance when someone asks a newsroom to publicly share data. “Someone might steal a story that we haven’t yet done!!!” argues the Journalist 1.0.

But it is time to realize that journalistic output does not have to consist only of 700-word stories, neatly structured with headings, preamble and text. Journalism can also mean publishing a set of data that will be refined and possibly developed into new stories by readers. As the case of Finland shows, there is still a great amount of unused journalistic potential in The Crowd.


The opinions of our new parliament – in 30 seconds

Yesterday’s elections turned out to be even more exciting than anyone had expected. True Finns shocked everyone by finishing in third position. Putting a government together now will not be easy.

But what opinions does the new Finnish parliament represent? I’ve put together an interactive tool that lets you explore the opinions of the new MPs – in about 30 seconds. It is based on the answers given in Yle’s vaalikone. I had to put it on a different server because this WordPress blog doesn’t allow custom JavaScript. Open it here.

Application opens in new window. Requires an updated web browser.

This was my first real visualization in the JavaScript-based Protovis. I strongly recommend it so far. I have hardly any prior JavaScript experience, but with assistance from the tutorials at Knight Digital Media Center I have been able to figure it out quite quickly. I especially appreciate the possibilities for creating interactivity. It made me realize what a waste of time mouse-clicking is (compared to mouseover interaction).

  • Get the data. (Note that the answers might be slightly outdated: I noticed in the reporting from Yle that a few new candidates seem to have participated since I did the scrape. However, the big picture should be accurate.)

Yle vaalikone data offline for now

I’ve looked a bit closer at the Finnish copyright laws and come to the conclusion that I might have been a bit overenthusiastic in publishing the raw data from the vaalikone of Yle. I scraped and published this data thinking it would merely be a mashup of public data and therefore a legal thing to do. However, I did not consider Section 49 (49 §) of the copyright law, which states that the producer of a catalogue or database including a “great amount of information” owns the exclusive right to distribute the catalogue (or an “essential” part of it). I have not run this by any legal expert, but my own conclusion is that:

  • Gathering the data should not be a problem. In other words, anyone can download my Ruby script (or this slightly more professional version by Anon) and scrape the data to one’s own computer.
  • Publishing this data without the permission of Yle might be a legal problem. Or could one argue that the vaalikone data is actually not a catalogue or database, but rather a large number of “quotes” from candidates? After all, the answers given by the candidates are not visually published as databases.
  • Publishing a mashup (a visualization for example) should not be a problem as it does not mean that the user gets a hold of the raw data.

I have contacted Yle to ask for their permission to publish the data again. Their response was that they will consider it within a couple of weeks.

If anyone holds any expertise in these questions I would love to hear your input.


Does age matter in politics?

I did a small analysis on the Yle vaalikone data tonight. I wanted to find out if young and old politicians differ in their opinions. Can I, as a fairly young voter, expect to find my political soul mate among my peers?

Well, that seems to depend on what questions are important to you. By calculating the correlation (I just used the CORREL function in Excel) between the age of each candidate and the answers given in the voter advice application, you get an idea of how closely political opinions and age are linked. The stronger the correlation, the wider the gap between young and old:

Correlation between age and political opinions
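For readers who would rather not use a spreadsheet, here is a minimal Python sketch of the same calculation. Excel's CORREL returns the Pearson correlation coefficient, which is what the function below computes. The sample ages and answers are made up for illustration, and the 1 to 5 agreement scale is an assumption, not necessarily how Yle codes its answers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient, the statistic Excel's CORREL returns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Made-up example rows: candidate age and answer on an assumed 1-5 scale.
ages = [24, 31, 38, 45, 52, 60, 67]
answers = [5, 4, 5, 3, 3, 2, 2]
r = pearson(ages, answers)
print(f"r = {r:.2f}")
```

A negative value means agreement falls as age rises, a positive one the opposite. It is the size of the number, not its sign, that tells you how wide the gap between young and old is.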

The strongest correlations here are between 0.2 and 0.3. What does this mean in practice? Let’s look at one of the questions more closely. The biggest difference between old and young candidates appears on the question of whether registered homosexual couples should have the same rights as heterosexual couples. I did a visualization in Many Eyes to illustrate the difference: (click to open interactive full size version in Many Eyes)

Should homosexual couples have same rights as hetero couples?

So, do you think gay rights are important? Start looking among the young candidates for your pick. Are NATO, the length of the work day or the right to strike the most burning questions? In that case age is really just a number.


Why vaalikone data wants to be free

Helsingin Sanomat confirmed today that they will publish the data from their voter advice application (or in Finnish “vaalikone”, as I will call it from here on) openly next week under a Creative Commons 3.0 license. For a while I thought they would hold the data until after the elections. That is why I chose to scrape one of their questions myself the other day (which actually resulted in a story on HS.fi today).

This is great news. Why? Because, as I will argue in this post, vaalikone data wants to be free.

1. Why it just wouldn’t hurt

Let’s start by trying to turn this argument around. Why should this data not be distributed publicly? I think the main reason why this does not happen today is pure ignorance and old-fashioned thinking. Most media outlets just haven’t thought of it. However, if you do think about it, I suppose one of the main concerns would be that you give something away to your competitors for free. The data can be used to write stories (“what do candidates think about Nato?”) and you don’t want to break your information monopoly by sharing the data. After all, you have probably spent both time and money gathering the answers from all the candidates.

This is a very traditional way of thinking about journalism. Let me present you with a different perspective.

Suppose you do give the data away for free. Would your competitors use it to fill their papers? As a reporter with plenty of newsroom experience I would say probably not. No paper would like to build story after story on material gathered by a competitor (I do think that most newsrooms would have the decency to acknowledge their sources). Anyone who has worked at a newspaper knows that you don’t mind spending an afternoon trying to get hold of the same politician that was interviewed in the competing paper just because you don’t want to quote your rival. It’s a matter of pride.

So who would use the data? Well, for example bloggers like me. My previous post was a mashup of data from a question in the vaalikone of Helsingin Sanomat about who Finland should be friends with on Facebook. Helsingin Sanomat used the post to write their own story, which probably did not take more than half an hour. Or at least much less than it would have taken them to do all the data work themselves. One could say that they managed to crowdsource the refining of the data. Cost for them? Nada.

For the media outlet the real value of an application such as a “vaalikone” is in the application itself, which hopefully attracts thousands of voters looking for the right candidate. More visitors = more potential advertisement. Sharing the data doesn’t change this.

Ergo, it just wouldn’t hurt to give away the data.

2. Why there really is no option

Even if you don’t agree with my argument so far, keeping the data to yourself might not even be possible in practice. If you want voters to be able to see what candidates think about various questions, you have to publish the answers. And if you publish the answers, there is always a risk that someone will go through all the questions and record the responses.

With more than 2000 candidates and 20-30 questions this would of course be a lot of work. However, with a simple screen scraping script the process of going through every answer to every question could be done in a matter of minutes. We are not talking about War Games style hacking here, just a small script that runs through all the (public!) pages. It is the same thing you could do manually yourself if you had an extraordinary amount of spare time.
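To give an idea of how small such a script can be, here is a rough Python sketch of the approach. It is not the script I actually used (that one is in Ruby), and everything specific in it is an assumption made for illustration: the URL pattern in `BASE_URL`, the idea that each candidate has a numbered page, and the `<td class="answer">` markup are all hypothetical.

```python
import time
from html.parser import HTMLParser
from urllib.request import urlopen

# Hypothetical URL pattern: the real vaalikone addresses are not given here.
BASE_URL = "https://example.org/vaalikone/candidate/{candidate_id}"


class AnswerParser(HTMLParser):
    """Collects the text of every <td class="answer"> cell (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.answers = []
        self._in_answer = False

    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "answer") in attrs:
            self._in_answer = True
            self.answers.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_answer = False

    def handle_data(self, data):
        if self._in_answer:
            self.answers[-1] += data


def parse_answers(html):
    """Return the answers found on one candidate page, in page order."""
    parser = AnswerParser()
    parser.feed(html)
    parser.close()
    return [answer.strip() for answer in parser.answers]


def fetch(url):
    """Download one public page and decode it as UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def scrape(candidate_ids, fetch_page=fetch, delay=1.0):
    """Visit each candidate page in turn and record its answers."""
    results = {}
    for candidate_id in candidate_ids:
        html = fetch_page(BASE_URL.format(candidate_id=candidate_id))
        results[candidate_id] = parse_answers(html)
        time.sleep(delay)  # pause between requests to go easy on the server
    return results
```

Calling something like `scrape(range(1, 2001))` would then walk through the pages one by one. The pause between requests is a courtesy to the server, not a technical requirement.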

This is what I did when I recently scraped the vaalikone of Yle. Is this not stealing? Nope, not if you ask me. One could also say it is good old-fashioned investigative reporting. After all, is going through a large number of (public!) files and publishing the results not what we usually call investigative journalism? Is it somehow different if you let an automated script do all the work? I would say it’s more clever.

Ergo, even if you don’t want to publish your data, there really might not be an option. If you don’t share, someone else will.

3. Why it is the new (and right) way of doing things

Once upon a time journalism was a profession reserved for people working in more or less fancy offices. Reporters did not hesitate to take a certain pride in their position. Today this traditional role of the journalist is being challenged by bloggers and other online spectators – or citizen journalists as some might call them. It is not as easy as it used to be to define who is a journalist. In Sweden the web forum Flashback was nominated for the journalist award of the year after a collective investigation of a severe case of school bullying. Were they journalists?

One can argue about whether a thread on a discussion board is journalism or not, but any newsroom with serious ambitions of pursuing modern investigative reporting should consider engaging the public in one way or another. Workshops such as HS Open show that the innovative potential is likely to be much bigger outside than inside the newsroom. The more eyes that get to run through the data, the greater the chance of finding interesting and meaningful patterns. The more programmers that get to play around with the numbers, the cooler the mash-ups. What could they accomplish? I don’t know. And that is sort of the point with innovation and investigation.

Ergo, we need to start thinking in new ways about doing journalism, and publishing open vaalikone data would be a good start. Information wants to be free, including the information behind a vaalikone.

If you live in Helsinki and you want to continue this discussion in real life, join the debate “Vaalikoneet auki!” on Wednesday 30th March.

 


Finland on Facebook – according to candidates

With whom should Finland be friends on Facebook? Helsingin Sanomat asked this question of all the candidates in the parliamentary elections. I screen-scraped the 1747 answers to see what they thought. If the candidates got to choose, Finland’s Facebook profile would look something like this:

Russia
1057 friends in common
Sweden
885 friends in common
Estonia
595 friends in common
Norway
591 friends in common
Germany
373 friends in common
USA
219 friends in common
Denmark
145 friends in common
China
119 friends in common
Cuba
71 friends in common
India
61 friends in common

So Russia is apparently our best friend. Or at least that is what we want to make them believe. Cuba ends up surprisingly high, but that is largely because of the Communist Party that is still keeping it real.

You’ll find the data on Google Docs if you want to examine it yourself.
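If you do download the data, a tally like the one above takes only a few lines of Python. This is a sketch under assumptions: I am assuming each candidate's answer has already been turned into a list of country names, and the three sample answers are made up.

```python
from collections import Counter


def tally_friends(answers):
    """Count how many candidates named each country."""
    counts = Counter()
    for chosen in answers:
        # set() makes sure a country is counted once per candidate.
        counts.update(set(chosen))
    return counts


# Made-up sample: one list of chosen countries per candidate.
sample_answers = [
    ["Russia", "Sweden", "Estonia"],
    ["Sweden", "Norway"],
    ["Russia", "Sweden"],
]

for country, count in tally_friends(sample_answers).most_common():
    print(f"{country}: {count} friends in common")
```

`most_common()` sorts the countries from most to least picked, which gives the ranking used for the profile.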


Yle vaalikone data updated

I did a new scrape of the Yle vaalikone data yesterday. About 300 new candidates have answered, making it 1801 people altogether.

  • The answers of all candidates in Yle vaalikone (Google Docs) (offline for now)

This time I combined it with candidate data from ehdolla.org (thank you Google Fusion Tables). Note that questions 31-33 are specific to each district and not comparable.

//EDIT: I found some errors in the previous dataset (or rather Verkostoanatomia). I did the scrape again, but skipped the fusion part. I hope this is correct now.

If you want to see how the scrape was done I’ll also leave you with the Ruby file. Don’t hesitate to comment on my coding if you have any suggestions about improvements (I am sure there are many).

//UPDATE: The dataset has been taken offline. This is why. Hope to be able to put it back soon.