How open data improved election coverage in Finland

This is a guest post I’ve written for the Open Knowledge Foundation Blog.

Parliamentary elections in Finland are usually rather dull. Rarely does the rest of the world bother to pay any attention. But this year was different. The elections in April were the most exciting ones in decades with the incredible rise (from 5 to 19 percent) of the populist party True Finns as the main attraction. But the intricate political puzzle that followed the success of the True Finns was not the only source of excitement in the elections. Especially not if you happen to be an open data enthusiast.

Since the mid-90s so-called voter advice applications have played an increasingly important role in the Finnish elections. Voter advice applications are questionnaires about political issues put together by media outlets and NGOs for candidates to answer. Voters can then see which candidates match their own views best.
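To make the matching step concrete, here is a minimal sketch of how such an application could rank candidates. It is not the method of any particular vaalikone: the 1 to 5 answer scale, the scoring formula and the names match_score and rank_candidates are all assumptions made for illustration.

```python
# Hypothetical sketch: answers are assumed to be integers on a 1-5 scale,
# keyed by question id. Real voter advice applications may weight questions
# or score agreement differently.

def match_score(voter, candidate, scale_min=1, scale_max=5):
    """Return agreement between 0.0 and 1.0 over the questions both answered."""
    shared = [q for q in voter if q in candidate]
    if not shared:
        return 0.0
    span = scale_max - scale_min
    total_distance = sum(abs(voter[q] - candidate[q]) for q in shared)
    # Zero distance on every shared question gives 1.0, maximal distance 0.0.
    return 1.0 - total_distance / (span * len(shared))


def rank_candidates(voter, candidates):
    """Return (name, score) pairs sorted from best match to worst."""
    scores = {name: match_score(voter, answers)
              for name, answers in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    voter = {"nuclear": 5, "nato": 1}
    candidates = {
        "Candidate A": {"nuclear": 5, "nato": 1},
        "Candidate B": {"nuclear": 1, "nato": 5},
    }
    for name, score in rank_candidates(voter, candidates):
        print(name, round(score, 2))
```

The point is only that the answers are structured data: once every candidate has a row of numbers, comparing them with a voter, or with each other, is a few lines of code.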

The by-product of these applications is a very interesting set of data. Here you have all the opinions of (almost) all the candidates gathered in an easily accessible format. I don’t know about you, but this surely gets me going.

Up until now this data has been a completely wasted resource. The newsrooms have kept it to themselves and not managed to take the analysis past the level of “let’s see what the candidates think about nuclear power”. But this year things changed. The leading newspaper Helsingin Sanomat decided to publish their data openly a week before the elections. And within a couple of days The Crowd (bloggers, programmers etc.) managed to do more with the data than journalists had done in fifteen years:

  • Kansan Muisti (“the memory of the people”), a site resembling It’s Your Parliament, used the data to investigate if the MPs had voted in accordance with the promises made in the voter advice applications before the elections.

A few weeks after the election the public broadcaster YLE followed the example of Helsingin Sanomat and published their data as well.

Journalism is changing. The immediate reaction of a traditional journalist is often resistance when someone asks a newsroom to publicly share data. “Someone might steal a story that we haven’t yet done!!!” argues the Journalist 1.0.

But it is time to realize that journalistic output does not have to consist only of 700-word stories, neatly structured with headings, preamble and text. Journalism can also be publishing a set of data that will be refined and possibly developed into new stories by readers. As the case of Finland shows, there is still a great amount of unused journalistic potential in The Crowd.

Yle vaalikone data offline for now

I’ve looked a bit closer at the Finnish copyright laws and come to the conclusion that I might have been a bit overenthusiastic in publishing the raw data from the vaalikone of Yle. I scraped and published this data thinking it would merely be a mashup of public data and therefore a legal thing to do. However, I did not consider § 49 of the copyright law, which states that the producer of a catalogue or database including a “great amount of information” owns the exclusive right to distribute the catalogue (or an “essential” part of it). I have not run this by any legal expert, but my own conclusion is that:

  • Gathering the data should not be a problem. In other words, anyone can download my Ruby script (or this slightly more professional version by Anon) and scrape the data to one’s own computer.
  • Publishing this data without the permission of Yle might be a legal problem. Or could one argue that the vaalikone data is actually not a catalogue or database, but rather a large number of “quotes” from candidates? After all, the answers given by the candidates are not visually published as databases.
  • Publishing a mashup (a visualization for example) should not be a problem as it does not mean that the user gets hold of the raw data.

I have contacted Yle to ask for their permission to publish the data again. Their response was that they will consider it within a couple of weeks.

If anyone holds any expertise in these questions I would love to hear your input.

Why vaalikone data wants to be free

Helsingin Sanomat confirmed today that they will publish the data from their voter advice application (or in Finnish “vaalikone”, as I will call it from here on) openly next week under a Creative Commons 3.0 license. For a while I thought they would hold the data until after the elections. That is why I chose to scrape one of their questions myself the other day (which actually resulted in a story on today).

This is great news. Why? Because, as I will argue in this post, vaalikone data wants to be free.

1. Why it just wouldn’t hurt

Let’s start by trying to turn this argument around. Why should this data not be distributed publicly? I think the main reason why this does not happen today is pure ignorance and old-fashioned thinking. Most media outlets just haven’t thought of it. However, if you do think about it, I suppose one of the main concerns would be that you give something away to your competitors for free. The data can be used to write stories (“what do candidates think about Nato?”) and you don’t want to break your information monopoly by sharing the data. After all, you have probably spent both time and money to gather the answers from all the candidates.

This is a very traditional way of thinking about journalism. Let me present you with a different perspective.

Suppose you do give the data away for free. Would your competitors use it to fill their papers? As a reporter with plenty of newsroom experience I would say probably not. No paper would like to build story after story on material gathered by a competitor (I do think that most newsrooms would have the decency to acknowledge their sources). Anyone who has worked at a newspaper knows that you don’t mind spending an afternoon trying to get hold of the same politician that was interviewed in the competing paper just because you don’t want to quote your rival. It’s a matter of pride.

So who would use the data? Well, for example bloggers like me. My previous post was a mashup of data from a question in the vaalikone of Helsingin Sanomat about who Finland should be friends with on Facebook. Helsingin Sanomat used the post to write their own story, which probably did not take more than half an hour. Or at least much less than it would have taken them to do all the data work themselves. One could say that they managed to crowdsource the refining of the data. Cost for them? Nada.

For the media outlet the real value of an application such as a “vaalikone” is in the application itself, which hopefully attracts thousands of voters looking for the right candidate. More visitors = more potential advertisement. Sharing the data doesn’t change this.

Ergo, it just wouldn’t hurt to give away the data.

2. Why there really is no option

Even if you don’t agree with my argument so far, the option of keeping the data to yourself might not even be an option in practice. If you want voters to be able to see what candidates think about various questions you have to publish the answers. And if you publish the answers there is always a risk that someone will go through all the questions and record the responses.

With more than 2000 candidates and 20-30 questions this would of course be a lot of work. However, with a simple screen scraping script the process of going through every answer to every question could be done in a matter of minutes. We are not talking about War Games style hacking here, just a small script that runs through all the (public!) pages. The same thing you could do manually yourself if you had an extraordinary amount of spare time.
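To give an idea of how small such a script can be, here is a rough sketch in Python using only the standard library. It is not the script used on any real vaalikone: the page markup (an li element with class "answer" and a data-question attribute), the function names and any URL passed to scrape_candidate are assumptions made up for illustration, and a real site would need its own parsing rules.

```python
# Hypothetical sketch of a screen scraper. The markup it expects
# (<li class="answer" data-question="...">text</li>) is invented for this
# example and does not describe any actual voter advice application.
from html.parser import HTMLParser
from urllib.request import urlopen


class AnswerParser(HTMLParser):
    """Collect {question id: answer text} from one candidate page."""

    def __init__(self):
        super().__init__()
        self.answers = {}
        self._current_question = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "answer":
            self._current_question = attrs.get("data-question")

    def handle_data(self, data):
        # Only keep text that appears inside an answer element.
        if self._current_question is not None and data.strip():
            self.answers[self._current_question] = data.strip()

    def handle_endtag(self, tag):
        if tag == "li":
            self._current_question = None


def parse_answers(html):
    """Extract the answers from the HTML of a single candidate page."""
    parser = AnswerParser()
    parser.feed(html)
    parser.close()
    return parser.answers


def scrape_candidate(url):
    """Download one public candidate page and return its answers."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8")
    return parse_answers(html)
```

Looping scrape_candidate over the list of candidate pages, ideally with a polite pause between requests, is all that separates one page from the full dataset.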

This is what I did when I recently scraped the vaalikone of Yle. Is this not stealing? Nope, not if you ask me. One could also say it is good old-fashioned investigative reporting. After all, is going through a large number of (public!) files and publishing the results not what we usually call investigative journalism? Is it somehow different if you let an automated script do all the work? I would say it’s more clever.

Ergo, even if you don’t want to publish your data, there really might not be an option. If you don’t share, someone else will.

3. Why it is the new (and right) way of doing things

Once upon a time journalism was a profession reserved for people working in more or less fancy offices. Reporters did not hesitate to take a certain pride in their position. Today this traditional role of the journalist is being challenged by bloggers and other online spectators – or citizen journalists as some might call them. It is not as easy as it used to be to define who is a journalist. In Sweden the web forum Flashback was nominated for the journalist award of the year after a collective investigation of a severe case of school bullying. Were they journalists?

One can argue about whether a thread on a discussion board is journalism or not, but any newsroom with serious ambitions of pursuing modern investigative reporting should consider engaging the public in one way or another. Workshops such as HS Open show that the innovative potential is likely to be much bigger outside than inside the newsroom. The more eyes that get to run through the data, the greater the chance of finding interesting and meaningful patterns. The more programmers that get to play around with the numbers, the cooler the mash-ups. What could they accomplish? I don’t know. And that is sort of the point with innovation and investigation.

Ergo, we need to start thinking in new ways about doing journalism, and publishing open vaalikone data would be a good start. Information wants to be free, including the information behind a vaalikone.

If you live in Helsinki and you want to continue this discussion in real life, join the debate “Vaalikoneet auki!” on Wednesday 30th March.


Helsingin Sanomat promises open “vaalikone”

Great news from Helsingin Sanomat, Finland’s leading newspaper. In my previous post I scraped the voting advice application (vaalikone) of Kepa and complained that the data from these applications is almost never distributed publicly. This has now changed as HS promises to publish the full dataset behind their application this year in the name of open data and data journalism. That hopefully means all the answers given by the candidates will be downloadable in an accessible format.

Their voting advice application will open later this month, and in mid-March HS will arrange a hacks and hackers style workshop to discuss how the data from the application could be used. I hope I’ll be able to attend that event.