Why vaalikone data wants to be free

Helsingin Sanomat confirmed today that they will publish the data from their voting advice application (or in Finnish “vaalikone”, as I will call it from here on) openly next week under a Creative Commons 3.0 license. For a while I thought they would hold the data until after the elections. That is why I chose to scrape one of their questions myself the other day (which actually resulted in a story on HS.fi today).

This is great news. Why? Because, as I will argue in this post, vaalikone data wants to be free.

1. Why it just wouldn’t hurt

Let’s start by trying to turn this argument around. Why should this data not be distributed publicly? I think the main reason this does not happen today is pure ignorance and old-fashioned thinking. Most media outlets just haven’t thought of it. However, if you do think about it, I suppose one of the main concerns would be that you give something away to your competitors for free. The data can be used to write stories (“what do candidates think about Nato?”) and you don’t want to break your information monopoly by sharing the data. After all, you have probably spent both time and money gathering the answers from all the candidates.

This is a very traditional way of thinking about journalism. Let me present you with a different perspective.

Suppose you do give the data away for free. Would your competitors use it to fill their papers? As a reporter with plenty of newsroom experience I would say probably not. No paper would like to build story after story on material gathered by a competitor (I do think that most newsrooms would have the decency to acknowledge their sources). Anyone who has worked at a newspaper knows that you don’t mind spending an afternoon trying to get hold of the same politician that was interviewed in the competing paper just because you don’t want to quote your rival. It’s a matter of pride.

So who would use the data? Well, for example bloggers like me. My previous post was a mashup of data from a question in the vaalikone of Helsingin Sanomat about who Finland should be friends with on Facebook. Helsingin Sanomat used the post to write their own story, which probably did not take more than half an hour. Or at least much less than it would have taken them to do all the data work themselves. One could say that they managed to crowdsource the refining of the data. Cost for them? Nada.

For the media outlet the real value of an application such as a “vaalikone” is in the application itself, which hopefully attracts thousands of voters looking for the right candidate. More visitors = more potential advertising. Sharing the data doesn’t change this.

Ergo, it just wouldn’t hurt to give away the data.

2. Why there really is no option

Even if you don’t agree with my argument so far, keeping the data to yourself might not even be an option in practice. If you want voters to be able to see what candidates think about various questions, you have to publish the answers. And if you publish the answers, there is always a risk that someone will go through all the questions and record the responses.

With more than 2000 candidates and 20-30 questions this would of course be a lot of work. However, with a simple screen scraping script the process of going through every answer to every question could be done in a matter of minutes. We are not talking about WarGames-style hacking here, just a small script that runs through all the (public!) pages. The same thing you could do manually yourself if you had an extraordinary amount of spare time.
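To give an idea of how small such a script can be, here is a minimal sketch in Python using only the standard library. Everything specific in it is an assumption made for illustration: the URL pattern, the numeric candidate ids and the class="answer" markup are made up, and a real vaalikone will be structured differently. It also assumes that answers contain no nested tags.

```python
import time
from html.parser import HTMLParser
from urllib.request import urlopen

# Hypothetical URL pattern; a real vaalikone will use its own.
BASE_URL = "https://vaalikone.example/candidate/{candidate_id}"


class AnswerParser(HTMLParser):
    """Collects the text of elements marked class="answer" (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.answers = []
        self._in_answer = False

    def handle_starttag(self, tag, attrs):
        if ("class", "answer") in attrs:
            self._in_answer = True
            self.answers.append("")

    def handle_endtag(self, tag):
        # Any closing tag ends the current answer (no nested tags assumed).
        self._in_answer = False

    def handle_data(self, data):
        if self._in_answer:
            self.answers[-1] += data.strip()


def parse_answers(html):
    """Return the list of answer texts found on one candidate page."""
    parser = AnswerParser()
    parser.feed(html)
    parser.close()
    return parser.answers


def fetch(url):
    """Download one public page and return it as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def scrape(candidate_ids, fetch_page=fetch, delay=1.0):
    """Visit every candidate page in turn and record the answers."""
    results = {}
    for candidate_id in candidate_ids:
        html = fetch_page(BASE_URL.format(candidate_id=candidate_id))
        results[candidate_id] = parse_answers(html)
        time.sleep(delay)  # pause between requests to go easy on the server
    return results
```

Calling something like scrape(range(1, 2001)) would then walk through the pages one by one, exactly as a very patient reader would. The pause between requests is a courtesy to the site, not a technical necessity.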

This is what I did when I recently scraped the vaalikone of Yle. Is this not stealing? Nope, not if you ask me. One could also say it is good old-fashioned investigative reporting. After all, is going through a large number of (public!) files and publishing the results not what we usually call investigative journalism? Is it somehow different if you let an automated script do all the work? I would say it’s more clever.

Ergo, even if you don’t want to publish your data, there really might not be an option. If you don’t share, someone else will.

3. Why it is the new (and right) way of doing things

Once upon a time journalism was a profession reserved for people working in more or less fancy offices. Reporters did not hesitate to take a certain pride in their position. Today this traditional role of the journalist is being challenged by bloggers and other online spectators – or citizen journalists as some might call them. It is not as easy as it used to be to define who is a journalist. In Sweden the web forum Flashback was nominated for the journalist award of the year after a collective investigation of a severe case of school bullying. Were they journalists?

One can argue about whether a thread on a discussion board is journalism or not, but any newsroom with serious ambitions of pursuing modern investigative reporting should consider engaging the public in one way or another. Workshops such as HS Open show that the innovative potential is likely to be much bigger outside the newsroom than inside it. The more eyes that get to run through the data, the greater the chance of finding interesting and meaningful patterns. The more programmers that get to play around with the numbers, the cooler the mash-ups. What could they accomplish? I don’t know. And that is sort of the point with innovation and investigation.

Ergo, we need to start thinking in new ways about doing journalism, and publishing open vaalikone data would be a good start. Information wants to be free, including the information behind a vaalikone.

If you live in Helsinki and you want to continue this discussion in real life, join the debate “Vaalikoneet auki!” on Wednesday 30th March.


Growing open data movement in Finland

I’ve been psyched to learn about a number of very interesting Finnish data projects in the last couple of days. Here are a few initiatives that I really want to push (most of them only in Finnish unfortunately):

  • Ehdolla.org. This site opened yesterday after the Big Clean Screen Scraping event in Jyväskylä this weekend. The idea is to gather open data about candidates in the parliamentary elections – political promises, campaign financing data, statements and so on.
  • HS Open. The leading Finnish newspaper Helsingin Sanomat has taken a progressive role in data journalism by opening up the data from its voting advice application. Last week it arranged a workshop about how this data could be put to use. I couldn’t participate myself, but some very inspiring ideas seem to have emerged from the workshop. One idea is a game where the user has to stabilize the state budget by making her own cuts – a test few politicians seem to be willing to take now before the elections.
  • Helsinki Region Infoshare. In an attempt to make public data more accessible, Helsinki and the neighbouring cities in the metropolitan area have just opened up a portal for public data. I’ve written about this project myself in an article in Finlands kommuntidning (in Swedish). I haven’t had time to examine the site myself yet, but the idea is great.