And, if you’re wondering if this means that I’m still on this Crowdsourcing kick, the answer is yes.
In this case, the crowd is contributing data – lots and lots of data, which is then freely available for any geek or number nerd to come along and wring sense out of. Segaran built his Walmart growth video using data from Freebase, an open, public database that launched a year or so ago. Freebase is an attempt to build a freely accessible database of the world’s knowledge. Here’s how they describe it:
Freebase, created by Metaweb Technologies, is an open database of the world’s information. It’s built by the community and for the community – free for anyone to query, contribute to, build applications on top of, or integrate into their websites. Already, Freebase covers millions of topics in hundreds of categories. Drawing from large open data sets like Wikipedia, MusicBrainz, and the SEC archives, it contains structured information on many popular topics, including movies, music, people and locations – all reconciled and freely available via an open API.
It’s a noble ambition, similar to the one that created its cousin, Wikipedia. The difference between the two is that in Wikipedia the community contributes information by posting words, narratives, articles. In Freebase, the community contributes data – and applications that make use of that data. If you work with public databases, this is the stuff that dreams are made of. Data, raw data, and lots of it, too.
If it’s so great, I can hear you think, why aren’t more people using it?
I think there are at least two answers. The first is the difficulty of accessing, using and displaying the data; ultimately this is a number-nerd playground. And while I happen to think a number nerd is exactly what every newspaper needs, I’m not sure Freebase is going to be able to live up to its founders’ dreams, for one simple reason: data reliability. Anyone can contribute data and anyone can edit it, and that seems to pose serious problems, more serious, I think, than it does for Wikipedia, where errors, omissions and bias can more easily be weeded out or at least made obvious by members. With data – and especially with applications or mashups like the Walmart map – the data source is obscured: you can’t make a rational judgement without digging past the thing (map, table, mashup, etc.) you’re looking at and rooting around in the source data. And very few people can or will do that, which weakens the critical self-correcting functions a crowdsourced project like this needs. Here’s how Freebase answers the question: How do I know the data is true?
Actually, you don’t know for sure. Because Freebase lets anyone edit the data, there’s always a chance that somebody has—intentionally or unintentionally—introduced a mistake. By the same token, data in the system can be cleaned up by anyone, and people make incremental improvements all the time.
What do you think? Is there a future for a public, open database project like this? Bill