November 13, 2010 5:46 PM

Kaggle hosts competitions to glean information from massive data sets, a la the Netflix Prize. Competitors can enter free, while companies with vast stores of impenetrable data pay Kaggle to outsource their difficulties to the world population of freelance data-miners. Kaggle contestants have already developed dozens of chess rating systems which outperform the Elo rating currently in use, and identified genetic markers in HIV associated with a rise in viral load. Right now, you can compete to forecast tourism statistics or predict unknown edges in a social network. Teachers who want to pit their students against each other can host a Kaggle contest free of charge.
posted by escabeche (10 comments total) 21 users marked this as a favorite
There's a problem with this. The signal is not actually hidden and it's possible to reconstruct personal identifying information from datasets -- anonymized data is not actually anonymous, which is why Netflix canceled its prize.
posted by spiderskull at 6:10 PM on November 13, 2010 [1 favorite]

Let's see: 10 contests available so far...
...total prizes: $4417
...total weeks: 27

Even if you "win" every single one you could only earn 164 a week, that's about 7800 a year. And that is rather unlikely as each challenge methodology is quite different & if you were skilled enough to master all of them, well, you wouldn't be part of a crowd, you'd be an expert.
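
For anyone who wants to check that arithmetic, here is a minimal Python sketch. The variable names are illustrative, the two totals are the ones quoted above, and the 52-week year is an assumption. Under that assumption the yearly figure comes out nearer 8,500 than 7,800; the 7,800 figure would correspond to roughly 48 paid weeks.

```python
# Totals as quoted in the comment above.
total_prize_usd = 4417
total_weeks = 27

# Average payout per contest-week if one person won everything.
per_week = total_prize_usd / total_weeks

# Assumption: a full 52-week year at that same weekly rate.
per_year_52 = per_week * 52

# How many weeks at that rate the quoted 7800 figure would imply.
weeks_for_7800 = 7800 / per_week

print(f"per week: ${per_week:.0f}")
print(f"per year (52 weeks): ${per_year_52:.0f}")
print(f"weeks implied by $7800: {weeks_for_7800:.1f}")
```

Either way the point stands: the total is well short of a living wage.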

Kaggle isn't going to attract the "World's Best Data Scientists" at that price - only leverage the starving students in the first world against the English-speaking, analytically literate of the rest of the planet. Of course both of those populations could use the $500. Free data is nice too.
posted by zenon at 8:22 PM on November 13, 2010

What I am saying - it ain't the cool million that Netflix offered, but it is better than nothing.
posted by zenon at 8:24 PM on November 13, 2010

Am I the only person thinking of Ender's Game right now?
posted by AkzidenzGrotesk at 8:38 PM on November 13, 2010

Zenon, the Hearst Challenge fronts on a different URL but is still a Kaggle competition, and its prize is $25,000. (The challenge problem in this case is something to do with optimizing the logistics of newspaper distribution: predicting how many of each newspaper should be sent to each newsstand.)

But I'm pretty sure it's not the prize money, but the chance to get lots of interesting problems with real-world sized data sets that's the draw. I'm not exactly one of the world's best data scientists, but I'm interested enough to subscribe to their newsletter, and I have done so.
posted by Michael Roberts at 9:05 PM on November 13, 2010

Kinda like an inverse Kickstarter. Or, more ominously, like a 99Designs for data analysis experts.
posted by breath at 10:43 PM on November 13, 2010 [1 favorite]

Funny, I was just looking for some datasets to analyze in order to try out RapidMiner. Thanks for the link!
posted by acheekymonkey at 2:04 AM on November 14, 2010

Is there actually such a thing as freelance data mining?

If yes, I'd be intrigued to know more... like where and how it happens.

Anyway, nice site with some fun challenges.
posted by philipy at 6:52 AM on November 14, 2010

Michael Roberts: Any decent data analyst always throws out the outliers!

Seriously - I sound a little too pessimistic there, but I think breath is on to something. It is going to change how folks approach data analysis. The worst case scenario in my mind is when the same starving grad students who would be attempting this are the very group who are most likely to be out in the cold if small academic studies switch to this model of analysis. Thankfully that is unlikely in the near future, mainly because IRB boards are generally pretty conservative - for the reasons spiderskull lists.
posted by zenon at 7:33 AM on November 14, 2010

Finally, someone to come up with reliable metrics for the MeFi Fantasy League!
posted by klangklangston at 9:21 AM on November 14, 2010
