The news knows it
September 7, 2011 9:17 AM

Culturomics, an emerging field of study that applies climate-modeling levels of supercomputing power towards predicting human behavior, using "computerized analysis of vast digital book archives, offering novel insights into the functioning of human society", has been tried with digital news archives.

They claim to have predicted, after the fact, the Arab Spring and the location of Osama bin Laden.

I found the breakdown of humanity into six distinct "news cultures" most interesting, graphically representing the sometimes insular nature of news coverage. [pdf]
posted by nomisxid (12 comments total) 9 users marked this as a favorite
 
Wasn't this the premise of the "Foundation" novels? How did that turn out, I never read 'em.
posted by keratacon at 9:18 AM on September 7, 2011 [1 favorite]


"... is found to have forecasted ..."

That's an amazing formulation. I wonder what they will find to have been forecasted next?
posted by goethean at 9:22 AM on September 7, 2011 [2 favorites]


Retrodiction is easy, especially about the past.
posted by DU at 9:27 AM on September 7, 2011


Wow, you can learn about people and the world from the news. Who'da thunk it?!
posted by chavenet at 9:28 AM on September 7, 2011


Hari said there would be skeptics, but I believe.
posted by hypersloth at 9:42 AM on September 7, 2011 [1 favorite]


If one of them is named Hari Seldon, I am so out of this planet.
posted by GuyZero at 9:55 AM on September 7, 2011


I had a professor who was working on something very much like this, except designed to predict changes in the bond market.

Would someone much smarter than me point out how reliably this system *predicts* occurrences, as opposed to humans retroactively finding fluctuations coinciding or correlating with events that have already happened?
posted by Sticherbeast at 10:11 AM on September 7, 2011


Hooray! Another ugly, newly coined word for data miners' poaching on the humanities and social sciences!

This is interesting work on the level of the actual analysis, but greatly vitiated by a dopey framework of argument. Especially painful is the paper's lack of a sensible vocabulary for, much less theory of, what it's trying to talk about — scare-quoting "civilization" and hand-waving about "culture" a few times is not a substitute for actual analysis of what these results mean.

And as always, the corpus hangs an enormous asterisk around everything that's being claimed here. Neither newspapers nor news summary services created for Western intelligence agencies (!), which latter are the source of most of the interesting stuff in the paper, are a viable proxy for "culture," no matter what you think that word actually means.
posted by RogerB at 10:13 AM on September 7, 2011 [1 favorite]


Until this can actually return a profit by leveraging its know-how into the Intrade market, I remain dutifully skeptical.
posted by Theta States at 10:42 AM on September 7, 2011


Sticherbeast: "Would someone much smarter than me point out how reliably this system *predicts* occurrences, as opposed to humans retroactively finding fluctuations coinciding or correlating with events that have already happened?"

Taken from the first example:

"Y axis reports the number of standard deviations from the mean, with higher numbers indicating greater positivity and lower numbers indicating greater negativity. January 2011 reports only the tone for 1 January through 24 January, capturing the period immediately preceding the protests."

One thing to notice is that in all their examples, the purported predictions are only weeks away. Since they're presenting the data monthly, that's basically predicting what's happening now. They're claiming, in the above example, that Egypt rioted on the 25th, and that their January sample didn't include that range, but I'd want to see timestamps before I believe that. Because I'm sure the first reports of protest were off the charts in sentiment analysis.

The problem I see is their corpus wasn't split into training and testing, and we have no mention of the false positives or false negatives. And in fact, there's simply too small a corpus being tested here. They mention political event databases, and it would have been a good test of predictive power to split one apart, train on some set and see what kind of predictions / false positives you get on the other part. Instead Leetaru seems to have cherry-picked a set of recent events that I assume the CIA somehow egged on. Hell, even the 3 standard deviations metric isn't reused.
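
A rough sketch of what that kind of held-out test could look like, not anything taken from the paper. The tone scores and event labels are made up, the function names are hypothetical, and I'm assuming an alert fires when a month's tone sits 3 or more standard deviations below the mean of the training period:

```python
from statistics import mean, stdev

def fit_baseline(train_tones):
    """Mean and sample standard deviation of the training-period tone."""
    return mean(train_tones), stdev(train_tones)

def flag_alerts(tones, mu, sigma, threshold=-3.0):
    """Flag each period whose tone is at or below `threshold` SDs from the training mean."""
    return [(t - mu) / sigma <= threshold for t in tones]

def confusion_counts(alerts, events):
    """Count hits, false positives, false negatives and true negatives."""
    pairs = list(zip(alerts, events))
    return {
        "tp": sum(a and e for a, e in pairs),
        "fp": sum(a and not e for a, e in pairs),
        "fn": sum(e and not a for a, e in pairs),
        "tn": sum(not a and not e for a, e in pairs),
    }

# Made-up monthly tone scores, NOT data from the paper.
train_tones = [0.0, 1.0, -1.0, 0.5, -0.5, 0.0, 1.0, -1.0]
test_tones = [0.2, -4.0, 0.1, -3.5, 0.3]
# Made-up labels: did unrest actually follow that month?
test_events = [False, True, False, False, True]

# The baseline comes from the training months only, so the test months
# never influence the threshold they are judged against.
mu, sigma = fit_baseline(train_tones)
alerts = flag_alerts(test_tones, mu, sigma)
counts = confusion_counts(alerts, test_events)
print(counts)  # {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 2}
```

With those toy numbers you get one hit, one false alarm and one miss, which is exactly the kind of table I'd want to see before calling anything a prediction.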


tl;dr A nice set of hypotheses with well-illustrated anecdotes, but anything quantitatively predictive is yet to be revealed here.
posted by pwnguin at 4:04 PM on September 7, 2011


"They claim to have predicted, after the fact, the Arab Spring and the location of Osama bin Laden."

Phhttt. Undergrads did that before the fact.
posted by Pinback at 4:12 PM on September 7, 2011


"Wasn't this the premise of the 'Foundation' novels?"

Yup. Individual humans are unpredictable but a statistical approach can predict what a civilization will do.

"How did that turn out..."

Pretty well for a while. Hari Seldon predicts the collapse of a galactic empire. It can't be prevented, but he predicts that setting up a science-preserving colony in the middle of nowhere would preserve knowledge and shorten the coming dark age. Every few decades when the colony is in a predicted crisis, a hologram of the long-dead Seldon will offer the colonists advice.

It all falls apart because the model doesn't cope well with black swans.
posted by justsomebodythatyouusedtoknow at 9:21 PM on September 7, 2011




This thread has been archived and is closed to new comments