

Psycho History, it's what they wanna give me
February 26, 2013 6:57 AM

Psychohistory, the imaginary scientific field created by Isaac Asimov in his "Foundation" series, may no longer be fiction:
Song Chaoming, for instance, is a researcher at Northeastern University in Boston. He is a physicist, but he moonlights as a social scientist. With that hat on he has devised an algorithm which can look at someone’s mobile-phone records and predict with an average of 93% accuracy where that person is at any moment of any day. Given most people’s regular habits (sleep, commute, work, commute, sleep), this might not seem too hard. What is impressive is that his accuracy was never lower than 80% for any of the 50,000 people he looked at.
Full article at The Economist.
posted by Strange Interlude (39 comments total) 21 users marked this as a favorite

 
My Android phone quite consistently predicts, half an hour ahead, when I am going to leave for work/home (which is not always predictable -- I usually leave at the same time, but don't come back at the same time) and tells me the semi-accurate traffic report.

That said, if we assume that people are at work for 9 hours a day, and predictably at home asleep for 7, we've pretty much hit 66% for free.
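
For what it's worth, the arithmetic is easy to check. A minimal Python sketch, assuming the 9 work hours and 7 sleep hours above and that guessing "at work" or "at home" is always right during those hours (the function name and dictionary keys are made up for illustration):

```python
def baseline_accuracy(predictable_hours, hours_per_day=24):
    """Fraction of the day covered by simply guessing the routine location."""
    return sum(predictable_hours.values()) / hours_per_day


# Assumed routine from the comment: 9 hours at work, 7 hours asleep at home.
routine = {"work": 9, "home_asleep": 7}
print(f"{baseline_accuracy(routine):.1%}")  # 66.7%
```

So any figure quoted for a prediction method really needs to be read against a baseline of about two thirds.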
posted by jeather at 7:05 AM on February 26, 2013 [9 favorites]


If you are calculating the behavior of individuals, you don't have Psychohistory. Psychohistory doesn't work on individuals, only very large collections of individuals. In fact, this property of it - that it only ever is concerned with vast movements, never individual people - is a major plot point in the novels.

...I may be a nerd.
posted by Tomorrowful at 7:07 AM on February 26, 2013 [68 favorites]


Tomorrowful, you are the right kind of nerd.
posted by DigDoug at 7:12 AM on February 26, 2013 [9 favorites]


I can vouch for the fact that psychohistory was at least enough of a thing in 1992 that someone asked me for directions to that particular department on the campus of John Jay College.

I was there as part of a theater company renting the space out and couldn't help them, but still.
posted by EmpressCallipygos at 7:14 AM on February 26, 2013 [1 favorite]


This is incredibly cool but also very scary. Will this technology ever be used to provide probable cause? "You're normally at work right now, Mr. Doe. Can I search your car?"
posted by Strass at 7:14 AM on February 26, 2013 [6 favorites]


A+ post title. We would have also accepted "I wanna be predicted."
posted by griphus at 7:16 AM on February 26, 2013 [4 favorites]


Individuals : This Method :: Groups : Google

The game changer isn't a new algorithm, it's the amount of data that can be parsed.
posted by gwint at 7:21 AM on February 26, 2013 [1 favorite]


My Android phone ... tells me the semi-accurate traffic report.

There are real upsides to allowing tracking on your phone. I have a couple of routes to/from work, one of which is quicker if the roads are clear, but horrendous if there's a stoppage. This faster route also goes through the worst intersection for accidents in eastern Ontario. Google Now has saved me hours of time in the car in the last year alone.

There are real privacy concerns with regard to tracking, but, it's important to realize that it's not all negative, not all election riggers, ads and stalkers. The public health value is really amazing and could likely save lives. For example, there's a new coronavirus out there right now---tracking it is really important to limiting spread.
posted by bonehead at 7:25 AM on February 26, 2013 [1 favorite]


With that hat on he has devised an algorithm which can look at someone’s mobile-phone records and predict with an average of 93% accuracy where that person is at any moment of any day. Given most people’s regular habits (sleep, commute, work, commute, sleep), this might not seem too hard. What is impressive is that his accuracy was never lower than 80% for any of the 50,000 people he looked at.

I would like to hear a little bit more detail on this. Apparently it was announced at the AAAS meeting in Boston, but I can't find anything about it in a quick look at the website. For one thing, what mobile phone records were used? Location? Numbers called? Browser history? Text messages? And how did he get this information for 50,000 people? That is a really big study. Also, how did he determine the accuracy of his predictions? To do so he would have had to know with close to 100% accuracy where those 50,000 people were in order to verify that his predictions were >80% accurate for all of them. Even if he was able to do this I find it hard to imagine there weren't at least a few outliers. Emergency personnel or repairmen, for example. How could he predict that an EMT was going to get called to a wreck out on County Line Road in advance? How far in advance were his predictions and how precise? If you predict that most people will be within a 20 mile radius of their house on most days you will have a good success rate but it won't be very useful.
posted by TedW at 7:29 AM on February 26, 2013 [4 favorites]


gwint is a little bit correct & a little bit wrong. There are quite a few varieties of machine learning models out there, all for different kinds of situations. (You've got your topic models, you've got your support vector machines, you've got your decision trees...) & Naive Bayes has been working a treat (albeit with modifications & tweaks) in your spam filtering system for at least ten years now. These are all different algorithms, but they are all derived from an understanding of Bayesian statistics. & Bayesian statistics work better as you get more data that fits the model. (See: Silver, Nate. For someone who gets all in a tizzy about the potential weaknesses of Bayesian stats, see: that guy who wrote the book Black Swan.)

Similarly, the Economist kind of gets Hari Seldon a little bit right & a little bit wrong. Psychohistory is supposed to be a model that, given a few facts about different societies, can model all of their future developments. This is a model that, given some very granular data about how people move, can predict certain things with high accuracy. I'm not trying to undervalue the research achievements, but think it's important to emphasize that the one thing the modelers do not have is a deep understanding of the collective actions of individuals, which is something on which psychohistory is predicated.
posted by Going To Maine at 7:35 AM on February 26, 2013 [8 favorites]


bonehead: There are real upsides to allowing tracking on your phone. I have a couple of routes to/from work...
I don't think there's any technical reason that would require your movements to be stored in some faceless datacenter, though. Your phone has GPS and a powerful processor; I can't think of any reason your route history couldn't be stored, and current route recommendations be calculated from the client side.
The public health value is really amazing and could likely save lives. For example, there's a new coronavirus out there right now---tracking it is really important to limiting spread.
Is there some reason this would require real-time, identifiable tracking? Or would some jitter-added, anonymous, after-the-fact history of a person's movements be enough?
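
To make that second question concrete, here is a minimal Python sketch of what a "jitter-added, after-the-fact" history might look like: timestamps rounded down to the hour, coordinates nudged by random noise, and no user identifier kept. The function name, parameters, and sample point are all hypothetical, and this should not be taken as a privacy guarantee; it only shows the shape of the idea.

```python
import random


def jitter_trace(points, max_offset_deg=0.01, time_bucket_s=3600, rng=None):
    """Coarsen and perturb a list of (unix_time, lat, lon) points.

    Timestamps are rounded down to the start of their time bucket and each
    coordinate gets independent uniform noise. No identifier is carried along.
    """
    rng = rng or random.Random()
    blurred = []
    for ts, lat, lon in points:
        blurred.append((
            ts - ts % time_bucket_s,
            lat + rng.uniform(-max_offset_deg, max_offset_deg),
            lon + rng.uniform(-max_offset_deg, max_offset_deg),
        ))
    return blurred


# Made-up sample point, for illustration only.
sample = [(1361890000, 45.42, -75.69)]
print(jitter_trace(sample, rng=random.Random(0)))
```

Whether noise on this scale would still be useful for tracking an outbreak, or would actually keep anyone anonymous, is exactly the open question.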
posted by Western Infidels at 7:37 AM on February 26, 2013 [1 favorite]


I'd say it more indicates that the police are reaching the point of being able to predict the movements of people of interest given mere access to past phone logs.
posted by jaduncan at 7:38 AM on February 26, 2013 [1 favorite]


fwiw, psycho-history (non Asimov) is a real thing. "the study of historical motivations". It's not big, and is spearheaded by Lloyd deMause. I subscribed to The Journal of Psychohistory for a few years a few decades ago.
wrong kind of nerd I am sure
posted by edgeways at 8:08 AM on February 26, 2013 [3 favorites]


@Western Infidels - but your solitary phone needs to know about all the other phones that are stuck in traffic (which is how the traffic info is collected) which requires you contribute your info for the greater good.
posted by zeoslap at 8:08 AM on February 26, 2013


This is making Asimov's psychohistory no longer fictional in the same way that Organic Chemistry 101 is making Professor Snape's Potions class come true.
posted by straight at 8:10 AM on February 26, 2013 [4 favorites]


Western Infidels: a lot of these statistical techniques already add noise to things to make them less prone to statistical overfitting (looking at the data myopically, failing to make a real predictive model). See this paper, I think you can access it.

I attended an interesting presentation by an AI guy recently who gave us an actually pretty satisfactory definition of big data. He says that previous big gains in the spiffiness of AI systems like optical character recognition and recommendation systems came from better representations of the data. Now, the big gains in the spiffiness of AI come from getting a shitton more data. "When did our models become nicer?" I asked, and he said that models don't really matter.

Why did he say that? There are some things that some models can do that other models can't (try using a non-kernel method on infinite-dimensionality data), so I don't think his characterization was entirely accurate. However, Vapnik and co. did SVMs more than ten years ago and it's been the "try this first" for academics for a long time (in CS) now, so if models were responsible for the increasing niceness of predictions, then we would have had that increasing niceness ten years ago. So I think that the AI guy's characterization was mostly right: we just don't give credence to how problem representations are so damn important in machine learning tasks.

I got to meet Marvin Minsky in a dinner thing once, and he spoke on and on about how modern statistical AI sucked because it didn't have enough logic in it and it didn't think about problem representations enough. Sort of a little hashed version of the Chomsky vs. Norvig spat. I mean, it's pretty obvious in the modern day how logic AI kind of sucks for doing real problems and statistical AI is much better, but his second point seemed pretty accurate to me.
posted by curuinor at 8:14 AM on February 26, 2013 [1 favorite]


There are real privacy concerns with regard to tracking...

Not really. There are real privacy concerns with letting the tracking information leave your phone. As someone just pointed out to me at the office today, "privacy" doesn't mean "automatically upload my photos to Google, but mark them 'private'". It means "don't do anything with my photos that I didn't specifically tell you to do".
posted by DU at 8:18 AM on February 26, 2013 [2 favorites]


Nthing the "what kind of records were collected?" question. If you're looking at someone's movement data, predicting the next day's movements is pretty straightforward. What kind of demographics were they looking at? Were there outliers for people who deviated from their normal routines, and was it able to predict the deviation? That's what's hard.
posted by verb at 8:33 AM on February 26, 2013 [1 favorite]


Recording your own routes and data isn't what's being discussed though; it's the sharing of that data to produce larger aggregates, which does trigger the privacy concerns for most people. These analyses can only happen in the ensembles, and really can't be done with only a personal data set.

The kerfuffle about Apple tracking cell phones a couple of years ago to improve data and call quality is an example.
posted by bonehead at 8:35 AM on February 26, 2013


Note that these predictive methods work badly if you're a court jester.
posted by skyscraper at 8:35 AM on February 26, 2013


With that hat on he has devised an algorithm which can look at someone’s mobile-phone records and predict with an average of 93% accuracy where that person is at any moment of any day.

Before we get too het up about this, it's important to keep in mind that, no, Song did not do this. As best I can tell, this refers to a paper of Song et al in Science, where that 93% figure appears as a measure of entropy of some data they gathered; one way to interpret this, as they mention, is as a theoretical upper bound for the predictability of motion.
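
As a rough illustration of how an entropy figure becomes an upper bound on predictability: one standard route is Fano's inequality, which ties the entropy S (in bits) of a location sequence over N possible locations to the best achievable prediction accuracy p via S = H(p) + (1 - p) log2(N - 1), where H is the binary entropy. The sketch below assumes that form. The function names and the example values of S and N are placeholders for illustration, not figures from the paper.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_predictability(entropy_bits, n_locations, tol=1e-9):
    """Solve S = H(p) + (1 - p) * log2(N - 1) for p by bisection.

    On [1/N, 1] the right-hand side falls from log2(N) to 0, so there is a
    single solution whenever 0 <= S <= log2(N).
    """
    if n_locations < 2:
        return 1.0  # only one place to be, so nothing to predict

    def fano(p):
        return binary_entropy(p) + (1 - p) * math.log2(n_locations - 1)

    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > entropy_bits:
            lo = mid  # still too much uncertainty here, so p must be higher
        else:
            hi = mid
    return (lo + hi) / 2


# Placeholder inputs: 0.8 bits of entropy over 50 distinct locations.
print(max_predictability(0.8, 50))
```

Note that the result is a ceiling on what any predictor could achieve for that sequence, not the accuracy of an actual algorithm.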

But there is no prediction algorithm, nor do they claim there's an algorithm, nor do they offer any ideas for such an algorithm, nor do they suggest that such an algorithm even ought to exist.

So unless I'm missing something, the Economist article is just saying something that's not true.
posted by escabeche at 8:36 AM on February 26, 2013 [6 favorites]


nor do they suggest that such an algorithm even ought to exist.

Well, that's not the best characterization. Their attitude goes like this:

Although making explicit predictions on user whereabouts is beyond our goals here, appropriate data-mining algorithms (19, 20, 27) could turn the predictability identified in our study into actual mobility predictions. Most important, our results indicate that when it comes to processes driven by human mobility, from epidemic modeling to urban planning and traffic engineering, the development of accurate predictive models is a scientifically grounded possibility, with potential impact on our well-being and public health. At a more fundamental level, they also indicate that, despite our deep-rooted desire for change and spontaneity, our daily mobility is, in fact, characterized by a deep-rooted regularity.
posted by curuinor at 8:45 AM on February 26, 2013


skyscraper: "Note that these predictive methods work badly if you're a court jester."

Only if you're also mentalic.
posted by Chrysostom at 8:53 AM on February 26, 2013


There are real upsides to allowing tracking on your phone.

Yeah, I understand why it's useful -- and truthfully, even in a few months I've noticed an improvement -- but it is inaccurate enough, at this point, to not be totally helpful. It almost always catches bad slowdowns, but sometimes it makes up fake slowdowns, and sometimes it misses less bad slowdowns. I'm assuming that this will continue to improve, but it's not quite good enough to depend on for things where timing is very important.
posted by jeather at 8:58 AM on February 26, 2013


Also, how did he determine the accuracy of his predictions? To do so he would have had to know with close to 100% accuracy where those 50,000 people were in order to verify that his predictions were > 80% accurate for all of them.

My guess is that they used the beginning chunk of time/location data to train a model, which then predicted the ending chunk of the same with 93% accuracy.
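
That guess can be sketched in a few lines of Python. To be clear, this only illustrates a train-on-the-early-part, test-on-the-late-part evaluation and is not the method from the study: the "model" here is just the most common location for each hour of the day, and the records are made-up toy data.

```python
from collections import Counter, defaultdict


def train_hourly_model(records):
    """Map each hour of day to the location seen most often at that hour."""
    counts = defaultdict(Counter)
    for hour, location in records:
        counts[hour][location] += 1
    return {hour: c.most_common(1)[0][0] for hour, c in counts.items()}


def holdout_accuracy(records, train_fraction=0.75):
    """Train on the earliest records, score predictions on the rest."""
    split = int(len(records) * train_fraction)
    train, test = records[:split], records[split:]
    if not test:
        raise ValueError("no records left to test on")
    model = train_hourly_model(train)
    hits = sum(1 for hour, location in test if model.get(hour) == location)
    return hits / len(test)


# Toy data in time order: three routine days, then a day with a changed evening.
routine_day = [(3, "home"), (10, "work"), (20, "home")]
records = routine_day * 3 + [(3, "home"), (10, "work"), (20, "bar")]
print(holdout_accuracy(records))  # 2 of the 3 held-out records are predicted
```

The held-out chunk doubles as the ground truth, which is why no outside knowledge of where people "really" were would be needed.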

And how did he get this information for 50,000 people?

Clearly, with the cooperation of one or more wireless carriers, but I'd bet most of the test subjects would be surprised to learn they participated. "Consent" was probably buried in a 20-page EULA allowing the wireless carrier to do whatever it wants with personal data. I can't imagine 50,000 people would knowingly volunteer for something like this.
posted by qxntpqbbbqxl at 9:03 AM on February 26, 2013 [2 favorites]


Isn't this just a really overblown way of saying that humans are creatures of habit?
posted by Zalzidrax at 9:23 AM on February 26, 2013 [1 favorite]


However, we must also remember that having lots and lots of data isn't a panacea.

And on another note, what happens when Deep Thought discovers algorithms underlying the behavior of masses that we humans cannot fathom?
posted by Apocryphon at 9:25 AM on February 26, 2013 [1 favorite]


but [tracking] is inaccurate enough, at this point, to not be totally helpful.

That's the trade-off that I don't think people totally get right now, the utility vs. the loss of privacy. I'd argue that the perceived loss is significantly greater than the actual (to date) and that the utility is under-realized, both in fact and potential.

That said, I don't think the legal and business IT systems take these issues seriously enough yet either. It's encouraging to note that it seems possible. Governments and businesses have proven themselves capable of handling identity information when it comes to real money: taxes and banking info, for example. If aggregate tracking data is really as valuable as it seems, I'd rather the level of information security be that of a bank rather than that of, say, DoubleClick.

I don't know what the legal protections are in the US, but we need better and clearer privacy protections. It's possible, I think, to get the benefits without too many of the down sides, but we don't have the understanding in the general public yet about what data should be used for, and what its appropriate uses are, let alone a legislative framework that works.
posted by bonehead at 9:36 AM on February 26, 2013


> accuracy was never lower than 80%

So that's better than the results with assassination drones now, isn't it?
posted by hank at 9:37 AM on February 26, 2013


I have to admit that whenever I hear about a physicist with a hobby interest in applying quantitative methods to a "softer" field, my first thought is this smbc comic.
posted by advil at 9:39 AM on February 26, 2013 [3 favorites]


Where does Google get its live traffic data from? Short version: half traffic sensors embedded in roadways, half crowdsourced data from phone GPS's.
posted by Greg_Ace at 9:40 AM on February 26, 2013


If you are calculating the behavior of individuals, you don't have Psychohistory. Psychohistory doesn't work on individuals, only very large collections of individuals. In fact, this property of it - that it only ever is concerned with vast movements, never individual people - is a major plot point in the novels.
Yes, this isn't "psychohistory", it's just a dumbass hook for an article telling us what everyone already knows - you can learn a lot by using statistical methods on large amounts of data. In fact, you don't even need that much data - of course if you look at an up-to-date record of where someone is all the time, you are going to be able to guess where they are at the current point in time.

But it's completely absurd to claim you are predicting the future, you're predicting the present. If you got cell phone records for a college student, you would have no way of predicting their location at 3:15 PM on June 14th five years in the future.
There are real upsides to allowing tracking on your phone. I have a couple of routes to/from work, one of which is quicker if the roads are clear, but horrendous if there's a stoppage. This faster route also goes through the worst intersection for accidents in eastern Ontario. Google Now has saved me hours of time in the car in the last year alone.
There's no technological reason why you would need to have your location tracked to do that. All you would need to have your phone do is download data on current traffic conditions. Why would you need to be tracked?

Originally "AGPS" was just done to save space on the phone: rather than storing maps like an older-style GPS nav device, you would download them from the cellphone network, which obviously requires you tell the cellphone provider where you are to get the right maps.

But now that phones can store gigs of data, it's totally unnecessary. If you download traffic data for a whole city (probably not very large) there would be no way to tell where you were in that city.
@Western Infidels - but your solitary phone needs to know about all the other phones that are stuck in traffic (which is how the traffic info is collected) which requires you contribute your info for the greater good.
Yes, it would be helpful for everyone else to turn on tracking. But you yourself do not need to do it. Also, while cell phones might be convenient, there are probably other ways to measure traffic density. Radio and TV stations have been doing it via helicopter for years. A company like Google could easily put up a UAV and with machine vision algorithms calculate traffic density in a fully automated way without spending much money at all. Also, cities could just put sensors on the roads as well.
posted by delmoi at 10:33 AM on February 26, 2013


I alternate between driving to work and taking the train to work at the whims of a chronic back condition. This, in turn, alters my hours in the office by as much as +- 3 hours a day, in a pattern that is probably indistinguishable from random noise. On weekends, my wife and I pick between 3 grocery stores on the basis of whether we want fish or frozen Trader Joe's stuff that week. Our friends live lives filled with chaos, resulting in ad-hoc weekend activities which do not fit the slope of any graph.

You may call me... The Mule.
posted by Mayor West at 10:36 AM on February 26, 2013 [6 favorites]


delmoi: "But it's completely absurd to claim you are predicting the future, you're predicting the present. "

It's tough to make predictions, especially about the future.
posted by Chrysostom at 10:46 AM on February 26, 2013 [1 favorite]


There's no technological reason why you would need to have your location tracked to do that. All you would need to have your phone do is download data on current traffic conditions. Why would you need to be tracked?

Google Now needs your location to do its predictive magic. Sure, you could and can go into a mapping application and look up the traffic data, but that's a bunch more steps. With Now, it's a matter of seconds to look at your notifications. Better automation and presentation of data needs more context for this purpose.
posted by bonehead at 10:52 AM on February 26, 2013


I alternate between driving to work and taking the train to work at the whims of a chronic back condition. This, in turn, alters my hours in the office by as much as +- 3 hours a day, in a pattern that is probably indistinguishable from random noise. On weekends, my wife and I pick between 3 grocery stores on the basis of whether we want fish or frozen Trader Joe's stuff that week. Our friends live lives filled with chaos, resulting in ad-hoc weekend activities which do not fit the slope of any graph.

It strikes me that this could be taught as a Surveillance Age martial/survival art, like the erratic movements of a Zui Quan master. You should quit your current job and teach others your technique.
posted by Strange Interlude at 11:08 AM on February 26, 2013 [2 favorites]


There remains no absolute reason why a phone needs to report its location to anyone but the owner, unless she chooses otherwise.

The choice which has been made for mobile-users is still open to negotiation. The software may be locked, but the circuit trace between the GPS receiver and the CPU may not be. Analog triangulation is very difficult in some environments. There's no law against a power-off switch or against a metal container.

Phone owners' ignorance of the technology they've chosen to own, while they complain about it, is pretty sad. No law requires them to play along in most respects or to remain ignorant of the tech they uncomprehendingly own.

Any sufficiently advanced technology may be your owner.
posted by Twang at 1:32 PM on February 26, 2013


I like that everyone knows where I am via routine and constant check-ins in various social media. It means if I am in danger from myself or others it's easier for me to be found.

Most people do things in predictable patterns; any cynical 16 year old can tell you that.
posted by Charlemagne In Sweatpants at 2:09 PM on February 26, 2013 [1 favorite]


If you got cell phone records for a college student, you would have no way of predicting their location at 3:15 PM on June 14th five years in the future.

But if you had the same records for a 35-year old man, you might be able to do so. People get a bit more stable in their locations after college.
posted by Going To Maine at 9:07 PM on February 26, 2013 [1 favorite]



