Who is Mister P?
June 11, 2017 2:00 PM

The catchily-named Multilevel Regression and Poststratification (MRP or Mr. P) is a newish technique for estimating opinion in states, cities, or legislative districts too numerous for there to be really solid sample sizes in every one of them. Yougov.uk recently used an MRP model to be among the first to successfully predict a hung parliament, getting about 93\% of constituencies right. Why do MRP? How does it work?

Among the bits of math governing how sampling and polling work is the Central Limit Theorem, which tells us that sample means (like the percentage of a sample that like X) are very well-behaved and are very good predictors of population means. Mostly the Central Limit Theorem is proof that God loves us and wants us to be happy, but there is an occasional dark lining to that silver cloud. The way the math falls out, it doesn't matter very much how big the population you're interested in is; you still need to sample about a thousand people to get really reliable estimates. You need a thousand Americans to make a good estimate of Americans, you need a thousand Vermonsters to make a good estimate of Vermonsters, and you need a thousand of the 60,000-75,000 voters in UK constituencies like Bootle, Tooting, or Great Grimsby to make good estimates of their opinions.
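
If you want to see the "population size barely matters" point in numbers, here is a minimal sketch using the textbook margin-of-error formula for a proportion, with the finite population correction thrown in. The population figures are rough round numbers I'm assuming for illustration, and the function name is made up.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    n          -- sample size
    population -- population size (None means "effectively infinite")
    p          -- assumed true proportion; 0.5 is the worst case
    z          -- normal critical value (1.96 for roughly 95%)
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: only matters when n is a
        # noticeable fraction of the population.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

# Rough, assumed population sizes -- illustrative only.
for name, pop in [("United States", 320_000_000),
                  ("Vermont", 620_000),
                  ("one UK constituency", 70_000)]:
    moe = margin_of_error(1000, pop)
    print(f"{name}: n=1000 gives about +/- {100 * moe:.1f} points")
```

With a thousand respondents the margin comes out at about three points in all three cases, which is why a constituency of 70,000 needs nearly as big a sample as a whole country.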

What this means is that if you want really good estimates of vote intention in each American state, you need about 50x1000=50000 respondents. While a few big surveys like the NAES and CCES actually do this, their results aren't really available until after the election. If you want really good estimates for US House or UK Commons elections, you're basically fucked -- you'd need 435000 or 650000 respondents, and that would be really expensive.

Developed by David Park, Andrew Gelman, and Joseph Bafumi, and "popularized" by Jeff Lax and Justin Phillips, MRP is a way of leveraging the limited survey data you have, in combination with other information about the subdivisions you're looking at, to intelligently estimate opinion in those subdivisions with only relatively few -- say 50-100 -- respondents in each subdivision.

Here are a few papers that describe MRP or show its use -- none paywalled AFAICT but it can be hard to tell from my machines.

Gelman et al. use MRP to turn an unrepresentative poll conducted on Xbox into solid estimates of support for Obama in 2012. (pdf)

Lax and Phillips use MRP to estimate state opinion from normal-sized national polls (pdf1, pdf2, they are different papers)

Tausanovitch and Warshaw MRP-up estimates of ideology for cities and US legislative districts. (pdf)
posted by ROU_Xenophobe (12 comments total) 50 users marked this as a favorite
Why MRP? Because it's fun and awesome!

There are several alternatives if you want to estimate average or median opinions across subdivisions that don't have individually reasonable sample sizes.

Option 1: FUCK IT LET'S GO!, aka simple disaggregation. Just go ahead and use the small samples you do have. Best used after inhaling piles of cocaine. Problem 1: small sample sizes mean your confidence intervals -- the uncertainty you have to put around your estimates -- are really wide, so you can only say something like ``Between 30 and 70 percent of people in this district like Policy X.'' Problem 2: maybe you can't even say that, because some survey methodologies lead to perfectly good national samples but statewide samples that are -- to get technical -- for shit. Famously, the ANES sample methods have historically meant that many or most of the respondents from Indiana might be from the same part of Muncie, etc.
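
To put a number on Problem 1, here is a minimal sketch of simple disaggregation with a plain Wald interval. The data are hypothetical (24 respondents in one made-up district, half in favor) and the function name is mine, not anything from the papers.

```python
import math
from collections import defaultdict

def disaggregate(responses, z=1.96):
    """Estimate support separately in each district from its own respondents.

    responses -- iterable of (district, supports) pairs, supports is 0 or 1
    Returns {district: (estimate, low, high)} with a simple Wald interval.
    """
    by_district = defaultdict(list)
    for district, supports in responses:
        by_district[district].append(supports)

    results = {}
    for district, votes in by_district.items():
        n = len(votes)
        p = sum(votes) / n
        half_width = z * math.sqrt(p * (1 - p) / n)
        # Clamp to [0, 1] since a proportion cannot fall outside that range.
        results[district] = (p, max(0.0, p - half_width), min(1.0, p + half_width))
    return results

# Hypothetical toy data: 24 respondents in one district, 12 in favor.
toy = [("District A", 1)] * 12 + [("District A", 0)] * 12
estimate, low, high = disaggregate(toy)["District A"]
print(f"{estimate:.0%} in favor, 95% interval {low:.0%} to {high:.0%}")
```

Two dozen respondents gets you roughly the 30-to-70 interval described above, and that is before worrying about whether those two dozen all live on the same block.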

Option 2: Simulation. Whatcha do here is first run a regression model using different demographic factors to predict whatever opinion you're interested in. So you learn that 30 year old white men on average feel like this, and 24 year old black women on average feel like that, and 60 year old latino men on average feel some other way. Then, for every subdivision, you just look to see how many 30 year old white men and 24 year old black women and 60 year old latino men, etc etc, you have, and create an estimated opinion that way. You might not have had any 24 year old black women in your sample from NY District 26, but you have 24 year old black women from other districts, so you substitute their opinions in. Problem: this method has no awareness of geographic variation in opinion and assumes that 30 year old white men in San Francisco are the same as 30 year old white men in Salt Lake City. They aren't.
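
A minimal sketch of the simulation approach, under a simplifying assumption: instead of fitting a regression, it just averages opinion within each demographic cell, which is what a regression with one indicator per cell would give you. All the survey rows and census counts are made up for illustration.

```python
from collections import defaultdict

def cell_means(survey):
    """Average opinion per demographic cell, pooled across ALL districts.

    survey -- list of (demographic_cell, opinion) pairs, opinion is 0 or 1
    """
    totals = defaultdict(lambda: [0.0, 0])
    for cell, opinion in survey:
        totals[cell][0] += opinion
        totals[cell][1] += 1
    return {cell: s / n for cell, (s, n) in totals.items()}

def simulate_district(means, census_counts):
    """Weight each cell's national average by how many such people live there."""
    total = sum(census_counts.values())
    return sum(means[cell] * count for cell, count in census_counts.items()) / total

# Hypothetical national survey, two respondents per cell.
survey = [
    ("white man, 30", 1), ("white man, 30", 0),
    ("black woman, 24", 1), ("black woman, 24", 1),
    ("latino man, 60", 0), ("latino man, 60", 1),
]
means = cell_means(survey)

# Hypothetical census counts for two districts with identical demographics.
san_francisco = {"white man, 30": 100, "black woman, 24": 300, "latino man, 60": 100}
salt_lake_city = dict(san_francisco)

print(simulate_district(means, san_francisco))
print(simulate_district(means, salt_lake_city))
```

Both districts get exactly the same estimate because the method only ever sees demographics, which is the problem described above.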

Option 3, MRP, is basically a smarter way to do simulation. What you do is first run a *multilevel* regression predicting the opinion of interest. Multilevel regressions allow different subdivisions to have different ``intercepts,'' which is to say that they notice that otherwise similar people in San Francisco tend to be more liberal than the same people in SLC. Or they can do more complex things that are even more awesome and fun! So instead of saying ``On average 30 year old white men have this opinion,'' you get to say ``On average, thirty year old white men have this opinion, but on average people in San Francisco are this much more liberal than other places, so thirty year old white men in San Francisco have, on average, this vaguely liberal opinion.'' This, unsurprisingly, is the multilevel regression part of MRP. The P part, poststratification, is when you look at the demographics of each subdivision to simulate the opinion of the district. But now you're modifying the expected opinion of each demographic group using your information about geographic variation in opinion, so your estimates are way smarter than simple simulation.
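
Here is a toy sketch of the idea, and I do mean toy: real MRP fits an actual multilevel (usually logistic) regression with proper software. This stand-in takes national cell averages, then gives each district an intercept equal to its average residual shrunk toward zero by n/(n+k). The constant k is a made-up pooling knob; in a real multilevel model the amount of shrinkage is estimated from the data. Function names and data are hypothetical.

```python
from collections import defaultdict

def fit_toy_mrp(survey, k=5.0):
    """Crude varying-intercept fit.

    survey -- list of (district, demographic_cell, opinion) with opinion 0 or 1
    k      -- assumed pooling strength; larger k pulls districts harder
              toward the national pattern
    Returns (cell_mean, district_offset).
    """
    by_cell = defaultdict(list)
    for _, cell, y in survey:
        by_cell[cell].append(y)
    cell_mean = {c: sum(v) / len(v) for c, v in by_cell.items()}

    # How far each respondent sits from what their demographics predict.
    residuals = defaultdict(list)
    for district, cell, y in survey:
        residuals[district].append(y - cell_mean[cell])

    district_offset = {}
    for district, r in residuals.items():
        n = len(r)
        # Partial pooling: small samples are shrunk strongly toward zero.
        district_offset[district] = (n / (n + k)) * (sum(r) / n)
    return cell_mean, district_offset

def poststratify(cell_mean, district_offset, district, census_counts):
    """Weight the district-adjusted cell predictions by census counts."""
    offset = district_offset.get(district, 0.0)  # unsampled district: no adjustment
    total = sum(census_counts.values())
    weighted = sum(
        min(1.0, max(0.0, cell_mean[cell] + offset)) * count
        for cell, count in census_counts.items()
    )
    return weighted / total

# Hypothetical survey: same demographic cell, different cities.
survey = (
    [("SF", "white man, 30", 1)] * 4 + [("SF", "white man, 30", 0)] * 1
    + [("SLC", "white man, 30", 1)] * 1 + [("SLC", "white man, 30", 0)] * 4
)
cell_mean, district_offset = fit_toy_mrp(survey)
census = {"white man, 30": 1000}  # made-up census count

sf = poststratify(cell_mean, district_offset, "SF", census)
slc = poststratify(cell_mean, district_offset, "SLC", census)
print(f"SF: {sf:.2f}  SLC: {slc:.2f}")
```

The raw five-person samples say 0.80 and 0.20, plain simulation would say 0.50 for both, and the partially pooled estimates land in between. That "in between" is the whole point of the multilevel part.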
posted by ROU_Xenophobe at 2:00 PM on June 11 [19 favorites]

Yeah, yeah, I read about how amazingly wonderful the Culture's simulation technology was in Hydrogen Sonata. Can you see your way clear to having your buddies send down a few agents to start steering us towards something a little less barbaric? I'm tired of being in the control group.
posted by notoriety public at 2:19 PM on June 11 [3 favorites]

(cough) ROU_Xenophobe, your ``TeX'' is showing.
posted by belarius at 2:21 PM on June 11 [12 favorites]

Related XKCD, kind of :-)

Here's YouGov's own "day after" analysis (a well-deserved humblebrag).

(I liked the slight panic they had when it went from "ok, new model, interesting results, but it's a nowcast, and pay attention to confidence intervals" to SHOCK POLL!!! on the front pages, which resulted in a couple of "wait, this is a model, we don't really know how well this works" posts, trying to get the external expectations down a bit...)

And given the latest I hear from the UK, we'll probably see the results of a second full-scale test a few months from now :-)
posted by effbot at 3:14 PM on June 11 [2 favorites]

Ah, hell, I forgot to add:

Mister P... Mosh is a song by Plastilina Mosh, who are sort of Monterrey's version of TMBG. Sort of. Not really. But that's as close as this gabacho is likely to get.
posted by ROU_Xenophobe at 5:18 PM on June 11

Andrew Gelman's blog (which I must admit I barely understand 20% of) has had a few references to Mr. P and YouGov that I found interesting. His curiosity and the range of topics make that blog one of my favorites to skim through over coffee.
posted by conradjones at 9:12 PM on June 11

A couple of things that really struck me in the YouGov day after report (linked by effbot above):

They predicted a Labour win in Canterbury (Tory since its creation in 1918) - "This prediction came from a combination of Canterbury being a relatively urban and Remain-leaning constituency within its region, and the presence of a large number of students, both of which were associated with Labour gains."

But more peculiarly, they correctly predicted the huge swing in the Scottish constituency Ochil and South Perthshire, from SNP to Conservative, but they don't know why the model gave this: "We have not looked into this forecast in sufficient detail to be sure exactly what patterns the model found that yielded this result."
posted by Azara at 5:11 AM on June 12 [2 favorites]

Mr P really made me say UHHHHH.
posted by DirtyOldTown at 12:12 PM on June 12

All models are bad; some models are useful.

This is all so dependent on your choosing the "right" stratification variables. For example, Tausanovitch uses a mix of variables that differ by level to predict policy preferences (district: average income, military voters, % same sex couples; state: % union, % evangelical or Mormon). All of these seem right on face; you or I would nod when someone said that those things impacted how conservative an area is. But we only think that because the model confirms our underlying beliefs, or because the models ended up being right (disregarding all the other wrong models). Or we just hand-wave things - like, Figure 5 in that paper? That is a storm of dots, as the French poetically put it. I'm not going to even discuss the Xbox paper. Basically, this seems like an excuse for more fancy math backed up by more hand-waving.

Now excuse me while I continue writing my paper that uses post-stratification with health data. ¯\_(ツ)_/¯ When all you have is a hammer...
posted by quadrilaterals at 2:02 PM on June 12 [1 favorite]

Or we just hand-wave things - like, Figure 5 in that paper? That is a storm of dots, as the French poetically put it.

A great turn of phrase, but the figure shows the relationship between liberal/conservative policy preferences and shares of city revenue from sales tax. Is it really handwaving to note that a city's general tax policy does not perfectly line up 100% with the policy preferences of its citizens? Is there anywhere that government policies perfectly match the policy preferences of the citizens? (Not to mention: city A and B both have the same sales tax share of revenue; city A passes a gasoline sales tax to fund transit/walk/bike investments, city B institutes a new user fee for all public parks. Now city A has a higher share of revenue from sales tax and B has a lower share -- does city A really sound more conservative than city B?)

That said, it's not the strongest endorsement that the model does a good job of replicating preferences -- although I can't think of a better validation of the model, since there is precious little non-anecdotal data about the policy preferences of cities.
posted by Homeboy Trouble at 4:43 PM on June 12

I'm no stats expert, or even particularly knowledgeable, but the basic idea sounds very similar to Nate Silver's PECOTA projection system and similarly-situated baseball projection systems.

The twist is that sabermetricians have completely deanonymized data -- not only can they see that left-handed relievers with a certain BABIP trajectory tend to be late bloomers with a long career, they in most cases have data on every single play on every single player going back decades.

In other words, pollsters can make projections of the 3,000 straight white college-educated suburban-dwelling females in a certain congressional district, and do it by constructing a melange of the total votes and poll results of similarly-situated *groups* of voters. In contrast, the sabermetrician looks at Bill Pecota and can estimate his future performance based on a melange of every player in history who had even remotely similar attributes to his career.

The poststratification part in the baseball world is not the prediction of district totals, but season win totals. In other words, you use the model you've built of each player to determine how they will fare against the teams they will face in the actual season, taking into account things like park adjustments (indicative prevailing wind direction and outfield wall configuration, etc) at their home park and the parks of division rivals, where they will play a disproportionate number of innings, etc.

TL;DR Nate Silver is right: predicting political outcomes is a lot like predicting baseball outcomes.
posted by LiteOpera at 6:05 AM on June 13

Is it really handwaving to note that a city's general tax policy does not perfectly line up 100% with the policy preferences of its citizens? (re: fig 5 in Tausanovitch)

They're trying to show the opposite - that policy preferences can predict sales tax revenues (preferences lead to policy which leads to revenues). An R of 0.34, regardless of its significance, is not something to brag about, particularly when your underlying distribution looks nearly completely random. You can draw a lot of lines through that cloud, and I'm not sure any of them tell you much.
posted by quadrilaterals at 7:30 AM on June 13
