Tiny groups, big weights
October 12, 2016 11:03 AM   Subscribe

Interesting piece from the NYT on how one 19-year-old Illinois man is distorting national polling averages: "If you trim the weights, your sample will be biased — it might not include enough of the voters who tend to be underrepresented. If you don’t trim the weights, a few heavily weighted respondents could have the power to sway the survey."
posted by stillmoving (19 comments total) 29 users marked this as a favorite
 
That's really interesting! Thanks for sharing.
posted by iminurmefi at 11:30 AM on October 12, 2016 [1 favorite]


This is a great piece with a unique insight into how polls can be manipulated -- sometimes even with the best of intentions. Kudos to the LA Times for providing such a wealth of detail about their cohort and methodology, permitting this kind of data wonkery.
posted by Lame_username at 11:31 AM on October 12, 2016 [3 favorites]


This would be gold to an AP Stats teacher.
posted by A Priest at 11:49 AM on October 12, 2016 [4 favorites]


Weighting is a real and interesting problem in survey data - you face the difficult choice between underrepresenting key groups and overrepresenting some individual observations, which, due to some ineffable law of the universe, will be a really weird observation. This is why I always disagree with people who criticize polling and forecasting using some sort of "purity" benchmark - there needs to be a bit of art in with the science. (I'm totally going to use the 19-year-old black Trump voter as an example on the call I'm making in a few minutes about survey weighting.)
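
(To make that concrete, here's a minimal numpy sketch of the trim-or-don't-trim tradeoff - the sample and the weight of 30 are invented, not the Daybreak poll's actual numbers:)

    import numpy as np

    # Toy sample: 99 ordinary respondents split roughly 50/50, plus one
    # rare-demographic Trump supporter carrying a weight of 30 (invented).
    support = np.array([1] * 50 + [0] * 49 + [1])   # 1 = Trump, 0 = Clinton
    weights = np.array([1.0] * 99 + [30.0])

    untrimmed = np.average(support, weights=weights)
    trimmed = np.average(support, weights=np.minimum(weights, 4.0))
    print(round(untrimmed, 3), round(trimmed, 3))   # ~0.62 vs ~0.52

Leave the weight alone and one respondent moves the topline by about ten points; cap it at 4 and you've quietly decided his demographic barely counts.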

One challenge not noted in the NYT article is that the sample is apparently drawn from all eligible members of a household, which will tend to confound estimates of effective sample size - the default assumption of the sampling math and error-term calculations is that observations are obtained independently. That's obviously not true when you sample two people from the same household. If you sample a random man and a random woman from the whole population of married people (e.g. Mr. Smith and Mrs. Garcia), the chance that they're same-party (both Republicans or both Democrats) is almost the same as the chance that they're cross-party (i.e. one is R and one is D). If you sample the two people in a given married couple (e.g. Mr. and Mrs. Garcia), they are six times more likely to be same-party than cross-party (86% to 14%).

This will exacerbate their weighting issue, since couples (or multiple people in the same household) are also more likely to be similar in terms of their weighting variables like age, education level and ethnicity - and by definition live in the same geographic location and have the same household income. If the 19-year-old black Trump voter moves in with his girlfriend, she's more likely to be under 21 than 45-54, more likely to be black, and more likely to support Trump than another person selected at random.
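
(And a quick simulation of the pairing effect - the 50/50 party split is an assumption, and the 86% within-couple match rate is just borrowed from the figure above:)

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Independent pairing: both people drawn from a 50/50 population.
    a = rng.integers(0, 2, n)    # 0 = Democrat, 1 = Republican
    b = rng.integers(0, 2, n)
    print("random pairs, same-party:", (a == b).mean())   # ~0.50

    # Same-household pairing: the second person matches the first
    # 86% of the time.
    b2 = np.where(rng.random(n) < 0.86, a, 1 - a)
    print("couples, same-party:", (a == b2).mean())       # ~0.86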

If you liked this article, there's another really good one about polling and data from Nate Cohn, who's been doing great work this election - We Gave Four Good Pollsters the Same Raw Data. They Had Four Different Results. Wherein, as the headline implies, the same 867 survey responses produced results ranging from Trump +1 to Clinton +4, depending on the internal models the pollsters used for weighting.
posted by Homeboy Trouble at 11:53 AM on October 12, 2016 [12 favorites]


Interesting piece from the NYT on how one 19-year-old Illinois man is distorting national polling averages

One point of clarification: this is referring to "a panelist on the U.S.C. Dornsife/Los Angeles Times Daybreak poll, which has emerged as the biggest polling outlier of the presidential campaign" - one person on one outlier poll. So if this poll is included in national averages, it skews them; exclude it, and you won't see the same skewing.
posted by filthy light thief at 12:01 PM on October 12, 2016 [1 favorite]


Homeboy Trouble: "Weighting is a real and interesting problem in survey data - you face the difficult choice between underrepresenting key groups and overrepresenting some individual observations"

Here's what I don't get – clearly this weighting should have produced yuuuge error bars. Did their model seriously not consider uncertainty of any kind, or provide some other hint that the small sample size would propagate into such a huge error?
posted by schmod at 12:07 PM on October 12, 2016 [2 favorites]


On r/politicaldiscussion, a number of users have been following how this individual has been responding for weeks. They've taken to calling him Carlton, as a nod to the conservative character from Fresh Prince.
posted by the_querulous_night at 12:48 PM on October 12, 2016 [12 favorites]


Carlton.

Man, sometimes the Internet is all right.
posted by Itaxpica at 12:53 PM on October 12, 2016 [4 favorites]


This takes me back to the misdirected trolling glee I had at 19 at being a female Republican in northern Illinois.
posted by melissam at 12:58 PM on October 12, 2016 [1 favorite]


Big thanks to stillmoving for the link, because the article exposes what I've never respected about the use of weights in any kind of survey: pollsters treat the weights as known, anchored values, with no uncertainty around them. Just like sampling itself - shoutout to Homeboy Trouble for his example above - the use of weights carries its own uncertainty, and these uncertainties compound, which necessarily expands the final margin of error. Instead, the reported errors reflect only the 'n' involved, not the error introduced by whatever weighting scheme was used.

Personally, I care little for topline percentages. Show me some kind of multinomial logistic choice-based regression of candidate as a function of whatever collected independent variables, such as demographics, and I would piddle myself with glee. I prefer to know the drivers of choice, not a constructed 42%-40%-7%-11% summary of it.
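
(For the curious, a bare-bones scikit-learn sketch of what I mean - the demographics and choices below are random placeholders just to show the shape of the model, not real polling data:)

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 1000

    # Placeholder predictors: standardized age plus a 0/1 education flag.
    X = np.column_stack([rng.normal(size=n), rng.integers(0, 2, n)])
    # Placeholder 4-way candidate choice (Clinton/Trump/Johnson/other).
    y = rng.integers(0, 4, n)

    model = LogisticRegression(multi_class="multinomial", solver="lbfgs",
                               max_iter=1000)
    model.fit(X, y)
    print(model.coef_)                 # one coefficient row per candidate
    print(model.predict_proba(X[:3]))  # per-respondent choice probabilities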
posted by Conway at 1:27 PM on October 12, 2016 [2 favorites]


Ha, I was just having a nerdfight about this pollster with some other nerdy friends yesterday. My biggest problem with these polls is not the sampling (which is always kind of messy once you get down to the crosstabs), but the question they use to project vote percentages: they don't ask "Who are you going to vote for?" but "How likely are you to vote for Clinton/Trump/etc.?"

This is intended to deal with the problem that polls sometimes have, which is pushing people to choose one candidate and thus making them seem more decided than they are. But this seems like an overcorrection, because when it comes down to it, each voter can only choose one candidate. It's not like someone who gives Trump a 90 has 50% more of a vote than someone who gives Clinton a 60, but this is how those responses are reflected, as far as I can tell from their FAQ.
posted by lunasol at 1:38 PM on October 12, 2016 [1 favorite]


Also, another annoying thing about the FAQ on the LA Times site is that they cite their success at predicting the results in 2012 as proof the model works, but this is a pretty radical shift in polling methodology and you can't really validate it with one result.
posted by lunasol at 1:40 PM on October 12, 2016


It's not like someone who gives Trump a 90 has 50% more of a vote than someone who gives Clinton a 60, but this is how those responses are reflected, as far as I can tell from their FAQ.
Sure, but I'm okay with calling those .9 votes for Trump and .6 votes for Clinton, in which case the Trump votes do count 50% more.

The way I'd think of it is that if instead of taking 1 person giving Trump a 90 and 1 person giving Clinton a 60, we take ten of each of them, then we'd expect 9 of the 10 Trump-90-ers to vote for Trump and 6 of the 10 Clinton-60-ers to vote for Clinton, so he'd be ahead 9 votes to 6. But we have to divide that by ten to reflect the actual number of respondents.

Of course this only works if a response of 90 really means they are 90% likely to vote for Trump and 10% likely to vote for no one.
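
(Spelled out as code, under that same assumption that the stated number really is a probability of turning out and voting that way:)

    # Each response is (candidate, stated probability of voting for them).
    responses = [("Trump", 0.90), ("Clinton", 0.60)]

    totals = {}
    for candidate, p in responses:
        totals[candidate] = totals.get(candidate, 0.0) + p

    print(totals)                  # {'Trump': 0.9, 'Clinton': 0.6}
    # Ten copies of each respondent gives the 9-votes-to-6 framing:
    print(10 * 0.90, 10 * 0.60)    # 9.0 6.0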
posted by dfan at 2:01 PM on October 12, 2016 [2 favorites]


It would be hard to come up with a category of voter less likely to vote for Trump -- Illinois (Trump has a 33% vote share in poll averages); African American (Trump has something like 1% to 6%); and 19 years old (Trump has something like 20% of 18-29 year-olds). People of course don't always conform to their sub-groups, but I can't help but wonder if this guy was just having fun with the pollster.
posted by Mid at 2:12 PM on October 12, 2016 [1 favorite]


@mid -- the top-voted comment on the NYT site suggests something along those lines too, pointing out that this individual also reports an income of over $75k/year, which is quite unusual for a 19-year-old.
posted by phoenixy at 6:27 PM on October 12, 2016 [1 favorite]


Great article.

Here's what I don't get – clearly this weighting should have produced yuuuge error bars. Did their model seriously not consider uncertainty of any kind, or provide some other hint that the small sample size would propagate into such a huge error?

Errors in the weighting should be random and so average out. And in fact, if you changed the poll group every time, you wouldn't have seen the systematic bias. As the article says, you might see more volatility (which is error, for sure), but you'd also average closer to the "truth" than an unweighted sample would. To a large extent it's bad luck - there was maybe one chance in 20 of getting a Republican young African American male. In that analysis, it's sticking with the same panel that is really causing the error.
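
(A toy simulation of the fresh-panel-versus-fixed-panel point - every number here is invented, including the weight of 10 on the single rare-group slot:)

    import numpy as np

    rng = np.random.default_rng(2)
    true_p, waves, n = 0.45, 200, 100
    W = 10.0   # weight on the one rare-group respondent (invented)

    def wave(rare_vote):
        votes = rng.random(n - 1) < true_p   # 99 ordinary respondents
        return (votes.sum() + W * rare_vote) / (n - 1 + W)

    # Fresh sample each wave: the heavy slot is redrawn every time,
    # so the series is noisy but averages out near the truth.
    fresh = [wave(rng.random() < true_p) for _ in range(waves)]
    # Fixed panel: the same Trump-supporting rare respondent every wave,
    # so the same error repeats instead of averaging out.
    fixed = [wave(1) for _ in range(waves)]

    print("fresh mean:", round(np.mean(fresh), 3))   # ~0.45
    print("fixed mean:", round(np.mean(fixed), 3))   # ~0.50, stuck high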

It's really a challenging situation I think. So few people answer polls now that I suspect weighting is very important to have any chance of predicting anything. Otherwise the bias in your sample far outweighs the error added by the weighting.

My layperson's take is the article does a great job explaining what's wrong with the LA Times poll and a few of the other challenges, but there are a lot of other ways you can be wrong as well if you don't try this stuff. Which is why everyone does it, albeit in a much less extreme way.
posted by mark k at 6:29 PM on October 12, 2016 [1 favorite]


Just saw Gelman has a post on this; in the update he actually quotes one of the pollsters on an issue relevant to the uncertainty:
The result is that a few individuals from groups such as those who are less represented in polling samples and thus have higher weighting factors, can shift the subgroup graphs when they participate. However, they contribute to an unbiased (but possibly noisier) estimate of the outcomes for the overall population.

Our confidence intervals (the grey zones) take into account the effect of weights. So if someone with a big weight participates the confidence interval tends to go up.
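
(The textbook way to fold weights into the interval is the Kish effective sample size - I have no idea whether that's exactly what U.S.C. does, but it shows why one weight of 30 fattens the grey zone so much:)

    import numpy as np

    def kish_neff(w):
        # Kish effective sample size: (sum of w)^2 / (sum of w^2)
        w = np.asarray(w, dtype=float)
        return w.sum() ** 2 / (w ** 2).sum()

    w = np.array([1.0] * 99 + [30.0])   # one heavy respondent (invented)
    neff = kish_neff(w)
    moe = 1.96 * np.sqrt(0.25 / neff)   # worst case, p = 0.5
    print(len(w), round(neff), round(moe, 3))   # 100, 17, 0.24

A hundred interviews with one weight of 30 buy you the precision of about seventeen - a margin of error around ±24 points instead of the ±10 you'd report for an unweighted n of 100.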
posted by mark k at 6:48 PM on October 12, 2016 [2 favorites]


Errors in the weights are probably not a huge source of bias, but every political poll will still be biased to some degree. Even with perfect knowledge of response rate among some demographic group vs. the fraction of voters in that group, being reachable (via land line?), willing to respond, and responding truthfully may all be correlated with political preference among those in the group. And the only source of truth for estimating this bias is past election results, which only give you geographic breakdowns that mix the demographic factors.
posted by mubba at 7:47 AM on October 13, 2016 [1 favorite]


It's not like someone who gives Trump a 90 has 50% more of a vote than someone who gives Clinton a 60, but this is how those responses are reflected, as far as I can tell from their FAQ.

The other thing is that they won't necessarily have made up their minds by election day. And if they still go out and vote, they're going to have to make a choice through that uncertainty.

I think I'd prefer it if the poll tried to answer the question "if election day were today, how would you vote?" Add in a "stay home" option. The poll taken the day before election day should then basically predict the outcome, whereas a poll that lets people express their degree of indecision may be way off even on election day itself.
posted by mantecol at 8:30 AM on October 14, 2016



