42.7 percent of all statistics are made up: After Strategic Visions refused to share the methodology behind some of their polling, Nate Silver of fivethirtyeight analyzed the firm's poling results and found evidence of fraud. Strategic Visions responds to The Hill. More amusingly, Nate went on to look at an even more questionable study by the same company claiming that only 23 percent of Oklahoma students know that George Washington was the first president.

For good measure, Silver compared the Strategic Visions data with a control set. Continuing ~~research~~ obsessive inquisition revealed that the Atlanta address on Strategic Visions's website is not the company's actual address.

It's a statistics firm called Strategic Visions. That alone is pretty hinky.

*Johnson said the series of events has caused a panic at his firm and that employees are fearful for their safety. He said they’ve been getting harassing text messages, and that their receptionist was verbally accosted in their parking lot by someone who referenced Silver’s blog.*

If this is ever proven true I will disable my account and eat a ten pound bag of hot fresh horseshit.

Agreed, Sys Rq. It might as well be called "The Results You Want®"

If there is one group with a proven history of violence, it is bloggers who like to read about esoteric statistics.

In the first analysis, Silver assumes a uniform distribution of last digits and then shows that the digits are, in fact, not uniform. I don't understand why Benford's Law, which states that leading digits are *not* uniform, couldn't also apply to last digits. Anybody know?

Also, the title is wrong. Silver has *not* accused them of fraud yet. He's shown that there's something that needs explaining and that one of the explanations could well be fraud.

twoleftfeet>

*In the first analysis, Silver assumes a uniform distribution of last digits and then shows that the digits are, in fact, not uniform. I don't understand why Benford's Law, which states that leading digits are not uniform, couldn't also apply to last digits. Anybody know?*

Benford's law *does* apply to the last digit, but the distribution isn't the same as the log(1 + 1/d) of the first digit. Once you're all the way to the last digit, the distribution is going to be pretty close to random. I refer you to the Wikipedia page, specifically to the section titled "Generalization to digits beyond the first".

posted by UrineSoakedRube at 7:13 PM on September 27, 2009 [2 favorites]
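
For anyone who wants to see how fast that flattening happens, here is a rough Python sketch of the "digits beyond the first" generalization. The function name `benford_digit_probs` is made up for this illustration (it is not from Silver's post or any library); it sums log10(1 + 1/m) over every integer m that has the right number of digits and ends in d.

```python
import math

def benford_digit_probs(position):
    """Benford probability of each digit 0-9 at the given 1-based position."""
    if position == 1:
        # The first significant digit can never be 0.
        return [0.0] + [math.log10(1 + 1 / d) for d in range(1, 10)]
    # k runs over every possible prefix of (position - 1) digits.
    lo, hi = 10 ** (position - 2), 10 ** (position - 1)
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))
        for d in range(10)
    ]

for pos in (1, 2, 3):
    print(pos, [round(p, 4) for p in benford_digit_probs(pos)])
```

The first-digit row is the familiar 30% for a leading 1. By the third digit every probability should land within about half a percentage point of 10%.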

I have very high respect for Nate Silver's statistical chops, but he hasn't yet convinced me that we're looking at compelling evidence of fraud here. Here's my blog post about it.

Executive summary: it's hard to draw trustworthy information about fraud from data like this unless you have a pretty solid picture of what the data is supposed to look like in the absence of fraud.

Mark Blumenthal at Pollster has a similarly cautious take.

Benford's law says that the leading digit is 1 about 30% of the time; this has to do with our numbering system (for instance, from the numbers 1 - 20, 1 is the leading digit on 11 of them). In other words, you'd expect a log distribution of the leading digit, but an even distribution of the final digit.

The whole thing points up to the critical need for higher-quality sources of entropy when faking data.

*If there is one group with a proven history of violence, it is bloggers who like to read about esoteric statistics.*

I'm sorry, but WHAT THE FUCK have fantasy baseball players done to you??

Oh, you mean.... Oh.

Once again, the terrifying Liberal Statistician Stormtroopers strike fear into the hearts of god-fearing Americans.

*WHAT THE FUCK have fantasy baseball players done to you?*

For the record, that's also Nate Silver's fault.

*Who will save us from the terror!?*

Disaster-relief supplies inside. In case of emergency, solve for *q*.

An important point to note is that the questionable pollster is Strategic Vision, LLC, not Strategic Vision, Inc. which has been around for 37 years and is considered a reputable pollster.

I knew something was up with that Oklahoma student poll. Being a graduate of Oklahoma public high schools, I know many of them aren't great, but just counting the Honor's Societies, IB/AP students, Academic Bowl, etc. members would throw these poll results off.

And of course I make a typo trying to prove them wrong. Such is life....

If you took all the college students who fell asleep during lectures and laid them end to end, they'd be a lot more comfortable.

*If this is ever proven true I will disable my account and eat a ten pound bag of hot fresh horseshit.*

Clever way to not have to do a follow-up.

**are** cooking the books. A deeper look at who all has hired them for polling will be interesting.

I am reminded too, of something I've been meaning to ask some statistics folks – besides Nate Silver's stuff, what are some other good statistics-type blogs that do these sorts of analyses?

Also, do people ever really believe statistics? They say 78.5 percent don't.

*An important point to note is that the questionable pollster is Strategic Vision, LLC, not Strategic Vision, Inc. which has been around for 37 years and is considered a reputable pollster.*

But Strategic Vision, LLC has a .biz address! That means they are a reputable business!

David Johnson, CEO, has a twitter account. It's like reading the transcript of an ELIZA chatbot with the "reflexive Republican spin" dial twisted way to the right.

*besides Nate Silver's stuff, what are some other good statistics-type blogs that do these sorts of analyses?*

Pollster is a great group blog on election polling, featuring lots of people with years of experience in the field.

For general statistics blogging -- sometimes technical, but often not -- you can't beat Columbia statistics professor Andrew Gelman's Statistical Modeling, Causal Inference, and Social Science.

Speaking of frauds, I view Washington's presidency as something of a scam. Come on, the guy was an independent yet got in 100% of the electoral college backing him? And he had the gall to get everyone drunk on Barbados Rum at his inauguration, so they'd forget the details later! President Washington? Psh! More like King George the 1st!

*besides Nate Silver's stuff, what are some other good statistics-type blogs that do these sorts of analyses?*

Well, the Australian equivalent of 538 is The Poll Bludger, although he looks into US elections and polling when not much is happening down under.

It occurs to me that an interesting way to test this further would be to translate the numbers into other number systems (binary, ternary etc. etc.), repeat the analysis, and calculate an RMS error for each number system. The error should peak quite clearly, one would assume, in the decimal system.

*Once you're all the way to the last digit, the distribution is going to be pretty close to random.*

But if I'm correctly reading this sentence from the first article - *Silver tested the last digit in every percentage polled in all of Strategic Vision’s surveys, which he suggested should be evenly distributed among all 10 digits* - then he's only got two digits most of the time (if they are digits in a percentage). So not yet uniform, right?

It isn't the analysis of the numbers that proves it, it's the refusal to turn over the data.

Always, always, always . . .

it's the cover-up, not the crime.

Because once they decide to cover up, every move to hide something just shows its outline. If I have a bank account I didn't want anyone to know about, the transfer out of that account is traceable. If I want to hide that my polling numbers are rigged, I put numbers I think are "random" in there. You know where the money is and that the books are cooked by the very moves that are made. The more things that have to be hidden, the more moves they have to make and the more you can know about the thing they are trying to hide.

This is how karma works.

twoleftfeet>

True, but you're not going to see the prevalence of 8's that Silver saw in Strategic Vision LLC's data:

Silver tested the last digit in every percentage polled in all of Strategic Vision's surveys, which he suggested should be evenly distributed among all 10 digits. But he found that, for instance, '8' appeared at the end of a number in 60 percent more numbers than '1.' He said it was "an incredible fluke – millions to one against."

The suggestion, made patently but couched as to avoid legal trouble, is that someone invented the numbers and, for whatever reason, used '8' as a last digit in an inordinate amount of numbers.
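
For readers wondering what "should be evenly distributed" looks like as an actual test, here is a minimal Python sketch of a chi-square goodness-of-fit check on last digits. It is not necessarily the calculation Silver ran, and the `sample` list is invented purely for illustration; it is not Strategic Vision's data. The 16.919 cutoff is the standard 5% critical value for 9 degrees of freedom.

```python
from collections import Counter

CHI2_CRIT_DF9_05 = 16.919  # 5% critical value, chi-square with 9 degrees of freedom

def last_digit_chi2(percentages):
    """Chi-square statistic for last digits against a uniform 0-9 distribution."""
    counts = Counter(p % 10 for p in percentages)
    expected = len(percentages) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

# Hypothetical toy data (NOT real poll numbers): every value ends in 8 or 2.
sample = [48, 42, 38, 52, 58, 32, 28, 62, 68, 22] * 5
stat = last_digit_chi2(sample)
print(f"chi2 = {stat:.1f}, reject uniform at 5%: {stat > CHI2_CRIT_DF9_05}")
```

A statistic above the cutoff only says the digits are unlikely to be uniform. It says nothing about why, which is the point several people in this thread keep circling back to.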

*David Johnson, CEO, has a twitter account. It's like reading the transcript of an ELIZA chatbot with the "reflexive Republican spin" dial twisted way to the right.*

I started reading his twitter account and all of a sudden my chest tightened and I started grinding my teeth. I haven't felt like this since the Bush administration.

Benford's Law only applies to stats with a power law distribution. There is no particular reason to think that it applies here, especially in trailing digits.

That is a terrible example, and has nothing to do with Benford's law. If you choose the numbers 1-100, 1 is the leading digit on 12 of them. Benford's law is based on the fact that numbers obeying a power law cluster more tightly in the low ranges than the high ranges.
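
Both counts in this back-and-forth are easy to check by brute force. A quick Python sketch (the helper `leading_one_count` is just a name made up for this check):

```python
def leading_one_count(lo, hi):
    """Count integers in [lo, hi] whose first decimal digit is 1."""
    return sum(1 for n in range(lo, hi + 1) if str(n)[0] == "1")

print(leading_one_count(1, 20))   # → 11  (1 and 10-19)
print(leading_one_count(1, 100))  # → 12  (1, 10-19 and 100)
```

The share of leading 1s swings from 55% to 12% depending only on where the range stops, which is why counting over an arbitrary range of integers says little about Benford's law.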

My interpretation: I expected the last digits to be completely random, or to at least have an explainable bias.

It's a meta-study, so maybe it's looking at responses to political questions; and, well, I guess some questions might only give two options, and some might give a free-for-all response that was boiled down to fitting up to, oh, maybe eight or ten categories.

Now a questionnaire that asks me to choose from ten choices is just inane. Most questions are going to code into somewhere between 2 and four or five options or categories. So if we do a meta-study of the distribution of last digits, I guess I don't expect the distribution to be completely random.

I guess I'd expect to see a rise toward the left. I guess I should expect any meta-study, in raw data form, to show bias for 1-3, and against 7-10. If you're learning why Nate is right like I am, then I think you'll be pleased as I was to see that Benford's Law looks just like I might have expected it to: this sloped distribution of final digits. Well, *duh*. We figured that out pretty much ourselves. Let's call it Our "You're Gonna Slope to the Left" Rule. There's an unintentional political jab there. I'm sorry. Again, it is true: reality biases left.

Of course, I could be entirely out to lunch. Can a person who actually Knows Things confirm that what I seem to figure would be shown in looking at last digits, is what happens? I was surprised to see how smooth the distribution is, but it seems to make common sense.

I like what Nate is doing here, but I don't understand why he uses both trailing digits in a given poll. In his example, if a poll had Obama 58, Clinton 32, he'd put both 2 and 8 into his dataset. Since the two options need to add up to 100 (less undecideds, which are up to a few percent depending). The data you get now is not going to be independent, since if you have that 2, you *have* to have an 8 for the last digit to sum up correctly. Or maybe if there are a few undecideds, you'll get a 2 and a 7. Either way, if one digit is low, then the other digit will necessarily be high because of this and you see that in his comparison to Quinnipiac where the low numbers skew one way and the high numbers skew the other. You also see it to a much lesser degree in the Quinnipiac data.

If he would just pick one of the two results per poll (chosen at random somehow, perhaps, though it shouldn't matter) he would have a much nicer looking data set that won't have uncontrolled correlations in it. And he'd still have 2500 or so points, which would be more than sufficient to do a more sophisticated analysis than he could here.

Um, let's say Obama 58 and Clinton 42 in that example. I'll now slink off and hope that gaffe doesn't detract from the rest of my point.
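
A small Python sketch of that dependence. It assumes an idealized two-candidate poll where the two shares plus undecideds sum to exactly 100 (real polls round, so they won't always), and `paired_last_digit` is a hypothetical helper name.

```python
def paired_last_digit(first_share, undecided=0):
    """Last digit of the second candidate's share, given the first's share."""
    second_share = 100 - undecided - first_share
    return second_share % 10

# With no undecideds, a trailing 8 always forces a trailing 2.
for share in (38, 48, 58):
    print(share % 10, paired_last_digit(share))

# With 1% undecided, the same trailing 8 forces a trailing 1 instead.
print(paired_last_digit(58, undecided=1))
```

So once the undecided figure is fixed, the second trailing digit carries no new information, which is the non-independence being complained about here.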

*Benford's law says that the leading digit is 1 about 30% of the time; this has to do with our numbering system (for instance, from the numbers 1 - 20, 1 is the leading digit on 11 of them).*

And of the numbers 20-99, 1 is the leading digit of **none** of them. That has nothing to do with Benford's law.

*If you took all the college students who fell asleep during lectures and laid them*

...you'd be tired and sore.

*Also, the title is wrong. Silver has not accused them of fraud yet. He's shown that there's something that needs explaining and that one of the explanations could well be fraud.*

Um, yes he has:

It seems quite strongly possible, nevertheless, that the students polled for this survey don't exist anywhere in Oklahoma but instead on a hard drive somewhere in Atlanta. This is a valuable exercise undertaken by the OCPA. But they owe it to the hardworking students of Oklahoma to make sure that their contractor, Strategic Vision, didn't flunk its own citizenship test.

It sounds like the case with the student knowledge poll was far more obviously fake.

After more reading, I think I was probably laughably wrong.

Math is hard!

*It sounds like the case with the student knowledge poll was far more obviously fake.*

Well, yeah. 0% of the 1000 surveyed students got more than 7 of the questions right? As a product of Oklahoma public schools, that's well-nigh impossible IF the 1000 students were selected randomly.

With the caveat that I graduated from high school almost 20 years ago, I just can't see how you could not find a fraction of kids who would know all 10 answers. It would essentially require every student to sleep through every civics, history, and social studies class from elementary school on. It would also essentially require a lack of smart kids in Oklahoma -- and yet there were 36 National Merit Scholars in Tulsa area schools this year, which represent what, 1% of the "smart kids" in metro Tulsa? That's 3600 kids -- and that's roughly 20% of the metro Tulsa high school population.

I don't see how they can come up with this without either running into some version of the "cell phone problem" (i.e. not being able to reach the smart white kids because they didn't have cell numbers, instead reaching poor kids on land lines), or they massaged the numbers something crazy. Did they only call poor black neighborhoods in north Tulsa while avoiding rich white neighborhoods in south Tulsa and the suburbs?

This is either the crappiest survey ever run, or it's intentionally skewed to favor their clients, or these 1000 high school students never existed.

*If this is ever proven true I will disable my account and eat a ten pound bag of hot fresh horseshit.* -- posted by Optimus Chyme

The problem with most MeFi detective threads is that there's no reward at the end.

This could be better.

*Strategic Vision, LLC, not Strategic Vision, Inc. which has been around for 37 years and is considered a reputable pollster.*

And they're even in the same industry?

Wow, someone really needs a trademark lawyer.

*But he found that, for instance, ‘8’ appeared at the end of a number in 60 percent more numbers than ‘1.’ He said it was “an incredible fluke – millions to one against.”*

Maybe Strategic Vision only polls Chinese Americans?

The number 8 of course stands for either Heil or Hitler.

Should the first link in the post not be to this?

Damnit, thank you. Contact forming now.

*That is a terrible example, and has nothing to do with Benford's law. If you choose the numbers 1-100, 1 is the leading digit on 12 of them. Benford's law is based on the fact that numbers obeying a power law cluster more tightly in the low ranges than the high ranges.*

House numbering and finite street sizes is the best way to describe Benford's Law to a n00b. In my humble opinion.

*Nate Silver of fivethirtyeight analyzed the firm's poling results and found evidence of fraud.*

What a pack of punts.

Can anyone vouch for Silver's method given Schismatic's concern? I don't know where he's getting his ideas on trailing digits. There've got to be polling statisticians on MetaFilter somewhere. The student knowledge one seems to be a slam dunk. But I see no reason why the trailing digits should be random. I mean, it seems very "common sense," and may be what Silver usually sees, but those are pretty weak by a mathematician's standard.

Mark Twain (or whoever):

**"There are three kinds of lies: lies, damned lies, and statistics."**

Another favorite of mine: "Studies have shown" ... that phrase means **bupkiss**.

**"Argument from authority or appeal to authority is a logical fallacy** ... The fallacy only arises when it is claimed or implied that the authority is infallible in principle and can hence be exempted from criticism."

People who get __paid__ to gather data to generate statistics or make studies are seldom infallible in principle. Everyone who does "studies" is liable to be biased... even people with the best intentions and impeccable credentials.

Let alone the owner of Clever Hans.

*But I see no reason why the trailing digits should be random.*

All I know is that a similar metastudy was performed on scientific papers (thousands of them), and the results of that study showed a trend towards some digits appearing more often than they should, which many found concerning.

Anyway, think of this. Those numbers that are reported by the pollster are the result of sampling, and have been rounded to only a few digits. The reality, if you take a **census** of an entire population of approximately 100,000 people, might come out something like this:

Obama: 51831

Clinton: 41880

Or maybe:

Obama: 47032

Clinton: 39617

Or maybe even:

Obama: 60048

Clinton: 34783

In this case, we can see that the most significant digit, marking tens-of-thousands, is very likely to be, say, a 5, a 4 or a 3. Because the population is close to evenly divided. Maybe there'll be a 6 in there. Very unlikely to start with a 9 or a 1, because there is some external factor that's making a real difference to this digit. On the other hand, the least significant digit on the right-hand side - what's driving this? Nothing. Chaos. We can expect these digits to be random. The same should be true if you do a sampling from this population, and convert it into percentages.

**Unless** those numbers aren't the result of chaos, but are the result of some other process, for example, a human making them up. For example, here's a histogram of the distribution of all the digits I just made up above. Clearly my brain isn't very random, and I like 8s and 0s quite a bit.

Okay that's probably not very good by a mathematician's standard. But that's why statistics is mathematics' weird, demented, twisted uncle.

The concerns with Nate's method are fair, though - pairing the 8's with 2's doesn't make sense in 2-horse races although selecting one of them at random instead shouldn't make a difference to the final distribution you get. The main problem, I feel, is the lack of precision in the poll reporting - as others have said, you really need to do this on numbers like 48.31, rather than straight 48, but that's down to the data he had available to him, I assume.

Oooh man, my histogram above is screwed, because for some reason R put "0" and "1" into the same bin. Curse you. Here's the proper one.
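
For reference, the tally of those six made-up figures takes only a few lines. A Python sketch that counts every digit and prints a crude text histogram:

```python
from collections import Counter

# The six made-up "census" figures from the comment above.
figures = [51831, 41880, 47032, 39617, 60048, 34783]

digit_counts = Counter(ch for n in figures for ch in str(n))
for digit in "0123456789":
    print(digit, "#" * digit_counts[digit])
```

With only 30 digits this proves nothing on its own, but 8 does tie for the most common digit (five appearances) while 2, 5 and 9 show up once each.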

barnacles -

*besides Nate Silver's stuff, what are some other good statistics-type blogs that do these sorts of analyses?*

The BBC and Open University* produce a radio programme called "More or less: behind the stats" which basically exists to examine statistics and figures seen in that week's news, then investigate where they came from and whether they actually stand up to analysis (SPOILER: almost never).

Entertainingly presented and, as someone who barely has the maths skills to type in a phone number, I find the stats explanations to be very clear. A trained statistician might find it a bit simplistic, but I think it's great. Podcast, Website/archive.

*a venerable and fairly well-respected correspondence / distance learning university in the UK.

*On the other hand, the least significant digit on the right-hand side - what's driving this? Nothing. Chaos. We can expect these digits to be random.*

That's what I'm questioning, and I haven't seen anything but assertion that it should be so.

I was finally able to read the full page at Silver's blog, and a lot of his knowledgeable commenters pointed out this problem. There are possibly multiple collinearities in those numbers. They are dependent on the total population size, the number of options in the given question, and the denominator used for percentages. They could also be dependent on the types of questions and regions they poll (since I would imagine 40 vs 60 questions are more common than 10 vs 90).

I think it's possible, I just haven't seen any halfway decent proof of such.

*I like what Nate is doing here, but I don't understand why he uses both trailing digits in a given poll. In his example, if a poll had Obama 58, Clinton 32, he'd put both 2 and 8 into his dataset. Since the two options need to add up to 100 (less undecideds, which are up to a few percent depending). The data you get now is not going to be independent, since if you have that 2, you have to have an 8 for the last digit to sum up correctly. Or maybe if there are a few undecideds, you'll get a 2 and a 7.*

Except that in political horserace polls, your undecideds are much higher than 1%. Check out this sampling of polls from the Virginia governor's race: the "undecided" number is anywhere from 2% to 20%. There's no reason to pair up 8's and 2's, or 7's and 3's, or any other numbers, because they won't be paired up more frequently than any other trailing digits.

*House numbering and finite street sizes is the best way to describe Benford's Law to a n00b. In my humble opinion.*

Again this has nothing to do with Benford's Law. House numbers are linearly distributed but Benford's law applies to stats obeying a power series. In other words, it applies to a series N if log(N) is linearly distributed. For example, city sizes.

Wikipedia has a clear explanation:

*Except that in political horserace polls, your undecideds are much higher than 1%. Check out this sampling of polls from the Virginia governor's race: the "undecided" number is anywhere from 2% to 20%. There's no reason to pair up 8's and 2's, or 7's and 3's, or any other numbers, because they won't be paired up more frequently than any other trailing digits.*

That's a fair point, but I don't think it changes my problem with this too much. The last digits need to add up to 100-%undecideds. Unless the undecideds are uniformly distributed (which seems very unlikely), you would wind up with some sort of correlation in the data such that the last two digits would sum to some numbers more than others. While 8s and 2s may not be paired up, you would almost certainly see more 7 and 1 pairs than 7 and 9 pairs. Like I said, you do see this in the plot of last digit distribution on both Strategic Visions and Quinnipiac, where high numbers and low numbers skew in opposite directions.

I also think that the quantitative approach to falsifying the OK data is looking in the wrong place. The intuitive reason that it would be impossible for not even one student to score higher than 7 is probably the right approach. Extreme values (the highest points in a data set, in this case) behave differently than the average value of the data, and I've never seen a real data set this big without some significant outliers even if it were randomly distributed. The statistics of these should definitely be looked at on their own, because the average values can come from a number of processes and are inherently smoothed over.

Another way to understand Benford's law is to think of it as the logical consequence of the fact that units of measurements are arbitrary. The distribution of 1s, 2s, 3s etc as leading digits should be the same regardless of the particular units we are using. Naively we might suppose that all digits are equally likely to appear as leading digits if we measure in whatever units we want (centimetres, inches, finger widths, bees dicks etc). However it is easy to see that this can't be the case. If it was, then measuring in units half as long, all the 5s, 6s, 7s, 8s and 9s would become 1s (5 of the 9 digits, about 56%) and the rest would be divided amongst the other 8 digits, so the distribution would not be independent of the unit used. It turns out that the only distribution where you can arbitrarily multiply or divide each of the measurements (reflecting different measurement units) and still end up with the same distribution is the logarithmic distribution specified by Benford's law.

In the case of surveys I'm not sure why Benford's law should be expected to hold. After all you aren't dealing with measurements that can vary over an arbitrary range but usually percentages that are constrained between 0 and 100. Also you can expect the results to be around the middle of the field, for the simple reason that people design survey questions such that there is significant disagreement between people. "Do you think cannibalism is awesome?" is not a typical question, since it will yield boring results like 1% yes/99% no. More typical questions (e.g. "do you approve of politician X/policy Y?") are designed to yield a split between 20/80 to 50/50.
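
The scale-invariance argument in that comment can be checked numerically. A rough Python sketch, assuming a synthetic set of values spread evenly on a log scale over one decade (my stand-in for "measurements in arbitrary units", not anything from the polls), compared against values spread evenly on a linear scale:

```python
import math
from collections import Counter

def leading_digit_freqs(values):
    """Fraction of positive values whose first significant digit is 1..9."""
    counts = Counter(int(v / 10 ** math.floor(math.log10(v))) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]

N = 20_000
log_uniform = [10 ** (k / N) for k in range(N)]  # evenly spaced in log10 over [1, 10)
lin_uniform = [1 + 9 * k / N for k in range(N)]  # evenly spaced over [1, 10)

# Doubling every value is the same as measuring in units half as long.
log_before = leading_digit_freqs(log_uniform)
log_after = leading_digit_freqs([2 * v for v in log_uniform])
lin_before = leading_digit_freqs(lin_uniform)
lin_after = leading_digit_freqs([2 * v for v in lin_uniform])

for name, row in (("log before", log_before), ("log after", log_after),
                  ("lin before", lin_before), ("lin after", lin_after)):
    print(name, [round(p, 3) for p in row])
```

The log-spaced rows should come out close to log10(1 + 1/d) both before and after doubling, while the linear rows go from flat to roughly 56% leading 1s.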

*Also, the title is wrong. Silver has not accused them of fraud yet. He's shown that there's something that needs explaining and that one of the explanations could well be fraud.*

Um, yes he has:

It seems quite strongly possible, nevertheless, that the students polled for this survey don't exist anywhere in Oklahoma but instead on a hard drive somewhere in Atlanta. This is a valuable exercise undertaken by the OCPA. But they owe it to the hardworking students of Oklahoma to make sure that their contractor, Strategic Vision, didn't flunk its own citizenship test.

Just wanted to point out that "It seems quite strongly possible" does not an accusation of fraud make. An accusation of fraud reads like this: "THEY COMMITTED FRAUD."

Thank you for posting this. I don't know anything about statistical math but that blog entry about the OK High School students is the most fascinating thing I've read today. The comments add some pretty important points-- such as if the poll was open-ended why didn't a single Oklahoma student answer the question, "What is the law of the land?" with "The Ten Commandments."

Unfortunately the use of false statistics to muddy the waters serves a twofold purpose-- it gives the underlying agenda more ammunition and at the same time it weakens the impact of real statistical analysis. Which political party in America needs to be propped up by fake numbers?

Unfortunately there isn't much that can be done to counter these made-up statistics, particularly in these days of pseudo-journalism. FOX news and USA Today will run with the shocking "Only 23% of Oklahoma High School students know that George Washington was our first President!" and there won't be any investigation behind the numbers presented. Viewers and readers won't question the numbers because "It was on the TV news/ In the newspaper." The memes "public High Schools are a failure" and "kids today are stupid" will once again be reinforced.

The Oklahoma student survey is 110% phony. Scroll down to the bar graphs. They show the various responses the students gave. For example, "What are the two major political parties in the United States?" 43%: Democrat and Republican; 11%, Communist and Republican; 46% Don't Know. There was no "Other Answers" column.

Or, "Who Wrote the Declaration of Independence?" 24% Abraham Lincoln; 19% George Washington; 14% Thomas Jefferson; 7% Barack Obama; 2% Michael Jackson; 34% Don't Know. Again, in spite of wild alternative answers, no "Other Answers" column. This is supposedly 1000 students surveyed. While you can find a couple of kids who might snark the poll and claim Michael Jackson, you would not find 20 kids who would all come up with the same ridiculous answer - and not come up with another equally snarky answer (or one more relevant to their generation).

This isn't just making something up. This is doing it half-assedly.

*An important point to note is that the questionable pollster is Strategic Vision, LLC, not Strategic Vision, Inc. which has been around for 37 years and is considered a reputable pollster.*

This is the part that REALLY confuses me. How does Strategic Vision Inc. not immediately sue/file an injunction against a business that is clearly a competitor in the same industry and is trying to use a blatantly, confusingly similar name? Christ, Starbucks doesn't even allow people named Buck to own coffee shops anymore.

Anything with the word "Strategic" in it is usually full of shit.

*For example, "What are the two major political parties in the United States?" 43%: Democrat and Republican;*

And even this is wrong. The name of the party is 'Democratic', not 'Democrat'.

That's like some special little thing that conservatives do to be dicks. Not entirely sure what they are trying to signify by it other than their own dickishness, but there's probably some deep symbolic meaning if you follow all the forwarded email conspiracy theories.

From the CEO's Twitter feed: "Are we sure that [Obama] is really President? LOL".

I'm pretty sure there was a poll last November that answered that question pretty decisively.

If you believe the polls, out of 1,000 Oklahoman High Schoolers, none could score higher than 6/10 on a basic citizenship test. Bullshit.

*Mark Twain (or whoever): "There are three kinds of lies: lies, damned lies, and statistics."*

That quote is also used by Jim Galloway at the Atlanta Journal-Constitution in his article yesterday on this brouhaha. As it turns out, a commenter at AJC points out that the phrase is attributed to Benjamin Disraeli... a fact that Twain himself pointed out in *Chapters from My Autobiography*: "Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: 'There are three kinds of lies: lies, damned lies, and statistics.'"

I love that everyone in this thread wants to give the Oklahoma school children the benefit of the doubt. MeFi threads are so often tilted the other way when it comes to assuming things about red-state smarts.

Couldn't these have been multiple choice questions?

I think there's a flaw in Nate's argument about the Oklahoma high school poll. He simulates responses from 50,000 students assuming each one has the same likelihood of getting each question right as the Strategic Visions poll indicates--so 23% for the question about George Washington, 26% for the one about the Bill of Rights, etc.--with the additional assumption that the results for each question are independent of each other, which, as he says, is not a reasonable assumption. The distribution of correct answers he gets matches up with the poll, so he concludes that they must have been generated with that same assumption, but that's not necessarily true.

The marginal distribution of the percentage of people answering each question correctly isn't changed by assuming correlations between the questions. In math lingo, what Nate's done is create a vector of Bernoulli-distributed random variables (1 or 0) and then estimated the average vector by sampling. But by linearity of the expected value, the average vector is just the vector of averages, i.e., the list of input percentages. So he could have assumed the results were perfectly correlated and he still should have gotten the same results.
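
Here's a minimal sketch of that claim, not Nate's actual simulation. The 23% and 26% figures come from the poll as quoted above; the other three probabilities, the sample size, and all the function names are made up for illustration. "Perfectly correlated" is modeled by giving each student a single random draw that decides every question at once.

```python
import random

# First two values are from the poll as quoted in this thread;
# the remaining three are hypothetical placeholders.
P_CORRECT = [0.23, 0.26, 0.40, 0.15, 0.30]

def simulate(p_correct, n_students, correlated, seed=0):
    """Return one 0/1 answer row per simulated student."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_students):
        if correlated:
            # One shared draw per student: questions are perfectly correlated.
            u = rng.random()
            rows.append([1 if u < p else 0 for p in p_correct])
        else:
            # A fresh draw per question: questions are independent.
            rows.append([1 if rng.random() < p else 0 for p in p_correct])
    return rows

def question_means(rows):
    """Fraction of students answering each question correctly."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def score_variance(rows):
    """Variance of the number of correct answers per student."""
    scores = [sum(row) for row in rows]
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)

independent = simulate(P_CORRECT, 20000, correlated=False)
correlated = simulate(P_CORRECT, 20000, correlated=True)

print("per-question means, independent:", question_means(independent))
print("per-question means, correlated: ", question_means(correlated))
print("score variance, independent:", score_variance(independent))
print("score variance, correlated: ", score_variance(correlated))
```

Both runs should reproduce the input percentages per question, which is the linearity point. The spread of total scores per student is a different matter: it does depend on the correlation, so that is the quantity where the independence assumption would actually show up.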

I agree that there's something fishy about these polls (really, nobody out of 1000 students could get more than 7/10 right?), but this isn't actually evidence of anything. What would be more interesting would be to compare the distributions of

Silver is missing out on an important secondary function of the polling and statistics industry. In many published studies, trailing digits are altered by members of secret societies and/or the intelligence community as a way to pass encrypted messages to agents in the field. Should I or should I not try to shoot an elected official today? Well, I go to the Strategic Visions, LLC poll released yesterday and run the big string of 8's and 2's through my decoder ring, and usually - but not always - come up with the answer 'no.'

It's actually a vast improvement on the previous system, which involved a whole dictionary of codewords and pass-phrases that would be woven into obituaries and personal ads. The upsurge of useless numbers in published materials over the last few decades has really made secret public communication far easier.

Hi David,

I'm writing you for two reasons.

Firstly, I wanted to make sure that you had some decent contact information for me. Was really looking forward to the lawsuit and figured you might be having trouble getting in touch. I'm at this e-mail address -- xxxxxxxx@gmail.com, or at xxx-xxx-xxxx.

Secondly, I wanted to provide you an opportunity to clear the air about one singular fact. What call center(s) have you used to conduct your public-facing polling? For every call center that you're willing to publicly disclose, up to a maximum of 5, I will donate $538 to Children's Healthcare of Atlanta (http://www.choa.org/).

Best wishes,

Nate Silver

*And they're even in the same industry? Wow, someone really needs a trademark lawyer.*

'We've had a number of people very confused because one of the things we're known for is impeccable data,' said Strategic Vision Inc. President Alexander Edwards. 'We have absolutely, positively no relationship whatsoever.'

It's not entirely clear what the long-term implications will be for Atlanta-based Strategic Vision.

On further review, I take back my objection. Nate's plotting the

