

Social Neuroscience
February 27, 2009 8:31 PM

That Voodoo That Scientists Do. "When findings are debated online, as with a yet to be released paper (PDF) that calls out the field of social neuroscience, who wins?"
posted by homunculus (53 comments total) 7 users marked this as a favorite

 
The person who gets there first. Actually...with academics, it's the one who publishes first.

Coincidentally...FIRST!
posted by hal_c_on at 9:15 PM on February 27, 2009


He envisages a wider formal online-review process, in which scientists could respond to papers, with their comments weighted based on their own publication record.

Part of me is dismayed that someone wants to turn the peer review process into one big blog debate. Part of me feels this may well be an interesting thing to try, although I fear that, as in the infamous "blogosphere", the loud voices may drown out the sensible ones.

"You've got your statistics all wrong" papers often have a big impact. My favorite one lately was Dominici 2002. In general, eventually the dust settles and everyone starts doing things better.
posted by Jimbob at 9:18 PM on February 27, 2009


You know what? Fuck Vul and the rest of the authors on that paper. Their main argument is that there is a pervasive "non-independence" of analysis in fMRI studies. The problem they identify is that there are two inferential steps in fMRI analyses: The first step identifies voxels (volume elements) in the brain based on the correlation of activity in the voxels with behavioral measures, and the second step tests whether the activity of the voxels correlates with those behavioral measures. If that seems redundant to you, it's because it is. Vul et al. argue that the non-independence stems from the second step relying on the results of the first step. Wager et al. argue (rebuttal) that this is not a "non-independent" analysis, but a single inferential step - to wit, the same analysis that identifies the voxels that are correlated with behavioral measures is used to report the strength of that correlation.

More to the point, however, is that Vul is committing an unforgivable sin: "we wanted to make the paper entertaining and to increase its readership." (from the first link). The fact that it's getting play on "the blogs" isn't a guarantee that it's right. The language was deliberately chosen to get a rise out of people, and it worked. Vul has turned an argument about methodology in science into a public relations clusterfuck. Peer review has a way of eventually getting it right - bad methods go away, eventually. This spat about statistics should have been hashed out in the scientific literature, not turned loose with words like "voodoo" that get the emotions revving.
posted by logicpunk at 9:41 PM on February 27, 2009 [5 favorites]


What's most interesting about this is the point that this broader audience has the power to shape the agenda for scientific funding based on "science-lite" versions of the work in question. In particular, the author of the work in question may inadvertently overshoot his mark: his goal was to call into question the methodology employed in social neurobiology; I would expect that the intent was to convince his peers that the methods should be revised. However, the broader audience consuming the stripped down accounts of the paper have the potential to actually stunt research into social neurobiology as a field: right now, my impression of social neurobiology is (almost certainly uncharitably) that it is based on shoddy statistics and that none of its claims can be trusted. If that becomes the popular view of the field it could be catastrophic, and this episode could push potential results from this line of inquiry decades into the future.
posted by agent at 9:45 PM on February 27, 2009 [1 favorite]


Noise may make the correlation higher sometimes, and lower at other times.

Yeah, I've long imagined a service that would "clean up" surveillance videos, to enable victims to better identify perpetrators. The cleanup would involve filtering with a sharpening filter, like in photoshop, where it lets you type in the coefficients for a rectangular kernel. But the coefficients, only a 9 x 9 grid of content-neutral numbers, would be specially concocted to reinforce the likeness of a predetermined suspect.

But then I decided to use my powers for good instead.
posted by StickyCarpet at 10:03 PM on February 27, 2009


More to the point, however, is that Vul is committing an unforgivable sin: "we wanted to make the paper entertaining and to increase its readership." (from the first link).

Yes indeed, God forbid a scientific paper is entertaining and widely read. The last thing we need is a better informed public.

It's telling that the critics of the paper are focusing on the word "voodoo," it's almost as if they are trying to distract attention from the legitimate criticism Vul brought up.
posted by afu at 11:01 PM on February 27, 2009


Yeah, there is a recent trend towards giving papers snappy, witty, eye-catching titles. I don't like it much myself, but this is hardly the first paper to ever do this.
posted by Jimbob at 11:26 PM on February 27, 2009


That paper sounds arrogant as all fuck, quite frankly. If you're going to call large amounts of research into question and "urge" authors to correct the scientific record, it seems like it may be bad form to be an arrogant douchebag about it.
posted by Krrrlson at 11:39 PM on February 27, 2009


i didn't feel any arrogance when reading it but perhaps it's coz I wasn't paying attention to the style as much.

I did find the noise simulation quite remarkable though. The fact that you can get ridiculously high correlations out of pure noise was instructive to me.
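
For anyone who wants to see the effect, here is a minimal sketch of that kind of noise simulation. It is not the code from the paper. The 10 subjects match the figure mentioned elsewhere in this thread; the voxel count, the r > 0.7 cutoff, and the function names are assumptions picked for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def noise_only_selection(n_subjects=10, n_voxels=5000, threshold=0.7, seed=0):
    """Correlate pure-noise 'voxels' with a pure-noise 'behavior' score and
    keep only the correlations that pass the (assumed) threshold."""
    rng = random.Random(seed)
    behavior = [rng.gauss(0, 1) for _ in range(n_subjects)]
    selected = []
    for _ in range(n_voxels):
        voxel = [rng.gauss(0, 1) for _ in range(n_subjects)]
        r = pearson(voxel, behavior)
        if r > threshold:
            selected.append(r)
    return selected


selected = noise_only_selection()
mean_r = sum(selected) / len(selected)
print(f"{len(selected)} noise voxels passed; mean r among them = {mean_r:.2f}")
```

Every voxel here is unrelated to the behavior by construction, yet the mean reported over the voxels that pass has to exceed the cutoff, because the same data was used to pick them and to measure them.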
posted by spacediver at 12:29 AM on February 28, 2009


I'm divided about this. They seem to have a point (the graph on page 14 looks pretty damning), but the presentation is a little bit biased (I'd prefer if they didn't include each paper more than once, and using a stacked graph this way is misleading, especially when screen reading).

I'm also deducting some points because it appears to be typeset with Microsoft Word.
posted by you at 3:24 AM on February 28, 2009


As usual, I know nothing about any of this stuff. But I think it is a bit funny that they debate publishing a refutation online before the paper gets published and Metafelons debate the uses of pre-publication online whilst a-swim in the Big Blue.
posted by Postroad at 6:01 AM on February 28, 2009


Krrrlson, it might be useful to note that Ed is backed up by a professor here. Kanwisher is fairly prominent in neuroscience. The only thing that should be of concern here is whether the research being called into question is in fact wrong, not the perceived arrogance of the graduate student main author.
posted by kldickson at 6:05 AM on February 28, 2009


Anyone who has ever hung out at the casino knows you can get ridiculously high correlations out of pure noise. Just ask the folks at the Roulette and Baccarat tables if they see any patterns in the little scorecards they keep.
posted by localroger at 6:28 AM on February 28, 2009



Yes indeed, God forbid a scientific paper is entertaining and widely read. The last thing we need is a better informed public.

Oh, hell. Now the public wants to know what we're up to in those shiny labs. It. Never. Ends.

Seriously, no one would say that the public shouldn't take an interest. The problem is that Vul's paper is being delivered to the public as if it were a scientific law when it's anything but. There're whole rounds of debate to go before the neuroscientific community figures out what, if any, changes need to be made to fMRI methods.

By starting off with the goal of making his paper "entertaining", Vul has left off being a scientist and started being a performer, and it's not good for science. Now the public has grabbed onto the idea that a perfectly okay technique with maybe a couple kinks that need to be worked out has somehow undermined the entire field.


It's telling that the critics of the paper are focusing on the word "voodoo," it's almost as if they are trying to distract attention from the legitimate criticism Vul brought up.


It's almost like Vul is an incredible asshat with no legitimate criticisms to bring up, so he needs to use the word "voodoo" to get any attention at all.
posted by logicpunk at 6:55 AM on February 28, 2009 [1 favorite]


Neuroscientist: Internet, video games rewiring kids' brains - "Fast-paced bytes of information gathered from today's social networks and video games are responsible for rewiring kids' brains, says UK neuroscientist Susan Greenfield. Though she doesn't cite any research, she connects these new technological habits to the behaviors of infants and autistic children." [em added if you didn't catch an oxymoron in there somewhere]
posted by kliuless at 7:14 AM on February 28, 2009


Whatever the validity of their arguments, the authors' use of voodoo and other snark was wrong. Such terms inappropriately cast aspersions on the intentions and motivations of the authors being criticized. The critique is supposed to be based on the science, not the critics' feelings about the work. They slimed the individuals and neuroimaging research in general by calling it voodoo. Withdrawing the comment (removing the term from the title at the editor's request) is like a witness slipping something inadmissible into testimony and then having it stricken from the record, once the damage is done. Journal editors routinely ask authors to take loaded language out of their papers and it's a good thing they do. This is a new journal with a new editor and he probably regrets not having been tougher with them. The evidence and arguments should stand on their own. You want a readable entertaining version of the controversy, try blogging about it or write popular science. While entertaining us, try presenting the arguments correctly, too.
posted by cogneuro at 7:17 AM on February 28, 2009


It seems like a positive development to me because it reveals some of the processual nature of science -- what events like this help nonscientists understand is that just because a paper gets published doesn't mean the results it reports can't be questioned. But the fact that there is serious debate, rather than the standard frivolous online banter, also suggests a further fact: science is hard work with important consequences. And this seems like an important point to bring to light, especially given the US's recent (i.e., within the past few decades) failure to bankroll science.

Keeping science interesting to the public might not be the best way to obtain scientific results, but it is good advertising for the whole scientific endeavor.
posted by voltairemodern at 8:31 AM on February 28, 2009


The problem with Vul's paper is that, for all its supposed "seriousness", it is, at its heart, a troll. If you prefer, you could call it iconoclastic, but the end result is the same. The difficulty is deciding if it's worth spending time on or not.

If there is a point to their argument---which I'm not capable of evaluating, this isn't my field at all---the authors certainly don't help their case with sarcasm or snark. All that serves is to bring out the science writers, who love a good fight. One can only conclude that Vul and his co-authors did this because they are, yes, trolls.

Yes indeed, God forbid a scientific paper is entertaining and widely read. The last thing we need is a better informed public.

This does not appear to be Vul's goal at all. He and his coauthors seemingly wanted to stir up the pot just to get attention. The substance of his paper could have been presented much differently. This is a net negative if his goal is to be considered the advancement of the field.
posted by bonehead at 9:12 AM on February 28, 2009


Yes indeed, God forbid a scientific paper is entertaining and widely read. The last thing we need is a better informed public.

Actually, I have to agree that a scientific paper should not be entertaining or widely read. The result would not be a better-informed public, but it might be a more confused, misled, or even cynical public.

Scientific papers are written for an audience of experts and are intended to show a boring, fact-based case for a boring, fact-based claim. Trying to spin your findings to be more interesting rather than more accurate (as may be happening here) defeats the purpose of science, which is not, in fact, entertainment.

If unqualified people read papers, they usually won't pick up on all the nuances, though they may not be aware of this. They may think the author's claim is stronger than it really is, because they are not well-versed enough to spot any of the kind of subtle flaws that typically derail an argument. They may also misunderstand the author's point. It's not a scientific paper, but I know of a guy who read Lee Smolin's The Trouble With Physics and actually thought his message was that you should use pure intuition and "natural philosophy" to understand the world instead of rigor and experiment!

All this isn't to say that people shouldn't be informed about science or educated. But this isn't the purpose of scientific papers. For public education, we have an education system, we have TV shows, we have PBS, we have some scientists' web sites, and we have things like Science News magazine. These are all specifically designed to tell non-experts about the latest developments in a field.
posted by Xezlec at 9:52 AM on February 28, 2009 [2 favorites]


They probably could have done without the word "voodoo", but otherwise I don't see a problem with the paper itself. They've identified a problem that has been understood for a LONG time: multiple comparisons. Basically, the more things you observe, the more likely it is that two of them will seem to correlate with each other (and therefore, the more demanding your standards need to be to say that correlation is significant). Sometimes it's easy to forget this, or not recognize that it applies to a certain situation. It's a problem that I think many scientists are aware that fMRI is especially vulnerable to, but it's a cross-discipline problem. Any time you see a study that indicates a weird connection (eating cheeseburgers increases your likelihood of death in a terrorist attack), you should be suspicious of scientists not correcting for multiple comparisons. What the authors have done that is interesting is they've come up with a rather simple measure of saying, "if you're honest and do good statistics, you probably shouldn't get correlations above X", and they can show that there are quite a few that are above that. I haven't had time to read the whole paper, but it should be noted that such large correlations can validly occur under certain circumstances (perhaps one quantity being measured is insensitive to the typical fluctuations that underlie neural processes) and of course they can occur occasionally by coincidence.

For me, this sort of statistical problem is extremely frustrating. Most of the math necessary to do science well is actually pretty basic. But teaching is spotty at best. I've spent YEARS taking one form of calculus after another, but I've had to pick up the really useful stuff on the job, often by making a mistake and getting called out on it, or by reinventing the wheel. Since scientists are human, they often become overly defensive and refuse to admit error when they get caught doing bad statistics. So if you're a crusader for good statistics, it can feel like pissing into the wind.
posted by Humanzee at 10:03 AM on February 28, 2009


For all those commenting on how snarky the paper seemed, have you actually read it? Or are you just basing the snark allegation on the title?
posted by spacediver at 11:46 AM on February 28, 2009



They probably could have done without the word "voodoo", but otherwise I don't see a problem with the paper itself. They've identified a problem that has been understood for a LONG time: multiple comparisons.


That is certainly not the problem they're attacking:
This multiple comparisons correction problem is well known and has received much attention. The problem we describe arises when authors then report secondary statistics.
Vul et al, 2008
There are methods to correct for multiple comparisons, and there are neuroimagers who ignore those methods. Those are Bad People and we hate them. Those aren't the ones Vul is attacking; rather, he has constructed a straw man to knock down:

Imagine you have a class of 100 students, and you want to study whether there's a correlation between # of hours studying for a Psychology class with final grades in that class. Now, you'll have some students who study their ass off and get good grades, and some students who don't study at all and fail. These are the ones that show a large correlation between studying and grades. BUT, you'll also have students who get good grades without studying at all, and others that will fail regardless of the amount of the time studying. Vul is claiming that neuroscientists are essentially just looking at the students who show a strong correlation, and leaving the others out of the analysis. If that were actually what neuroscientists did, it would be pretty damning.

Here's the non straw man version of the above:

You have 1000 students in 10 groups. Each group is enrolled in a different class, but you, the experimenter, don't know which class the groups are assigned to, and you don't know exactly how many students are in each group. You want to study the effects of # of hours studying on final grades, but only in Class A (and not classes B-J). Now each student is going to spend a certain number of hours studying, but only some of them will be studying for Class A. At the end of the semester, you look at final grades in Class A, and lo and behold, a group of 100 students shows a strong correlation, while all the others show a weak or nonexistent one. It is acceptable to ignore the correlations from the other 900 students because that is not what you were interested in to begin with.

The problem in this scenario is, as Vul gets right, that you might only get 90 of the 100 students who properly belong in Class A because the last 10 are less strongly correlated. This will have the effect of making the correlation look stronger than it actually is, but it doesn't mean the correlation doesn't exist, or that the researcher is taking additional inferential steps, as Vul claims. Vul has massively overstated his case, and he's done it in as dickish a way as possible. There are reasonable steps one could take to prevent correlation inflation, and the academic community would have been better served if Vul had spent time working on how to do that rather than being a big jackass about it.
posted by logicpunk at 11:49 AM on February 28, 2009


Spacediver: Read it. Understood it. The snark extends beyond the egregious title.
posted by cogneuro at 11:54 AM on February 28, 2009


logicpunk: That is certainly not the problem they're attacking:
Oh wow, you're right. That's what I get for skimming. After reading more carefully: what you said.
posted by Humanzee at 12:34 PM on February 28, 2009


Logicpunk, from what I understood, the targets of Vul's criticism were not looking at the group in class A only.

That would be an acceptable way to choose which voxels are of interest and then analyze them (for example in face perception research, the region of interest is often defined in advance by some face vs house localizer scans, and then those regions which show preferential activation for faces are then tested further and analyzed).

But Vul's targets aren't deciding in advance that they'd like to analyze a certain brain area, or a certain Class (A in your example). Rather, they are choosing which areas to analyze based solely on correlations. So it's as if they just looked at all the classes as one big unit, and searched for correlations, and chose the highest correlating students for analysis.

This, from what I understand, is what the non-independence error is all about.

The survey he sent out to the authors explicitly asked exactly how they chose their voxels for analysis. He only targeted those who responded that they chose the voxels in the strawman'd version of your post.

Now it may be the case that the survey wasn't fine grained enough, or that there were misunderstandings, but I've not looked into the issue in enough depth to say whether that's the case.
posted by spacediver at 12:38 PM on February 28, 2009


Humanzee:
Completely agree that multiple comparisons is still a problem, by the way... I get the feeling that some fMRI studies do everything they can to skirt around the issue, otherwise they wouldn't have, you know, results.
posted by logicpunk at 12:40 PM on February 28, 2009


Actually, I think the issue that Vul et al. identify has less to do with multiple hypothesis correction (which I get the sense is, fortunately, standard practice) and more with cross-validation. (On preview, other people have said this already.) This isn't my field, so I apologize in advance if I'm butchering it, but this is the understanding that I came away with after reading the article.

Suppose you perform an fMRI study and compute correlation coefficients between brain voxels and a behavior of interest. You then correct these correlation coefficients for multiple hypothesis testing. You then report the mean or maximum correlation of these significant voxels: let's say, for the sake of example, you find a maximum correlation of r = 0.9. In effect, what you have done is build a statistical model that relates brain activity to behavior, where based on some function of voxel activity you can predict whether some behavior is being performed.

The implication of reporting this number is that, if you were to analyze the brains of ten more patients, one should find a correlation of 0.9 (or close to that) between the previously-identified voxel and the behavior in the new subjects as well. However, if you actually did analyze ten new subjects, performing the same analysis, a likely outcome would be that a somewhat different set of voxels would then appear to be most significantly correlated. You would probably find some new voxels to be significant, and lose some of the ones you identified previously. More importantly, the levels of correlation that you observe between the voxels and the behavior would differ, due to experimental noise.

What this all means is that the relationship you would learn concerning brain voxels and behavior from the two independent studies could be different. As a consequence, to give an extreme example, if your model from the first study was "look at voxel #131" (essentially the result of picking the maximum correlation), you could potentially get a drastically lower correlation value applying this to the second data set, even if #131 is still significant. This is problematic. If we want to claim that we have identified an association between a brain region and an activity, the nature and strength of this association shouldn't change that much between two studies with the same design.

Say that you instead reported the average correlation of several voxels (e.g. r = 0.8) with significant p-values. However, a p-value only tells you the likelihood of observing as good a result by random chance. It does not say anything about the probability that, if you repeated the experiment, you would find a correlation that strong. In other words, it doesn't explicitly tell you anything about the predictive power, even after the appropriate MH correction. In fact, in some ways, the more stringent you are with your selection, the worse the problem is: you're betting that a more and more specific group of voxels will behave consistently, while making the mean correlation to the behavior that you report higher and higher. This seems like a case of overfitting. In other words, the predictive power of your association is the quantity that is the most interesting. However, the correlation values for the voxels that you have already selected are not necessarily linked to prediction accuracy, and can even have an inverse relationship.

The best test of predictive power would be, obviously, to repeat the experiment, but that's not always feasible. A reasonable alternative would be to use a technique called cross-validation. Essentially, you divide the data randomly into (for example) two halves. You learn a relationship from the first half of the data, and when it comes time to assess the reliability of this relationship, you only evaluate on data you haven't already used as part of the learning procedure. This affords some protection against overfitting and gives you a much better idea of how reliable your association is.
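
A rough sketch of the split-half idea, under loudly stated assumptions: the data is pure noise, the "model" is simply "pick the voxel with the highest correlation in the first half", and the subject count, voxel count, and function names are made up for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_half_check(n_subjects=40, n_voxels=1000, seed=0):
    """Pick the best voxel on one random half of the subjects, then score
    that same voxel on the half that played no part in the selection."""
    rng = random.Random(seed)
    behavior = [rng.gauss(0, 1) for _ in range(n_subjects)]
    voxels = [[rng.gauss(0, 1) for _ in range(n_subjects)]
              for _ in range(n_voxels)]

    order = list(range(n_subjects))
    rng.shuffle(order)
    half = n_subjects // 2
    train, test = order[:half], order[half:]

    def r_on(voxel, idx):
        return pearson([voxel[i] for i in idx], [behavior[i] for i in idx])

    best = max(voxels, key=lambda v: r_on(v, train))
    return r_on(best, train), r_on(best, test)


# Repeat over a few seeds so one lucky split does not dominate the picture.
results = [split_half_check(seed=s) for s in range(10)]
mean_train = sum(t for t, _ in results) / len(results)
mean_test = sum(h for _, h in results) / len(results)
print(f"mean r on selection half: {mean_train:.2f}")
print(f"mean r on held-out half:  {mean_test:.2f}")
```

The held-out number is the one that says something about prediction; the selection-half number mostly reflects how hard you searched.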

Anyway, sorry to be so long-winded - this is partly my kicking the tires on my own thought processes by writing them out - but it certainly seems to me that the authors have an excellent point. Cross-validation and related techniques are widely used in machine learning and statistics for precisely the reasons that they identify (i.e., in general, the correlation between a fitted model and a behavior of interest tends to drop when you look at new data).
posted by en forme de poire at 1:30 PM on February 28, 2009 [4 favorites]


So, uh, what's the actual point of social neurobiology? What are we trying to get out of it? Can someone give some practical examples of how it's being used?
posted by empath at 2:08 PM on February 28, 2009


en forme de poire: you might be interested in the reply by Lieberman et al (pdf) which addresses some of the issues you mention. As far as cross-validation goes, yeah, it would be a good thing to do in general. However, MRI time is expensive, and most researchers don't have the funding to collect enough data that they can chop it in half.
posted by logicpunk at 3:08 PM on February 28, 2009


Wager et al. argue (rebuttal) that this is not a "non-independent" analysis, but a single inferential step - to wit, the same analysis that identifies the voxels that are correlated with behavioral measures is used to report the strength of that correlation

I just had to read the abstract to realize that Vul et al. are correct, these analyses are bogus. It's not "non-independence" that is the problem, but culling the highest correlations out of a bunch of them and reporting them as "important". The problem is multiple inference. It doesn't just affect p-values, it affects any extreme values. If I randomly generate 200 variables and look at the correlations with 200 other independent randomly generated variables, I guarantee you will get some of them that are very high. And that's the problem with these optimistic analyses: you can't tell whether the high correlations are real or just chance.
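
The 200-by-200 claim is easy to check with a short sketch. The comment does not say how many observations each variable has, so the 20 used here is an assumption, as are the function names.

```python
import math
import random


def standardize(xs):
    """Rescale a sequence to mean 0 and (population) standard deviation 1."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]


def max_abs_correlation(n_vars=200, n_obs=20, seed=0):
    """Largest |r| between two independent sets of random variables."""
    rng = random.Random(seed)

    def make_set():
        return [standardize([rng.gauss(0, 1) for _ in range(n_obs)])
                for _ in range(n_vars)]

    set_a, set_b = make_set(), make_set()
    best = 0.0
    for x in set_a:
        for y in set_b:
            # For standardized data, Pearson r is the mean of the products.
            r = sum(p * q for p, q in zip(x, y)) / n_obs
            best = max(best, abs(r))
    return best


max_r = max_abs_correlation()
print(f"largest |r| among 200 x 200 unrelated pairs: {max_r:.2f}")
```

With 40,000 pairs to choose from, the most extreme one looks impressive even though no pair is related.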
posted by Mental Wimp at 4:49 PM on February 28, 2009


Wager et al. argue (rebuttal) that this is not a "non-independent" analysis, but a single inferential step - to wit, the same analysis that identifies the voxels that are correlated with behavioral measures is used to report the strength of that correlation

It's amazing how slowly statistical knowledge filters down to practice among non-statisticians. Halving datasets for cross-validation is horse-and-buggy practice. Leave-1-out methods (or, more generally, K-fold cross validation) make use of the entire dataset for both training and validation and can be wrapped with bootstrapping to get robust measures of variability from the same process. Essentially, you leave one observation out of your dataset, fit your model to the remainder and see how well your model predicts the omitted observation. Replace the observation and repeat until all observations have been left out once. Now combine all the deviations of your predictions from your observations and that's your measure of model validity. (Somewhat oversimplified, but that's the basic idea.) To get variance, apply this method to bootstrap samples drawn from the original data and do that a thousand times, and voila, a robust estimate of the variance of the validity measure.
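
A minimal sketch of that recipe, assuming a plain straight-line fit as the model and mean squared prediction error as the validity measure. Neither choice comes from the papers under discussion, and the toy data and function names are invented for illustration.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all x identical: fall back to predicting the mean
        return my, 0.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def loo_mse(xs, ys):
    """Leave each observation out once, fit on the rest, and average the
    squared error of predicting the omitted observation."""
    errors = []
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append((ys[i] - (intercept + slope * xs[i])) ** 2)
    return sum(errors) / len(errors)


def bootstrap_loo(xs, ys, n_boot=1000, seed=0):
    """Mean and sample variance of the leave-one-out score over bootstrap
    resamples (drawn with replacement) of the original data."""
    rng = random.Random(seed)
    n = len(xs)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(loo_mse([xs[i] for i in idx], [ys[i] for i in idx]))
    mean = sum(scores) / n_boot
    var = sum((s - mean) ** 2 for s in scores) / (n_boot - 1)
    return mean, var


# Toy data (an assumption): a weak linear relationship plus noise.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(20)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]

score = loo_mse(xs, ys)
boot_mean, boot_var = bootstrap_loo(xs, ys)
print(f"leave-one-out MSE: {score:.2f}")
print(f"bootstrap mean {boot_mean:.2f}, variance {boot_var:.3f}")
```

One caveat on the sketch: a bootstrap resample can contain the same observation more than once, so a left-out point may still have a copy in the fitting data. The comment already flags the recipe as somewhat oversimplified, and the same applies here.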
posted by Mental Wimp at 4:58 PM on February 28, 2009 [1 favorite]


Oops, the quote that was to head the previous comments was this:
However, MRI time is expensive, and most researchers don't have the funding to collect enough data that they can chop it in half.
posted by Mental Wimp at 4:59 PM on February 28, 2009


This is tricky! It's great to see the blog world go crazy over a science paper, it sucks that careers might get damaged and funding may dry up over what is otherwise a non-issue. Lieberman's reply estimates the inflation of r-values to be minuscule, while Vul et al. explicitly state that the studies they criticize are: "To sum up, then, we are led to conclude that a disturbingly large, and quite prominent, segment of social neuroscience research is using seriously defective research methods and producing a profusion of numbers that should not be believed."

This is irresponsible writing in extremis. A "disturbingly large", "seriously defective", "numbers that should not be believed".

For one, the papers that performed this ostensibly non-independent analysis are tiny in number compared to the sum total of 10+ years of social neuroscience. "Seriously defective" and "numbers that should not be believed" have both been addressed by Lieberman. The inflation that has Vul and colleagues all in a tizzy is likely real but also likely rather small, especially in studies using normal sample sizes, something Vul and colleagues disingenuously underestimated in their simulations (10 subjects? Come on, that hasn't been the norm since the mid 90s).

Also, just to reply to one of the previous comments here. Appeals to authority notwithstanding, yes Kanwisher is a big wig, but so are the authors on the rebuttal and even more so are the people listed as having provided comments on the rebuttal. Two camps are developing around this, and they're filled with top names and pro statisticians, so let's just not pick a side just because of who's on it.

The real damage here is that people have gone away with the take-home message that social neuroscience methods are flawed and not to be trusted. The analysis method under fire represents only a minor portion of the analysis strategies (and papers) in social neuroscience. Moreover, it's a method of analyzing any brain data, not just "social" brain data, and it is used in all fields of neuroimaging, be it clinical or cognitive. It's just that going after clinical or cognitive neuroscience isn't as likely to get you noticed.
posted by Smegoid at 6:17 PM on February 28, 2009


culling the highest correlations out of a bunch of them and reporting them as "important". The problem is multiple inference.


Again, this is not the problem. It is standard in any fMRI analysis to correct for multiple comparisons to avoid false positives. Even for the high-probability correlations that remain, it would be nonsensical to cull all the highest correlations from everywhere in the brain. No one does this. Generally a researcher is interested in one or two areas that have reasonably well-defined anatomical boundaries - the correlation strengths that are reported are usually going to describe the strength of the effect in that one region, not the strength for all voxels that passed everywhere.
posted by logicpunk at 6:56 AM on March 1, 2009


The real damage here is that people have gone away with the take-home message that social neuroscience methods are flawed and not to be trusted.

The thing is that a lot of people already had a pretty low opinion of social neuroscience because of the tendency of practitioners to pimp out their fMRIs for ridiculous studies like "Republican brains / Democratic brains". The real reason this study is getting so much attention is that a lot of people already thought something smelt funny in social neuroscience, and they jumped on this study because it confirmed their feelings.

So, uh, what's the actual point of social neurobiology? What are we trying to get out of it? Can someone give some practical examples of how it's being used?

To be honest, I don't think there is much of a point to it besides pretty fMRI pictures and press releases. We simply do not know enough about social psychology or neuroscience to have a fruitful interdisciplinary study. fMRIs in particular are much too coarse an instrument to be used to study extremely complex social interactions. At the most it can point to an approximate brain region to study, but most of these regions are already known through animal and lesion studies.
posted by afu at 7:39 AM on March 1, 2009 [1 favorite]


The error that they discuss here is actually pervasive in scientific analysis. The problem that is being discussed is that any discipline that uses "statistical significance" as an important factor in deciding publication will generate erroneous results due specifically to that factor.

As one example, a researcher runs a study with 100 variables. After combining some of those variables, carrying out transformations, etc., it's not difficult to have several thousand statistical correlations amongst the variables and their combined/transformed indices.

For the sake of example, let's say there are 2000 correlations. So, if the researcher uses, say, the conservative .01 significance level (as opposed to the more liberal .05 level), then by chance alone about 20 will come out as significant (.01 x 2000 = 20). Now the problem comes when the researcher treats those as actually significant (i.e., replicable) results. It's the same thing as a nonscientist who sees a correlation between winning at the track and wearing a lucky hat. Although in principle a scientist should have the training and discipline to see the error, as a recovering academician, I know of dozens of examples where that's not the case.
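The arithmetic above can be checked with a rough sketch. It assumes every null hypothesis is true and that the tests are independent, so each p-value is uniform on [0, 1); the function names are made up for illustration.

```python
import random


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests passing at level alpha when every null is true."""
    return n_tests * alpha


def simulate_false_positives(n_tests: int, alpha: float, seed: int = 0) -> int:
    """Draw one null p-value per test (uniform on [0, 1)) and count those below alpha."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))


# The .01 x 2000 case from the comment above.
print(expected_false_positives(2000, 0.01))
# One simulated study; the count varies from seed to seed around the expected value.
print(simulate_false_positives(2000, 0.01))
```

None of the simulated "hits" reflects a real effect, which is the lucky-hat problem in miniature.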

A prof needs to publish or perish, so they overlook the speciousness of their statistical analysis.

As mentioned above, cross-validation is one solution. Another is to use an analysis that is corrected, either analytically or via Monte Carlo methods, for the number of comparisons made. But the number of times I have suggested this to ex-colleagues who asked me for statistical advice is in the hundreds, whereas the number of times that advice was followed is in the single digits.
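As one example of an analytic correction, here is a minimal sketch of a Bonferroni adjustment. The comment above doesn't name a specific method, so Bonferroni is an assumption picked for simplicity, and the function names are hypothetical.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.01):
    """Flag each p-value that survives the Bonferroni-adjusted cutoff."""
    cutoff = bonferroni_alpha(alpha, len(p_values))
    return [p < cutoff for p in p_values]


# With 2000 comparisons at a family-wise .01 level, each test must clear a far stricter cutoff.
print(bonferroni_alpha(0.01, 2000))
```

Bonferroni is conservative when the tests are correlated, which is one reason Monte Carlo or permutation approaches are sometimes preferred.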
posted by forrestal at 8:11 AM on March 1, 2009


forrestal: the problem you describe is not what this paper is about. I'm not saying that problem doesn't exist, but it's not the one under discussion amongst neuroscientists right now. Decent researchers in neuroscience already correct for the numbers of comparisons made.

But, sure, even with the corrections there will be false positives. But let's put it in context: An fMRI study might scan the brain at a resolution of 64x64x33 voxels. That means for each correlation, you're testing 135168 variables. For a significance level of .01, you'll get around 1400 false positives without correction for multiple comparisons. If the false positives are really random, they will be distributed more or less randomly around the brain, a single voxel here, a single voxel there. The probability of finding 1400 contiguous voxels that are all false positives is vanishingly small, and that's what researchers are looking for: brain regions that span hundreds or thousands of voxels that correlate with behavior.
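A quick sketch of the numbers in that paragraph. The last function assumes voxels pass independently of one another, which is a simplification (neighbouring voxels in real data are correlated); all names are illustrative.

```python
def voxel_count(x: int, y: int, z: int) -> int:
    """Number of voxels in an x-by-y-by-z scan grid."""
    return x * y * z


def expected_chance_hits(n_tests: int, alpha: float) -> float:
    """Expected number of uncorrected false positives if no voxel has a real effect."""
    return n_tests * alpha


def prob_block_all_pass(k: int, alpha: float) -> float:
    """Chance that k specific voxels all pass by luck, ASSUMING independence."""
    return alpha ** k


n = voxel_count(64, 64, 33)
print(n)                                # 135168 voxels
print(expected_chance_hits(n, 0.01))    # roughly 1350, i.e. the "around 1400" above
print(prob_block_all_pass(100, 0.01))   # vanishingly small even for a 100-voxel block
```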

Ironically, the mathematics of the paper actually suggest that the correlation values are inflated by selecting more rigorous p values. If you set your p value to .00001, only one or two voxels are going to pass just by chance, but the correlation coefficient is going to be really, really high.
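That last point can be illustrated with pure noise. The sketch below is a toy setup, not the paper's analysis: 16 hypothetical subjects, 2000 hypothetical voxels, no real effect anywhere, and the "p threshold" is emulated by keeping only the top fraction of null correlations.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def null_correlations(n_voxels, n_subjects, seed=0):
    """|r| between one random behavioral score and n_voxels unrelated noise signals."""
    rng = random.Random(seed)
    behavior = [rng.gauss(0, 1) for _ in range(n_subjects)]
    return [
        abs(pearson(behavior, [rng.gauss(0, 1) for _ in range(n_subjects)]))
        for _ in range(n_voxels)
    ]


def mean_of_top_fraction(values, fraction):
    """Average of the largest `fraction` of values (at least one value)."""
    k = max(1, int(len(values) * fraction))
    top = sorted(values, reverse=True)[:k]
    return sum(top) / k


rs = null_correlations(n_voxels=2000, n_subjects=16)
for fraction in (0.1, 0.01, 0.001):
    # A stricter cutoff keeps fewer voxels, and the survivors have larger |r|.
    print(fraction, round(mean_of_top_fraction(rs, fraction), 3))
```

The stricter the cutoff, the higher the average correlation among the survivors, even though the true correlation is zero everywhere.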
posted by logicpunk at 10:57 AM on March 1, 2009


Generally a researcher is interested in one or two areas that have reasonably well-defined anatomical boundaries - the correlation strengths that are reported are usually going to describe the strength of the effect in that one region, not the strength for all voxels that passed everywhere.

But even so, the researcher will experience the same problem as in cancer clusters. By chance, even rare occurrences like cancer (or large correlations) will cluster by any defined group metric (like geography or "brain area") you want to pick and with the number of voxels available it is inevitable. Correcting p-values using Bonferroni or any other method can't overcome it, because the voxels themselves are intercorrelated spatially. The only way around the problem is to severely limit the number of a priori hypotheses, state them quantitatively and extremely specifically, and cross-validate them carefully. Otherwise, they are just exploratory analyses and not worthy of publication as "findings."
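A crude one-dimensional sketch of the clustering point: a moving average stands in for spatial correlation between neighbouring voxels (an assumption made purely for illustration), and both signals are thresholded so the same share of values, the top 5%, passes.

```python
import random


def longest_run_above(values, fraction=0.05):
    """Length of the longest contiguous run among the top `fraction` of values."""
    cutoff = sorted(values)[int(len(values) * (1 - fraction))]
    longest = current = 0
    for v in values:
        current = current + 1 if v >= cutoff else 0
        longest = max(longest, current)
    return longest


def smooth(values, window=9):
    """Moving average: makes neighbouring values correlated, like adjacent voxels."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(10000)]   # independent "voxels", no real signal
print(longest_run_above(noise))          # independent noise: only short runs
print(longest_run_above(smooth(noise)))  # correlated noise: much longer runs
```

Both signals are noise, but the correlated one produces contiguous "clusters" that an independence-based calculation would call nearly impossible.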
posted by Mental Wimp at 9:23 PM on March 1, 2009


The only way around the problem is to severely limit the number of a priori hypotheses, state them quantitatively and extremely specifically, and cross-validate them carefully. Otherwise, they are just exploratory analyses and not worthy of publication as "findings."

And? This is like warning dentists not to have sex with their anaesthetized patients: we all agree, in principle, but christ, who actually does this? Sure there's going to be a few bad researchers who will throw anything and everything at the wall to see what sticks. We don't like these people because they give the field a bad image. Competent researchers in the area start off with a priori hypotheses about where correlations with behavioral measures will occur in the brain - they look for those correlations to appear in those bounded areas. If a cluster happens to show up elsewhere, they don't go willy-nilly shouting that they've found the new X center of the brain.

I'd also like to point out that we are not even discussing the Vul paper anymore. His point was utter crap, but it put the idea in people's minds that social neuroscience is utter crap and that the scientists doing the research are charlatans and frauds, to the point where you're implying that they're not even following basic scientific methods. I'll happily agree with your earlier point that some of the statistical methods in use are quaint and could use an update - that's far different from accusing them of out-and-out deception or incompetence.
posted by logicpunk at 8:45 AM on March 2, 2009


logicpunk:

If a cluster happens to show up elsewhere, they don't go willy-nilly shouting that they've found the new X center of the brain.

But this is precisely what Vul is accusing them of doing, if I'm not mistaken. Can you point to examples of where his accusation is false here? Again, remember the survey he sent out to the authors.
posted by spacediver at 10:13 AM on March 2, 2009



But this is precisely what Vul is accusing them of doing, if I'm not mistaken.


Keep in mind, Vul is attacking only one kind of analysis - correlations of brain activity with behavioral measures - and it's not even the most common kind of analysis. It's far more typical to do a straightforward t-test without reference to any behavioral traits, and those can give you lovely blobs of brain activity. So, no, that's not the accusation he's making.
posted by logicpunk at 5:46 PM on March 2, 2009


I'm not following - are you saying that he's making more than one kind of attack, one about cherry picking voxels which correlate with behavioural measures, and another about the far more typical t-tests?

or are you saying that there is a whole field of social neuroscience that isn't subject to his criticism?

I'm just concerned with the papers he analyzes in his piece, not papers that weren't attacked.
posted by spacediver at 6:52 PM on March 2, 2009


Junk food marketers rediscover the Crockus
posted by homunculus at 8:27 PM on March 2, 2009


This is like warning dentists not to have sex with their anaesthetized patients: we all agree, in principle, but christ, who actually does this?

Um, most dentists don't have sex with their anaesthetized patients. I think.

And, yes, good scientific practice is rare in a lot of fields. That's why the literature is clogged with mostly useless findings. It takes a lot of work to find the gems among the paste.
posted by Mental Wimp at 10:24 PM on March 2, 2009




I'm not following - are you saying that he's making more than one kind of attack, one about cherry picking voxels which correlate with behavioural measures, and another about the far more typical t-tests?



In the paper, Vul is only attacking what he regards as cherry picking voxels from correlations. He seems perfectly happy with significance tests (like t-tests).

The method he accuses the papers he discusses of using doesn't, as far as I can tell, apply to any of them (see the response by Lieberman et al I linked above. They discuss this in greater detail.)

I'm going to try another example, because thinking about it in terms of brain areas seems to obscure the issue. Take a state like Colorado which (let's say) has 1/2 its cities in the mountains, and 1/2 at sea level. One day you want to find out what the average temperature of cities at sea level is, except you don't know which cities are at sea level and which are in the mountains. You can assume, however, that cities at sea level will have higher temperatures than the ones in the mountains. So you look for cities with a temperature higher than the average for all cities, and you say those are the ones that are at sea level. This isn't perfect, however - some cities that are actually at sea level might be having a cold snap that day, so you classify them as being in the mountains.

So now you have a list of numbers, but what you set out to get was the average temperature for cities at sea level. Seems easy enough, right? This is the step that Vul has a problem with. He claims that by taking the average, you are actually performing an additional inferential step which is non-independent of the first one (the first inferential step was figuring out which cities were (probably) at sea level. In science terms, "non-independent" is a Bad Thing.) The inference (he says) you're making is whether the cities you've selected are significantly above the average temperature for all cities. Since you selected those cities based on them having higher than average temperatures, you're guaranteed to get a significant result.

Again, however, NO ONE DOES THAT. Taking the average of a group of numbers is a descriptive step, not an inferential one; you're just reporting the average temperature for cities you classified as being at sea level. This number will be slightly inflated because a couple of the cities were having cold snaps, so you missed them, but it doesn't invalidate the whole analysis.
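To make the disputed step concrete, here is a toy version using pure-noise "temperatures" (the numbers are invented for illustration). It only shows why testing the selected average against the overall average would be circular; whether anyone actually runs that test is the point in dispute above.

```python
import random


def selected_vs_overall(temps):
    """Mean of the values above the overall mean, alongside the overall mean."""
    overall = sum(temps) / len(temps)
    selected = [t for t in temps if t > overall]
    return sum(selected) / len(selected), overall


rng = random.Random(0)
# 200 hypothetical cities whose temperatures differ only by chance.
temps = [rng.gauss(60, 10) for _ in range(200)]
selected_mean, overall_mean = selected_vs_overall(temps)
# The selected mean beats the overall mean by construction, with no real effect present.
print(round(selected_mean, 1), round(overall_mean, 1))
```

Reported as a description, the selected average is just a number with some upward bias. Offered as a second test of "above average", it cannot fail.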
posted by logicpunk at 8:24 AM on March 3, 2009



Um, most dentists don't have sex with their anaesthetized patients. I think.

Which is my point. By bringing the issue up, however, you're implying that the problem is more pervasive than it actually is, much like your bringing up that scientists need to have a priori hypotheses that constrain their analyses implies that there's a huge problem with scientists not doing that.

And, yes, good scientific practice is rare in a lot of fields. That's why the literature is clogged with mostly useless findings. It takes a lot of work to find the gems among the paste.


If you're arguing that Important Papers that make Big Steps Forward are less common than minor papers that only add a bit of uncertain knowledge to a field, I agree. This is, I'd say, built in to the system.

If you're saying that most published papers are deeply methodologically flawed, cite please?
posted by logicpunk at 8:24 AM on March 3, 2009


Love that this conversation is still going on. Spacediver, I refer you to my above comment with regards to whether he's accusing all of social neuroscience of making stats blunders or just a subset of the studies. (The short answer is yes, he's accusing all of social neuroscience of being in error, and he's flat-out wrong because this type of analysis represents a tiny minority of the analysis strategies in social neuro... but hey, making irresponsible, sensational soundbite comments gets you press time!)
posted by Smegoid at 12:59 PM on March 3, 2009


Neuropsychology
posted by kliuless at 7:37 PM on March 3, 2009


thanks for replies logicpunk & smegoid.
posted by spacediver at 12:44 AM on March 4, 2009


Look, there's a whole minefield of potentially bad statistical practice in areas where variability is large. Regression to the mean, selection bias, multiple inference (in both subtle and less subtle varieties), mis-modeling error (such as treating clustered data as independent), length bias, lead-time bias, etc., etc. Much of it goes unrecognized until someone points out a major blunder directly related to ignoring it. (Even then, the bad practices sometimes continue; see research on antiarrhythmic drug therapies, for example.) Most papers are not capable of creating "major blunders" since the publish-or-perish culture means most of the literature comprises insignificant findings, and many of them have questionable validity because the studies were misdesigned or the analyses misdirected. And this ignores the studies where the bad practices are buried by a methods section that omits or mischaracterizes critical details. You only recognize these latter problems when you review for journals and ask hard questions based on your knowledge of the discrepancy between published methods and actual practice.

I really don't have time to review all of the social neuroscience literature, but I have reviewed enough of the general life science literature to recognize what Vul et al. are saying is consistent with my own observations as an applied statistician.
posted by Mental Wimp at 8:35 AM on March 4, 2009


regarding If you're saying that most published papers are deeply methodologically flawed, cite please? posted by logicpunk

see

Use and abuse of subjectivity in scientific research

Scientific method myth
posted by forrestal at 6:49 AM on March 5, 2009


Thanks for the first link, forrestal. It's an interesting choice, in that it encourages acknowledgement of the subjectivity of scientists (via Bayesian stats) rather than a false objectivity that comes with traditional significance testing. Which is all to the good; there's a number of people in neuroscience and psychology who are pushing Bayesian statistics, and eventually, a lot of people hope, it will supplant traditional sig. tests. However, saying that the use of traditional significance testing is a fundamental methodological flaw invalidates close to a century of published results. At worst, it's an imperfect method that can still yield useful conclusions, and better methods are currently being adopted.

That second link was just a whiny rant, though.



Look, there's a whole minefield of potentially bad statistical practice

But what Vul et al. are describing isn't that. Just because you're sympathetic with his point that there are bad statisticians doesn't make his specific critique correct. He called out an entire field and he didn't have the goods. He's far more guilty of bad statistical practice than the papers he included in his study.
posted by logicpunk at 8:40 AM on March 5, 2009


hmmm, can you give some examples of how the second link is a "whiny rant"?

(not that it matters, but I adore statisticians, they keep the barbarians at the gate:

see the New Yorker cartoon at the end of the pdf: "Well, I'll be damned if I'll defend to the death your right to say something that's statistically incorrect.")

posted by forrestal at 1:56 PM on March 5, 2009

