At the Far Ends of a New Universal Law
The law appeared in full form two decades later, when the mathematicians Craig Tracy and Harold Widom proved that the critical point in the kind of model May used was the peak of a statistical distribution. Then, in 1999, Jinho Baik, Percy Deift and Kurt Johansson discovered that the same statistical distribution also describes variations in sequences of shuffled integers — a completely unrelated mathematical abstraction. Soon the distribution appeared in models of the wriggling perimeter of a bacterial colony and other kinds of random growth. Before long, it was showing up all over physics and mathematics. “The big question was why,” said Satya Majumdar, a statistical physicist at the University of Paris-Sud. “Why does it pop up everywhere?”
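The "shuffled integers" result can be made concrete. Baik, Deift and Johansson studied L_n, the length of the longest increasing subsequence of a random permutation of n integers. L_n grows like 2√n, and its fluctuations, of order n^(1/6), follow the Tracy-Widom distribution. The sketch below is a minimal illustration and is not taken from their paper. It computes L_n by patience sorting and returns the rescaled values (L_n − 2√n)/n^(1/6). The function names and the choices of n and trial count are hypothetical, picked only for illustration.

```python
import bisect
import math
import random
import statistics


def longest_increasing_subsequence_length(seq):
    """Length of the longest strictly increasing subsequence, via patience sorting.

    tails[k] holds the smallest tail value seen so far for an increasing
    subsequence of length k + 1; each element costs one binary search.
    """
    tails = []
    for x in seq:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[i] = x      # x is a smaller tail for subsequences of length i + 1
    return len(tails)


def scaled_lis_samples(n, trials, seed=0):
    """Return (L_n - 2*sqrt(n)) / n**(1/6) for `trials` random permutations of 0..n-1."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        length = longest_increasing_subsequence_length(perm)
        samples.append((length - 2 * math.sqrt(n)) / n ** (1 / 6))
    return samples


if __name__ == "__main__":
    samples = scaled_lis_samples(n=10_000, trials=200)
    print("mean:", statistics.mean(samples), "stdev:", statistics.stdev(samples))
```

As n grows, the sample mean should drift toward the Tracy-Widom (GUE) mean of roughly −1.77. Finite-size corrections are still noticeable at moderate n.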
Music Machinery presents a map of each U.S. state's most distinct favorite band or recording artist, as well as an app for playing with the data.
A/B testing has become a familiar term for most people running web sites, especially e-commerce sites. Unfortunately, most A/B test results are illusory (PDF, 312 kB). Here's how not to run an A/B test. Do use this sample size calculator or this weird trick.
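To get a feel for what a sample size calculator does, here is the standard normal-approximation formula for comparing two conversion rates with a two-sided test. This is a textbook sketch and may not match the linked calculator's exact method. The function name, the baseline and target rates, the significance level and the power are all hypothetical parameters chosen for illustration.

```python
import math
from statistics import NormalDist


def ab_sample_size_per_arm(p_baseline, p_variant, alpha=0.05, power=0.80):
    """Visitors needed in EACH arm to detect p_baseline -> p_variant with a two-sided z-test.

    Common normal-approximation formula for two proportions:
      n = (z_{1-alpha/2} * sqrt(2 * pbar * (1 - pbar))
           + z_{power} * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2 / (p1 - p2)^2
    where pbar is the average of the two rates.
    """
    if p_baseline == p_variant:
        raise ValueError("the two rates must differ")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = std_normal.inv_cdf(power)          # quantile matching the desired power
    p_bar = (p_baseline + p_variant) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p_baseline * (1 - p_baseline)
                                       + p_variant * (1 - p_variant)))
    return math.ceil(numerator ** 2 / (p_baseline - p_variant) ** 2)


if __name__ == "__main__":
    # Hypothetical test: 10% baseline conversion, hoping to detect a lift to 12%.
    print(ab_sample_size_per_arm(0.10, 0.12))
```

With a 10% baseline and a hoped-for 12%, this formula asks for roughly 3,800 visitors in each arm. Smaller expected lifts push the number up quickly.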
While you were watching one of yesterday's exciting snow-bound football games, the thought may have occurred to you: If I were a coach, would I go for it on this 4th down? This bot from the New York Times will tell you, and may even add a little attitude to its answer, which is usually much more aggressive than what NFL coaches actually do.
"We have little trouble recognizing that a chess grandmaster’s victory over a novice is skill, as well as assuming that Paul the octopus’s ability to predict World Cup games is due to chance. But what about everything else?" [Luck and Skill Untangled: The Science of Success]
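One common way to put numbers on the luck-versus-skill question, sometimes called true-score theory, treats the observed variance in results as skill variance plus luck variance. It then estimates the luck part from a pure-chance model. The sketch below is an illustrative assumption, not the article's own calculation. It simulates a hypothetical league in which every game is a weighted coin flip, then estimates what share of the spread in winning percentages chance alone would produce. The league size, season length and `skill_sd` parameter are invented for the example.

```python
import random
import statistics


def simulate_league(n_teams=30, n_games=82, skill_sd=0.10, seed=0):
    """Observed winning percentages for a hypothetical league.

    Each team's true win probability is drawn from N(0.5, skill_sd) and clipped to
    [0.05, 0.95]. Its season is then n_games independent games won with that
    probability. This is a simplification: real games are not independent coin flips.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_teams):
        p = min(0.95, max(0.05, rng.gauss(0.5, skill_sd)))
        wins = sum(1 for _ in range(n_games) if rng.random() < p)
        records.append(wins / n_games)
    return records


def luck_share(win_pcts, n_games):
    """Estimated fraction of the observed variance in winning percentage due to luck alone.

    If every team were a 50/50 coin, each record would be Binomial(n_games, 0.5) / n_games,
    with variance 0.25 / n_games. Treating var(observed) = var(skill) + var(luck), luck's
    share is var(luck) / var(observed), capped at 1 because sampling noise can push it over.
    """
    var_observed = statistics.pvariance(win_pcts)
    if var_observed == 0:
        raise ValueError("no variation in records")
    var_luck = 0.25 / n_games
    return min(1.0, var_luck / var_observed)


if __name__ == "__main__":
    for sd in (0.0, 0.05, 0.15):
        league = simulate_league(n_teams=500, skill_sd=sd)
        print(f"skill spread {sd:.2f}: luck share ~ {luck_share(league, 82):.2f}")
```

With no skill spread, the estimated luck share comes out close to 1. As the hypothetical skill spread grows, the share falls. A real analysis would plug in actual standings instead of simulated ones.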
Dick Pfander's obsession with basketball box scores means that every NBA box score in the league's history is now accessible.
The year was 1945. Two earthshaking events took place: the successful test at Alamogordo and the building of the first electronic computer. Their combined impact was to modify qualitatively the nature of global interactions between Russia and the West. No less perturbative were the changes wrought in all of academic research and in applied science. On a less grand scale these events brought about a [renaissance] of a mathematical technique known to the old guard as statistical sampling; in its new surroundings and owing to its nature, there was no denying its new name of the Monte Carlo method (PDF). -N. Metropolis
Monte Carlo methods have been discussed conceptually on MeFi previously. Some basic ones include the Inverse Transform Method (PDF) mentioned in the quoted paper, Acceptance-Rejection Sampling (PDFs 1, 2), and integration with and without importance sampling (PDF).
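The three basic techniques named above can each be sketched in a few lines of Python. These are minimal textbook versions, not the code from the linked PDFs. The first is an exponential sampler built with the inverse transform. The second is a Beta(2, 2) sampler built with acceptance-rejection and a uniform proposal. The third estimates the normal tail probability P(Z > 3) with and without importance sampling. The target distributions, the shifted-normal proposal and the function names are choices made for illustration.

```python
import math
import random


def sample_exponential(rate, rng):
    """Inverse transform: if U ~ Uniform(0, 1), then -ln(1 - U) / rate ~ Exponential(rate)."""
    u = rng.random()  # u is in [0, 1), so 1 - u is in (0, 1] and the log is finite
    return -math.log(1.0 - u) / rate


def beta22_pdf(x):
    """Target density for the rejection example: Beta(2, 2), f(x) = 6x(1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x)


def sample_beta22(rng):
    """Acceptance-rejection with a Uniform(0, 1) proposal g and envelope constant M = max f = 1.5."""
    m = 1.5
    while True:
        y = rng.random()                       # candidate drawn from g(y) = 1 on [0, 1]
        if rng.random() <= beta22_pdf(y) / m:  # accept with probability f(y) / (M * g(y))
            return y


def tail_prob_plain(threshold, n, rng):
    """Plain Monte Carlo estimate of P(Z > threshold) for Z ~ N(0, 1)."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def tail_prob_importance(threshold, n, rng):
    """Importance sampling: draw from N(threshold, 1), reweight tail hits by phi(x) / phi(x - threshold)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Likelihood ratio of the N(0, 1) density to the N(threshold, 1) density.
            total += math.exp(-threshold * x + threshold ** 2 / 2.0)
    return total / n


if __name__ == "__main__":
    rng = random.Random(42)
    n = 20_000
    exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
    print("exponential(2) sample mean:", sum(sample_exponential(2.0, rng) for _ in range(n)) / n)
    print("Beta(2, 2) sample mean:", sum(sample_beta22(rng) for _ in range(n)) / n)
    print("P(Z > 3) exact:", exact)
    print("  plain MC:", tail_prob_plain(3.0, n, rng))
    print("  importance sampling:", tail_prob_importance(3.0, n, rng))
```

Comparing both estimates against the exact value 0.5·erfc(3/√2) ≈ 0.00135 shows why importance sampling helps here. Plain sampling rarely lands in the tail. The shifted proposal puts about half its draws there and corrects for the shift with the likelihood-ratio weight.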
An "Exciting Guide to Probability Distributions" from the University of Oxford: part 1, part 2. (Two links to PDFs)
Bill James, a pioneer in the field of baseball statistics, has now turned his attention to serial killers and their methods.
Statistical hypothesis testing, with a p-value of less than 0.05 as the threshold, is often used as a gold standard in science, and is required by peer reviewers and journals when stating results. Some statisticians argue that this indicates a cult of significance testing using a frequentist statistical framework that is counterintuitive and misunderstood by many scientists. Biostatisticians have argued that the (over)use of p-values comes from "the mistaken idea that a single number can capture both the long-run outcomes of an experiment and the evidential meaning of a single result" and identify several other problems with significance testing. XKCD demonstrates how misunderstandings of the nature of the p-value, failure to adjust for multiple comparisons, and the file drawer problem result in likely spurious conclusions being published in the scientific literature and then being distorted further in the popular press. You can simulate a similar situation yourself. John Ioannidis uses problems with significance testing and other statistical concerns to argue, controversially, that "most published research findings are false." Will the use of Bayes factors replace classical hypothesis testing and p-values? Will something else?
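Simulating the XKCD scenario works like this. Run 20 independent tests in which there is truly no effect, and count how often at least one of them comes out "significant" at p < 0.05. The sketch below uses a hypothetical setup chosen for simplicity: normal data with known variance, a two-sample z-test, and 30 observations per group. It also shows the effect of a Bonferroni correction, which divides the threshold by the number of tests.

```python
import math
import random
from statistics import NormalDist, fmean


def two_sample_z_pvalue(a, b):
    """Two-sided p-value for a difference in means, assuming both groups have known variance 1."""
    se = math.sqrt(1.0 / len(a) + 1.0 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def family_wise_error_rate(n_tests=20, n_per_group=30, alpha=0.05,
                           trials=1000, bonferroni=False, seed=0):
    """Fraction of simulated studies in which at least one of n_tests null tests is 'significant'."""
    rng = random.Random(seed)
    threshold = alpha / n_tests if bonferroni else alpha
    false_alarms = 0
    for _ in range(trials):
        smallest_p = 1.0
        for _ in range(n_tests):
            # No real effect: both groups are drawn from the same N(0, 1) distribution.
            a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            smallest_p = min(smallest_p, two_sample_z_pvalue(a, b))
        if smallest_p < threshold:
            false_alarms += 1
    return false_alarms / trials


def analytic_fwer(n_tests=20, alpha=0.05):
    """Probability of at least one false positive among independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    print("uncorrected:", family_wise_error_rate(), "vs analytic", analytic_fwer())
    print("Bonferroni: ", family_wise_error_rate(bonferroni=True),
          "vs analytic", analytic_fwer(alpha=0.05 / 20))
```

Without correction, the simulated rate should land near the analytic value 1 − 0.95^20 ≈ 0.64. With Bonferroni, it drops to about 0.05.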
Stanford's Visualization Group has produced a data cleanup web app called Wrangler that works like straight-up magic.
Wins-above-replacement, or WAR, is a sabermetric term of art for comparing baseball players. Fangraphs, one of the go-to sites for baseball nerdlingers, now offers a way to make WAR grids, an amazingly easy-to-read visual display comparing players by WAR, sortable by team, position and season, with a default topline of player age.
Dataists give their hopes and dreams for data, data tools and data science in 2011. Already, Google has provided Google Refine (previously) to help clean your datasets. Great visualizations can be created with online tools or by combining R (great posts previously) with ggplot2, GGobi, and even Google Motion Charts With R (motion charts are already built into Google Spreadsheets). Need data? Needlebase helps non-programmers scrape, harvest, and merge data from the web. Or, if you’re introspective, Your Flowing Data and Daytum provide tools to measure and chart details of your own life.
OK Cupid statistics fun: We collected 552,000 example user pictures. We paired them up and asked people to make snap judgments. Here's what we found.
Andrew Gelman recently highlighted this strange trend in baby naming, first posted on Laura Wattenberg's blog in 2007. Why do so many boys' names now end with the letter "n"?
Mariano's Gonna Cut You, and other stat-and-graph-filled baseball analysis from Beyond the Boxscore.
He predicted a losing season for the White Sox in 2007 and foresaw that the Tampa Bay Rays would be the best team in the American League in 2008, although he wrongly predicted that the Rays would win the World Series. He also predicted Obama's 6-point victory over McCain. Now the stats guru Nate Silver is picking the Oscar winners and predicting an upset win for Taraji P. Henson in the Best Supporting Actress category.
Special 3-page edition of Harper’s Index: A retrospective of the Bush era.
US Census Bureau Facts & Figures: Holiday Edition says that more than 20 billion letters, packages and cards will be delivered this holiday season and 12 million packages a day through to Christmas Eve. Also check out the Special Edition for comparison data from 1915, 1967 and 2006, the African-American History Month Facts & Features and more data going back to 2000.
Nation Master: an amazing resource that displays all sorts of comparative national statistics on practically everything, with an option to select any region or list of countries you choose. It bills itself as "The world's biggest general stat site" (which may or may not be true; I don't know), and it has a wealth of data on economics, sports, population, geography and a dozen more categories. Some interesting statistics: Top 100 in Olympic medals per capita. Top 100 Murders with firearms (per capita). Top 100 Military Expenditures as a percent of GDP. Top 100 Net migration rate.
A heaven for data freaks.
The Incarceration Atlas. Everyone's probably familiar with the usual stat that America has the world's highest rate of incarceration, but there are some other pretty interesting numbers here too, touching on some Metafilter favorites: race, education and drugs.