After beating the Texas Rangers on Sept. 3, the Boston Red Sox were 84-54. Although half a game behind the Yankees in the American League East, the Red Sox had a nine-game lead over the Tampa Bay Rays for the wild card and roughly a 99.6 percent chance of making the playoffs.
Fast forward one excruciating month to a dead heat with Tampa coming into tonight's bitter imbroglio. Boston clings to a one-run lead over laughingstock Baltimore when a rain delay halts play, leaving the Red Sox in the surreal position of rooting for the hated Yankees playing down in Florida. They can only watch from the sidelines as the rival Rays, tied with Boston in the wild-card race but down 7-0 against New York, roar back to life with six runs in the eighth inning and tie the game with two outs in the bottom of the ninth. And then, after twice coming within a strike of salvaging the game, Boston loses to Baltimore, completing what is arguably the worst late-season collapse in the history of major league baseball.
Knocked Up & Knocked Down
Why America's Widening Fertility Class Divide is a Problem
Florence Nightingale's Statistical Diagrams.
Famous as the mother of modern nursing, she was also an immensely talented applied statistician and visual information artist. These skills were instrumental in persuading 19th-century British health authorities to improve hospital hygiene. She originated a graph type now known as “Nightingale's Coxcomb” and used it to dramatic effect. Examples of these graphs were presented in her monograph, “Notes on matters affecting the health, efficiency and hospital administration of the British army,” published in 1858. That same year she became the first female fellow of the Statistical Society of London (now the Royal Statistical Society). An animation of the coxcombs is available here. The Nightingale Crimean War coxcombs are considered by some to be among the three best graphics in history.
U.S. Poverty Rate, 1 in 6, at Highest Level in Years (NYT) -
An additional 2.6 million people slipped below the poverty line in 2010, census officials said, bringing the number of people in poverty in the United States to 46.2 million. That is the highest number in the 52 years the Census Bureau has been tracking it, said Trudi Renwick, chief of the Poverty Statistics Branch, and represented 15.1 percent of the country. The poverty line in 2010 was $22,113 for a family of four.
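A quick back-of-the-envelope check using only the three figures quoted above. The implied population base and the implied previous-year count are derived here, not taken from the Census release, and rounding in the published figures makes them approximate.

```python
# Figures quoted in the post for 2010.
people_in_poverty = 46.2e6  # people below the poverty line
poverty_rate = 0.151        # 15.1 percent of the population
added_in_2010 = 2.6e6       # increase over the previous year

# Population base implied by the count and the rate (derived, approximate).
implied_population = people_in_poverty / poverty_rate
# Count for the previous year implied by the reported increase.
previous_count = people_in_poverty - added_in_2010
# "1 in N" form of the rate, for comparison with the headline's "1 in 6".
one_in_n = 1 / poverty_rate

print(f"implied population: {implied_population / 1e6:.0f} million")
print(f"implied 2009 count: {previous_count / 1e6:.1f} million")
print(f"roughly 1 in {one_in_n:.1f} people")
```

This gives about 306 million people, about 43.6 million in poverty the year before, and a rate of roughly 1 in 6.6, which the headline rounds to "1 in 6".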
Statistical analysis of OKCupid profiles
exposes some sexually fascinating revelations:
like giving oral more than omnivores
- Twitter users are more likely to masturbate today
- Christians and Atheists are just as likely to claim they have never
- The correlation between men who prefer gentle sex & use of the word 'boating'
I f**king love statistics
- experimental programmatic type and infographics (demos and text in German)
"Rape Reporting During War: Why the Numbers Don't Mean What You Think They Do."
An article in Foreign Affairs arguing that the incidence of rape during wartime is both understated and overstated, and that these are both serious obstacles to addressing the issue of wartime sexual violence.
Jason van Gumster has been telling a lie a day since November 8, 2006.
Cash WinFall, or how to turn the lottery into a real moneymaker.
In Massachusetts, one state-sponsored lottery has become a game you can't lose... if you know the trick. A tale of math, grinding and grifting in the Boston Globe.
A corpus analysis of rock harmony [PDF] - The analyses were encoded using a recursive notation, similar to a context-free grammar, allowing repeating sections to be encoded succinctly. The aggregate data was then subjected to a variety of statistical analyses. We examined the frequency of different chords and chord transitions ... Other results concern the frequency of different root motions, patterns of co-occurrence between chords, and changes in harmonic practice across time.
More information, analysis, and explanation here.
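The paper's recursive encoding isn't reproduced here. As a minimal sketch of the simplest analysis it mentions, counting chord and chord-transition frequencies, assume each song has already been reduced to a list of Roman-numeral chord symbols; the three sequences below are made up, and reading the transition counts as first-order Markov probabilities is this sketch's framing, not necessarily the paper's.

```python
from collections import Counter

# Made-up Roman-numeral chord sequences standing in for encoded songs.
songs = [
    ["I", "IV", "V", "I"],
    ["I", "V", "vi", "IV", "I"],
    ["vi", "IV", "I", "V"],
]

chord_counts = Counter()       # how often each chord occurs
transition_counts = Counter()  # how often chord a is followed by chord b
for chords in songs:
    chord_counts.update(chords)
    transition_counts.update(zip(chords, chords[1:]))

# Relative frequency of each chord across the corpus.
total = sum(chord_counts.values())
chord_freq = {c: n / total for c, n in chord_counts.items()}

# P(next chord = b | current chord = a): a first-order Markov view of the data.
outgoing = Counter()
for (a, _b), n in transition_counts.items():
    outgoing[a] += n
transition_prob = {(a, b): n / outgoing[a] for (a, b), n in transition_counts.items()}

for (a, b), p in sorted(transition_prob.items()):
    print(f"{a:>3} -> {b:<3} {p:.2f}")
```

In this toy corpus, vi is always followed by IV, while I moves to V two times out of three.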
On not reading books. Franco Moretti, author of the controversial Graphs, Maps, Trees: Abstract Models for a Literary History, proposes that literary study needs to abandon "close reading" for "distant reading": "understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data." He is co-founder of the Stanford Literary Lab, where he and like-minded colleagues have published studies on using computerized statistical analysis to identify a novel's genre (PDF) and on analyzing plots as networks (PDF). Similar projects are on the way.
The Street Price of Cocaine, Country to Country
The Economist's report is based on data from the UN's recently released World Drug Report.
"The mind knows not what the tongue wants."
We all take variability and niche markets for granted these days, but back in the 1970s and 1980s, the American food industry was obsessed with the so-called platonic dish: a perfect and universal way to serve a food. Howard Moskowitz, of Prego fame, helped explode the idea in the food industry and beyond.
In this TED talk, Malcolm Gladwell tells you all about it and why variability matters a lot.
Although much has been said about the demographic composition of the United States Congress, much less has been said about the thousands of staffers who work behind the scenes, drafting legislation, interacting with constituents, and advising their congressperson. The National Journal has created two infographics that attempt to describe this silent but influential workforce.
Larry Gonick is a veteran American cartoonist best known for his delightful comic-book guides to science and history, many of which have previews online. Chief among them is his long-running Cartoon History of the Universe (later The Cartoon History of the Modern World), a sprawling multi-volume opus documenting everything from the Big Bang to the Bush administration. Published over the course of three decades, it takes a truly global view: its time-traveling Professor thoroughly explores not only familiar topics like Rome and World War II but also the oft-neglected stories of Asia and Africa, blending caricature and myth with careful scholarship (cited by fun illustrated bibliographies) and tackling even the most obscure events with intelligence and wit. This savvy satire carried over to Gonick's Zinn chronicle The Cartoon History of the United States, along with a bevy of Cartoon Guides to other topics, including Genetics, Computer Science, Chemistry, Physics, Statistics, The Environment, and (yes!) Sex. Gonick has also maintained a few side projects, such as a webcomic look at Chinese invention, assorted math comics, the Muse magazine mainstay Kokopelli & Co. (featuring the shenanigans of his "New Muses"), and more. See also these lengthy interview snippets, linked previously. Want more? Amazon links to the complete oeuvre inside!
At the recent MIT symposium "Brains, Minds and Machines," Chomsky criticized the use of purely statistical methods to understand linguistic behavior. Google's Director of Research, Peter Norvig, responds.
“certain styles of research were suggested to be prone to ‘groupthink, reduced creativity and the possibility of less-rigorous reviewing processes.’”
Edward Wegman is a professor at George Mason and a distinguished statistician with a long career, a former winner of the ASA's Founders Award. In 2006 he testified before Congress on climate science, sharply criticizing the statistical methodology of Michael Mann's "hockey stick graph," which showed a sharp increase in global temperature in the last part of the 20th century. One section of Wegman's testimony concerned "social network analysis," and suggested that Mann's tightly-knit network of co-authors might have led to insufficiently aggressive peer review. USA Today reports that Wegman's testimony contained a substantial quantity of plagiarized material, and the peer-reviewed article derived from the testimony has been retracted by the journal that published it. John Mashey has compiled an obsessively thorough catalogue of the plagiarized text (large PDF).
100 years of world cuisine is a statistical exploration of military conflict that is both artistic and disturbing.
Compiling the Absurd Box Scores from Space Jam. Courtesy of The Harvard College Sports Analysis Collective.
The Monstars, behind a vicious defense and a quick-strike transition offense featuring the unprecedented 3-point-line dunk, seize early control and take a 66-18 lead going into the half. Pound (Barkley) and Bupkus (Ewing) are dominant. Things look grim for Jordan, Bugs Bunny and crew.
, a pioneer in the field of baseball statistics, has now turned his attention to serial killers and their methods.
Go figure: How to succeed in business by doing nothing
Article about variability in business and why it is sometimes better to do nothing.
"You're a dynamic business leader. Let's say you make widgets - though you might equally make big-budget Hollywood movies.
Your widgets, or your movies, vary. Some widgets are perfect, some a tad too long. Some movies make mega-bucks at the box office, some bomb.
So what do you do? Well, you're dynamic, so you react, of course. Something must be done."
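The "do nothing" argument is essentially regression to the mean. A minimal sketch, assuming (as a deliberate simplification) that every production line has the same true quality and each period's result is that quality plus pure noise; real widgets and movies also differ in underlying quality, and all the numbers here are illustrative.

```python
import random

random.seed(1)

# Assumption: identical true quality everywhere; results differ only by noise.
N_LINES = 1000
TRUE_QUALITY = 100.0
NOISE_SD = 10.0

def one_period():
    """Return one noisy result per production line."""
    return [random.gauss(TRUE_QUALITY, NOISE_SD) for _ in range(N_LINES)]

first = one_period()
second = one_period()  # nothing was changed between the two periods

# The worst 10% in period one: the lines a "dynamic" manager would try to fix.
worst = sorted(range(N_LINES), key=lambda i: first[i])[: N_LINES // 10]

mean_before = sum(first[i] for i in worst) / len(worst)
mean_after = sum(second[i] for i in worst) / len(worst)
print(f"worst 10%, period 1: {mean_before:.1f}")
print(f"same lines, period 2: {mean_after:.1f}")
```

Because the second period is drawn independently, the "worst" group drifts back toward 100 with no intervention at all; a manager who had acted in between would be tempted to credit the action.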
"Value-added modeling is promoted because it has the right pedigree -- because it is based on "sophisticated mathematics."
As a consequence, mathematics that ought to be used to illuminate ends up being used to intimidate." John Ewing, president of Math for America and former executive director of the American Mathematical Society, criticizes the "value-added modeling" approach used as a proxy for teacher quality, most famously in a Los Angeles Times story that called out low-scoring teachers by name. A Brookings Institution paper says value-added modeling is flawed but is the best measure we have of teacher value, arguing that the metric's wide fluctuations from year to year are no worse than those of batting averages in baseball. (Though the weakness of batting average's year-to-year correlation is mostly a BABIP issue.) Can we assign a numerical value to teacher quality? If so, how?
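One way to see why a noisy metric can swing widely from year to year: under a simple assumed model where each year's measured score is a stable true effect plus independent noise, the expected year-to-year correlation equals the share of variance that is signal. The standard deviations below are illustrative values chosen for this sketch, not estimates from the Brookings paper or any other study.

```python
import random

random.seed(2)

# Assumed model: measured score = stable true effect + independent yearly noise.
N_TEACHERS = 5000
TRUE_SD = 1.0   # illustrative spread of real differences
NOISE_SD = 1.5  # illustrative year-to-year measurement noise

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

true_effect = [random.gauss(0, TRUE_SD) for _ in range(N_TEACHERS)]
year1 = [t + random.gauss(0, NOISE_SD) for t in true_effect]
year2 = [t + random.gauss(0, NOISE_SD) for t in true_effect]

observed_r = pearson(year1, year2)
# Under this model the expected correlation is the signal's share of the variance.
expected_r = TRUE_SD**2 / (TRUE_SD**2 + NOISE_SD**2)
print(f"observed year-to-year correlation: {observed_r:.2f}")
print(f"expected under the model: {expected_r:.2f}")
```

The same logic applies to batting averages: the noisier a season's number is relative to real skill differences, the weaker the season-to-season correlation, even when the underlying skill never changes.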
Last week during the Senate budget negotiations, Sen. Jon Kyl (R-Ariz.) gave a speech that included the following statement: "If you want an abortion, you go to Planned Parenthood, and that’s well over 90 percent of what Planned Parenthood does." That statement is drastically different from the statistics reported by Planned Parenthood, which list 90 percent of its services as preventive in nature, compared with 3 percent that are abortion-related. When asked about this apparent discrepancy, Jon Kyl's office replied that "his remark was not intended to be a factual statement." And that is when things got noisy.
Statistical hypothesis testing with a p-value of less than 0.05 is often used as a gold standard in science, and is frequently required by peer reviewers and journals when stating results. Some statisticians argue that this reflects a cult of significance testing built on a frequentist statistical framework that is counterintuitive and misunderstood by many scientists. Biostatisticians have argued that the (over)use of p-values comes from "the mistaken idea that a single number can capture both the long-run outcomes of an experiment and the evidential meaning of a single result" and identify several other problems with significance testing. XKCD demonstrates how misunderstandings of the nature of the p-value, failure to adjust for multiple comparisons, and the file drawer problem result in likely spurious conclusions being published in the scientific literature and then being distorted further in the popular press. You can simulate a similar situation yourself. John Ioannidis uses problems with significance testing and other statistical concerns to argue, controversially, that "most published research findings are false." Will the use of Bayes factors replace classical hypothesis testing and p-values? Will something else?
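A minimal simulation of the multiple-comparisons problem, assuming 20 independent comparisons in which the null hypothesis is true for every one (the XKCD jelly-bean setup). To keep the code dependency-free it uses a two-sample z-test with known standard deviation 1, which is exact here because the data are generated with that standard deviation; group sizes and study counts are arbitrary choices.

```python
import math
import random

random.seed(3)

N_TESTS = 20      # e.g. 20 jelly bean colours, none with any real effect
N_PER_GROUP = 30
ALPHA = 0.05
N_STUDIES = 1000

def z_test_pvalue(a, b):
    """Two-sided p-value for a difference in means when both groups have known SD 1."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    z = diff / math.sqrt(1 / len(a) + 1 / len(b))
    return math.erfc(abs(z) / math.sqrt(2))

def one_study():
    """Run N_TESTS null comparisons and return their p-values."""
    pvalues = []
    for _ in range(N_TESTS):
        treated = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
        control = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
        pvalues.append(z_test_pvalue(treated, control))
    return pvalues

studies = [one_study() for _ in range(N_STUDIES)]
# Share of studies reporting at least one "significant" result, uncorrected...
uncorrected = sum(min(pvals) < ALPHA for pvals in studies) / N_STUDIES
# ...and with a Bonferroni correction (each test at ALPHA / N_TESTS).
bonferroni = sum(min(pvals) < ALPHA / N_TESTS for pvals in studies) / N_STUDIES

print(f"uncorrected: {uncorrected:.2f} (theory {1 - (1 - ALPHA) ** N_TESTS:.2f})")
print(f"Bonferroni:  {bonferroni:.2f} (theory at most {ALPHA:.2f})")
```

Without correction, roughly two studies in three find a "significant" colour even though none has any effect; Bonferroni is only one of several possible corrections and was chosen here for simplicity.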
According to new data released by the CDC yesterday, more Americans are surviving cancer thanks to advances in early detection and treatment. CDC analysis shows an unprecedented 20% increase in the number of cancer survivors between 2001 and 2007, and a nearly fourfold increase since 1971.
The most typical person on the planet is a 28-year-old Chinese man. For now.
The Most Shocking/Depressing/Enraging Interactive Infographic You Will See Today, unless you've been in the "top 10%" since 1969. Move the sliders for other interesting/surprising/sad perspectives on parts of the past century.
Disclaimer: "Income Growth" is just one data point of many regarding economic well-being in the USofA, some of which appear in non-interactive form elsewhere on the site. Your personal mileage may vary. Remember, if you've done well despite not being in the "top 10%", then somebody else has done worse.
Stanford's Visualization Group has produced a data cleanup web app called Wrangler that works like straight-up magic.
Cracking the Scratchie.
With cheating and money laundering and statistics, this story seems like it should be about something more exciting than scratch-off lottery tickets. But it isn't.
2 0 1 0 a year in reviews
- This visualization renders a browsable, searchable distribution of all 2010 Pitchfork music reviews
Professors' global model forecasts civil unrest against governments
- With protests spreading
in the Middle East
- not on the list) I thought this article
on a forecast model
predicting "which countries
will likely experience an escalation in domestic political violence [within the next five years]" was rather interesting.
The United States of Shame.
Surprisingly, Florida is not the oldest state. Unsurprisingly, Utah uses the most internet porn.
According to official Chinese stats (make of them what you will), there are now 457 million internet users in China. They are said to include 450m who have broadband and 303m who use mobile internet; 304m play online games, 140m use online banking, and 63m microblog. These users are estimated to spend an average of 18 hours a week online. As a benchmark, the current US population is estimated at 312m.
Apparently They, of "They say..." fame, have been misusing the statistical significance test. So much of what "They say" might not actually be so.
"Nearly half of pregnancies among American women are unintended, and four in 10 of these are terminated by abortion.... At least half of American women will experience an unintended pregnancy by age 45, and, at current rates, about one-third will have had an abortion." Abortion is one of the most common medical procedures in the U.S., but it can be very difficult to get unbiased information about the procedure. From Jezebel: The Girl's Guide to Having an Abortion
Wins above replacement, or WAR, is a sabermetric term of art for comparing baseball players. FanGraphs, one of the go-to sites for baseball nerdlingers, now offers a way to make WAR grids, a remarkably easy-to-read visual display comparing players by WAR, sortable by team, position and season, with player age as the default top line.
give their hopes and dreams for data, data tools and data science
Already, Google has provided Google Refine to help clean your datasets. Great visualizations can be created with online tools or by combining R (great posts previously) with ggplot2, and even Google Motion Charts With R (already built into Google Spreadsheets).
Need data? Needlebase helps non-programmers scrape, harvest, merge, and data from the web. Or if you’re introspective, Your Flowing Data
provide tools to measure and chart details of your own life.
Following the Journal of Personality and Social Psychology's decision to publish Daryl Bem's writeup of 8 studies (PDF) purporting to show evidence for precognition (previously), researchers from the University of Amsterdam have written a rebuttal (PDF) that finds methodological flaws not only in Bem's research but in many other published papers in experimental psychology. Could this prove to be psychology's cold fusion moment?
Google is known to ask the following question in job interviews: In a country in which people only want boys, every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country?
Think you know the answer? If so, Steve Landsburg may be willing to bet you up to $5000.
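A minimal Monte Carlo sketch, assuming every birth is independently a boy or a girl with probability 1/2. Each family has exactly one boy and, on average, one girl, so in a large country the pooled share of girls comes out close to 1/2. Averaging each country's own share of girls over many small countries is a different quantity: it lands below 1/2 (exactly 1 - ln 2, about 0.307, for a one-family country), because g/(g + n) is concave in the number of girls g. Whether this distinction is the crux of Landsburg's bet is for his post to say; the simulation only shows that the two readings of "proportion" differ.

```python
import math
import random

random.seed(4)

def family():
    """Have children until the first boy; return (boys, girls)."""
    girls = 0
    # Assumption: each birth is independently a girl with probability 1/2.
    while random.random() < 0.5:
        girls += 1
    return 1, girls

def country(n_families):
    """Total (boys, girls) over n_families independent families."""
    boys = girls = 0
    for _ in range(n_families):
        b, g = family()
        boys += b
        girls += g
    return boys, girls

# 1) One very large country: pooled share of girls among all children.
boys, girls = country(200_000)
pooled_share = girls / (boys + girls)
print(f"large country, share of girls: {pooled_share:.3f}")

# 2) Many tiny countries: average each country's own share of girls.
N_COUNTRIES = 100_000
mean_share = {}
for n_families in (1, 4):
    total = 0.0
    for _ in range(N_COUNTRIES):
        b, g = country(n_families)
        total += g / (b + g)
    mean_share[n_families] = total / N_COUNTRIES
    print(f"{n_families} families: mean share of girls {mean_share[n_families]:.3f}")

# Exact value for a one-family country: 1 - ln 2.
print(f"theory for one family: {1 - math.log(2):.3f}")
```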
"Normal" human pregnancies last 40 weeks, right? Well, no; they can vary quite a bit by the mother's race
, number of previous children, family history of delivering early or late, home state, work habits, and even the fetus's HLA type. So where does that "40-week" thing come from? Oh, dear.
So check out this super-nerdy pregnancy statistics website, from an engineer mom who is collecting data from the public (see the raw data and auto-generated graphs, and read the FAQ about the survey, with more cool graphs). Looking for day-by-day probabilities on when that baby's due? This would be your stats table with daily prediction (adjust dates at top of page as needed). Of course, you could always shut up your constantly inquiring relatives and friends another way.
The New York Times presents an interactive map of America's population separated by race, income, and education, according to census data from 2005 to 2009, with one dot for every 50 people. (Previously)
Measure-theoretic probability: Why it should be learnt and how to get started.
The clickable chart of distribution relationships.
Just two of the interesting and informative probability resources I've learned about, along with countless other tidbits of information, from statistician John D. Cook and his probability fact-of-the-day Twitter feed ProbFact. John also has daily tip and fact Twitter feeds for Windows keyboard shortcuts, regular expressions, TeX and LaTeX, algebra and number theory, topology and geometry, real and complex analysis, and, beginning tomorrow, computer science.