"I amused myself for over a year thinking about the impacts of different toilet seat administration policies and how to measure them – doing calculations in my head, considering ratios of Standing events to Sitting events, and I slowly began to understand some of the specific differences in the basic policies that I know to be administered most often. Finally, I decided to perform a probabilistic analysis". Essential Toilet Seat Analytics.
Predicting Google Shutdowns. "In the following essay, I collect data on 350 Google products and look for predictive variables. I find some while modeling shutdown patterns, and make some predictions about future shutdowns. Hopefully the results are interesting, useful, or both." Gwern exhaustively analyzes Google products past and present with an eye to establishing what's not long for the bitverse. tl;dr? Results.
291 diseases and injuries + 67 risk factors + 1,160 non-fatal complications = 650 million estimates of how we age, sicken, and die
As humans live longer, what ails us isn't necessarily what kills us: five data visualizations of how we age, sicken, and die. Causes of death by age, sex, region, and year. Heat map of leading causes and risks by region. Changes in leading causes and risks between 1990 and 2010. Healthy years lost to disability vs. life expectancy in 1990 and 2010. Uncertainties of causes and risks. From the team behind the Institute for Health Metrics and Evaluation's massive Global Burden of Diseases, Injuries, and Risk Factors Study 2010. [more inside]
A Vast Left-Wing Competency: "How Democrats became the party of effective campaigning — and why the GOP isn’t catching up anytime soon." Sasha Issenberg, author of The Victory Lab, has been writing a series of posts on Slate that focus on different aspects of "the new science of winning campaigns". [more inside]
A corpus analysis of rock harmony [PDF] - The analyses were encoded using a recursive notation, similar to a context-free grammar, allowing repeating sections to be encoded succinctly. The aggregate data was then subjected to a variety of statistical analyses. We examined the frequency of different chords and chord transitions ... Other results concern the frequency of different root motions, patterns of co-occurrence between chords, and changes in harmonic practice across time. More information, analysis, and explanation here.
On not reading books. Franco Moretti, author of the controversial Graphs, Maps, Trees: Abstract Models for a Literary History, proposes that literary study needs to abandon "close reading" for "distant reading": "understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data." He is co-founder of the Stanford Literary Lab, where he and like-minded colleagues have published studies on programming computers to use statistical analysis to identify a novel's genre (PDF) and analyzing plots as networks (PDF). Similar projects are on the way.
It has applications in health care, pharmaceuticals, facial recognition, economics and related areas, and of course, much, much more. Previously, MeFi discussed controversial homeland security applications, and the nexus between social networking and mobile devices that further contributes to the pool. With plenty to dig into, let's talk Data Mining in more detail. [more inside]
According to new data released by the CDC yesterday, more Americans are surviving cancer thanks to advances in early detection and treatment. CDC analysis shows an unprecedented 20% increase in survival rates between 2001 and 2007, which is nearly a fourfold increase since 1971. [more inside]
Measure-theoretic probability: Why it should be learnt and how to get started. The clickable chart of distribution relationships. Just two of the interesting and informative probability resources I've learned about, along with countless other tidbits of information, from statistician John D. Cook's blog and his probability fact-of-the-day Twitter feed ProbFact. John also has daily tip and fact Twitter feeds for Windows keyboard shortcuts, regular expressions, TeX and LaTeX, algebra and number theory, topology and geometry, real and complex analysis, and beginning tomorrow, computer science and statistics.
It has applications in economics, biology, and pharmaceuticals, and is rooted in state space modeling, which, together with Kalman filtering (paper, breakdown [warning: long]), was used in the Apollo program. Dynamic Linear Models are gaining in popularity. There is an R package, as well as a short doc and a really great (read: worth buying) book (sorry, not a download, but here's chapter 2) by Giovanni Petris, Sonia Petrone, and Patrizia Campagnoli, with its own little website.
This is a way nerdy analysis of the cost of shopping at drugstores vs. Wal-Mart vs. the gas required to get to them.
Now this is what you call an alpha nerd. I remember this guy from his inspiringly, excruciatingly detailed analysis of various routes he took into work, collecting data over a year.
The Football Prospectus is up and running. The good folks who work on the Baseball Prospectus have turned their attention to the NFL. This is their inaugural effort. Their contrarian thinking and in-depth statistical analysis have (slowly) started to creep into MLB coverage. Can their unique take and historical perspective change football's conventional wisdom as well?