Uncovering Big Bias with Big Data, by David Colarusso - "What follows is the story of how I used those cases to discover what best predicts defendant outcomes: race or income. This post is not a summary of my findings, though you will find them in this article. It is a look behind the curtain of data science, a how-to cast as case study. Yes, there will be a few equations. But you can safely skim over them without missing much. Just pay particular attention to the graphs." [more inside]
“Hindus are, on average, richer and more educated than Muslims. But oddly, the child mortality rate for Hindus is much higher. All observable factors say Hindus should fare better, but they don't. Economists refer to this as the Muslim mortality puzzle. In a new study, researchers believe that they may have found a solution to the puzzle. And, surprisingly, the solution lies in a single factor – open defecation.” [more inside]
291 diseases and injuries + 67 risk factors + 1,160 non-fatal complications = 650 million estimates of how we age, sicken, and die
As humans live longer, what ails us isn't necessarily what kills us: five data visualizations of how we age, sicken, and die. Causes of death by age, sex, region, and year. Heat map of leading causes and risks by region. Changes in leading causes and risks between 1990 and 2010. Healthy years lost to disability vs. life expectancy in 1990 and 2010. Uncertainties of causes and risks. From the team behind the Institute for Health Metrics and Evaluation's massive Global Burden of Diseases, Injuries, and Risk Factors Study 2010. [more inside]
Culturomics, an emerging field of study that applies climate-modeling levels of supercomputing power towards predicting human behavior, using "computerized analysis of vast digital book archives, offering novel insights into the functioning of human society", has been tried with digital news archives. [more inside]
Apparently They, of "They say..." fame, have been misusing the statistical significance test. So much of what "They say" might not actually be.
Data Analysis for Politics and Policy was written by Edward Tufte in 1974. It's an introduction to basic data analysis techniques with examples taken from the social sciences. The book is written in a friendly, conversational tone and is only 179 pages long. The chapter explaining simple linear regression is the heart of the book.
R is quickly becoming the programming language for data analysis and statistics. R (an implementation of S) is free, open-source, and has hundreds of packages available. You can use it on the command-line, through a GUI, or in your favorite text editor. Use it with Python, Perl, or Java. Sweave R code into LaTeX documents for reproducible research. [more inside]
Spacehack: "A directory of ways to participate in space exploration. Interact and connect with the space community."
The memespread project. How does a meme spread? What part does MetaFilter play in the process? [via waxy.org]