You are more likely to be killed by a pig than by a shark. You run a greater risk of dying from an asteroid impact than from a terrorist attack. You would have to fly an average of 38,000 years in commercial aviation before suffering a fatal crash. The fears parents have for their children have little in common with what will actually kill or hurt them. Our perception of risk has very little relation to threat: some helpful visual guides [PDF] and reasons why.
It's a simple concept: Given a choice between two random movies, which one do you like better? That's the driving force behind Flickchart, an addictive review site for movie lovers. Faced with two posters, click the one for the title you prefer (weeding out the ones you haven't seen). Good! Now do it again. And again. And again. With each new face-off, Flickchart perfects a growing list of your favorite films -- and there can be no ties. This leads to some difficult dilemmas: Star Wars or Raiders of the Lost Ark? Citizen Kane or The Godfather? WALL-E or Spirited Away? But you needn't struggle alone -- Flickchart is also social. By drawing on the data of tens of thousands of fellow users, you can create remarkably specific lists: Martin Scorsese's Best Period Films. The Best Road Movies of the 1980s. The Worst Movies of All Time. If you rank enough films, you can generate interesting personalized charts, like "Your Favorite Musicals" or "The Best Movies You Haven't Seen." These filters carry over to the ranking system, letting you judge nothing but Horror movies or 1960s movies or unranked movies or movies from your top 100. You can also comment on popular match-ups, lending your voice to contentious debates like Ghostbusters vs. Back to the Future or Jaws vs. Predator. Not a movie fan? Don't worry. Flickchart will be expanding into books, games, and music soon. Until then, you can give your own data sets the Flickchart treatment using this tool from CNN. [more inside]
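For the curious, here is a minimal sketch of how a strict, tie-free list can be grown from nothing but two-way picks. It is not Flickchart's actual algorithm, which the post does not describe; the function names and the `prefers` callback are hypothetical, and the sketch assumes the user's choices are consistent with one another.

```python
from typing import Callable, List


def insert_by_comparison(ranked: List[str], title: str,
                         prefers: Callable[[str, str], bool]) -> None:
    """Place `title` into `ranked` (best first) using only pairwise picks.

    `prefers(a, b)` is a hypothetical callback that returns True when the
    user picks `a` over `b`. Binary search keeps the number of face-offs
    per new title to roughly log2(len(ranked)).
    """
    lo, hi = 0, len(ranked)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefers(title, ranked[mid]):
            hi = mid        # the new title belongs above ranked[mid]
        else:
            lo = mid + 1    # the new title belongs below ranked[mid]
    ranked.insert(lo, title)


def rank_titles(titles: List[str],
                prefers: Callable[[str, str], bool]) -> List[str]:
    """Build a full ranking by inserting titles one at a time."""
    ranked: List[str] = []
    for title in titles:
        insert_by_comparison(ranked, title, prefers)
    return ranked
```

Because every new title ends up in exactly one slot, ties cannot occur, which matches the behavior described above. A real site would also have to cope with inconsistent answers, which this sketch ignores.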
In an attempt to make sense of the 6.4 million words that make up the more than 573,000 paged lines in the WikiLeaks 9/11 pager intercept data, researchers Mitja Back, Albrecht Kuefner, and Boris Egloff from the Johannes Gutenberg University in Mainz have now conducted a statistical analysis of the emotional content of these pages.
Do we live in a world where there is magic and meaning, or is it all just chance? Radiolab meets two young women who share a nearly unbelievable story of coincidence and fate. Then they consult with statisticians for a very different take on the same story. This short audio documentary is charming and delightful. A Lucky Wind won a Best Documentary: Honorable Mention Award in the 2009 Third Coast / Richard H. Driehaus Foundation Competition as well as the 2009 AAAS Kavli Science Journalism Award (Radio Documentary). [more inside]
FiveThirtyEight.com is no more! Long Live Five Thirty Eight! Independent political statistics blog FiveThirtyEight.com has been absorbed by the New York Times. Nate Silver, the stats genius, baseball freak, and predictor of 49 of 50 states in the last presidential election, began his blog on DailyKos. As of this morning, the blog has moved to the New York Times. [more inside]
OK Cupid statistics fun: We collected 552,000 example user pictures. We paired them up and asked people to make snap judgments. Here's what we found.
Interested in teaching yourself some statistics? Here is an excellent online and interactive statistics textbook developed at UC Berkeley, and also used at CUNY, UCSC, SJSU, and Bard. Here is the syllabus for the course at Berkeley. And here are some insightful reflections from the professor on developing Berkeley's first fully approved online course.
It has applications in economics, biology, and pharmaceuticals, and is rooted in State Space Modeling, which, together with Kalman Filtering (paper, breakdown [warning: long]), was used in the Apollo program. Dynamic Linear Models are gaining in popularity. There exists an R package, and both a short doc and a really great (read: worth buying) book (sorry, not a download, but here's chapter 2) by Giovanni Petris, Sonia Petrone, and Patrizia Campagnoli with its own little website.
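To give a feel for what a Kalman filter does, here is a minimal sketch for the simplest dynamic linear model, a "local level" model in which a hidden level follows a random walk and each observation is that level plus noise. This is an illustration only, not the API of the R package mentioned above; the function name, parameter names, and default prior are assumptions made for the example.

```python
from typing import List, Sequence


def local_level_filter(ys: Sequence[float], obs_var: float, state_var: float,
                       m0: float = 0.0, c0: float = 1e6) -> List[float]:
    """Filtered estimates of the hidden level after each observation.

    Assumed model (hypothetical names throughout):
        level_t = level_{t-1} + w_t,   w_t ~ N(0, state_var)
        y_t     = level_t     + v_t,   v_t ~ N(0, obs_var)
    `m0` and `c0` are the prior mean and variance of the level; the large
    default `c0` expresses a vague prior.
    """
    m, c = m0, c0
    means: List[float] = []
    for y in ys:
        r = c + state_var        # predict: uncertainty grows by state noise
        k = r / (r + obs_var)    # Kalman gain: weight given to the new data
        m = m + k * (y - m)      # update the mean toward the observation
        c = (1.0 - k) * r        # update: uncertainty shrinks after seeing y
        means.append(m)
    return means
```

A larger `state_var` relative to `obs_var` makes the filter track new observations more closely; a smaller one makes it smooth more heavily.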
For the past year and a half, Daily Kos has been running weekly polls from the respected polling firm Research 2000. Earlier this month, former Daily Kos diarist Nate Silver of Five Thirty Eight published a rating of pollsters that placed R2K near the bottom, leading Markos to fire R2K. Today, Markos alleges that R2K committed fraud, publishing a study of their results by independent statisticians. He promises to sue.
R is powerful, but tricky. RKWard is a UI that makes R user-friendly or, at least, makes it more similar to SPSS or Stata. Screenshots!
Veronique de Rugy, NRO contributor and George Mason fellow, says her research indicates that stimulus funding was disproportionately directed towards Democratic congressional districts. Nate Silver begs to differ. De Rugy responds here; Silver responds here. Others say that this is a model "for the quick, effective peer-review that the internet facilitates." Perhaps this is a new model for peer review?
Significantly what? ... Or how our most common statistical methods really weren't meant to be used that way and why that study result is likely spurious. Since mefites like to argue about stats, here's some background for us all (and I'm not talking correlation vs causation)!
DNA’s Dirty Little Secret: A forensic tool renowned for exonerating the innocent may actually be putting them in prison.
R is quickly becoming the programming language for data analysis and statistics. R (an implementation of S) is free, open-source, and has hundreds of packages available. You can use it on the command-line, through a GUI, or in your favorite text editor. Use it with Python, Perl, or Java. Sweave R code into LaTeX documents for reproducible research. [more inside]
Mercenary Epidemiology: Data Reanalysis and Reinterpretation for Sponsors With Financial Interest in the Outcome. (.pdf link) When should scientists be required to release their raw data for (potentially hostile) re-analysis? A letter to the editors of Annals of Epidemiology from David Michaels, Ph.D., MPH, public health blogger, author of the book Doubt Is Their Product, and, as of December 2009, the Assistant Secretary of Labor for OSHA, unanimously confirmed by the Senate despite the dismay of some. Michaels interviewed at Science Progress about Doubt Is Their Product (podcast, with transcript.)
A researcher uses data on Facebook connections to map distinct regions of the United States.
"We’ve processed the messaging habits of almost a million people and are about to basically prove that, despite what you might’ve heard from the Obama campaign and organic cereal commercials, racism is alive and well." The people who run the dating site OkCupid continue to analyze the aggregate data of their users, shedding light on preferences and behavior. The most recent OkTrends post takes a look at their compiled racial data: Your Race Affects Whether People Write You Back. (previously 1 2)
How (not) to write an online-dating message, based on a sample of 500,000 "first contact" messages. [more inside]
"Death Risk Rankings calculates your risk of dying in the next year and allows you to compare that risk to others in the world." Fun with mortality data and statistics from Carnegie Mellon University.
C0nc0rdance [sytl] asks: How far should we trust common sense? A video of under 9 minutes on common sense as it relates to science. Enjoy.
What if we condensed the UK into a village of 100 people? The Independent experiment with demographics.
The fine folks at OkCupid, the dating site, have begun to analyze aggregate data from the questions their users answer to form dating profiles, revealing, among other things, that users in Nevada are more open to rape fantasies than those from Michigan. [more inside]
Andrew Gelman recently highlighted this strange trend in baby naming, originally posted on Laura Wattenberg's blog in 2007. Why do so many boys' names now end with the letter "n"?
Mariano's Gonna Cut You, and other stat-and-graph filled baseball analysis from Beyond the Boxscore. [more inside]
Wikirank is an analytical tool that measures the popularity of trending topics on Wikipedia. You can compare up to four topics and generate nifty embeddable graphs.
He predicted a losing season for the White Sox in 2007 and foresaw that the Tampa Bay Rays would be the best team in the American League in 2008, although he wrongly predicted that the Rays would win the World Series. He also predicted Obama's 6-point victory over McCain. Now the stats guru Nate Silver is picking the Oscar winners and predicting an upset win for Taraji P. Henson in the Best Supporting Actress category.
Digital Research Tools (DiRT) is a wiki created by Lisa Spiro, director of Rice University's Digital Media Center. Tons of "snapshot reviews of software that can help researchers" are categorized by what you're trying to accomplish ("Analyze Statistics," "Network With Other Researchers," "Search Visually"), as well as by general topic ("Authoring," "Linguistic Tools," "Text Analysis"). Via
Special 3-page edition of Harper’s Index: A retrospective of the Bush era.
Of all the People in the World "uses grains of rice to bring formally abstract statistics to startling and powerful life." Via
THE FOURTH QUADRANT: A MAP OF THE LIMITS OF STATISTICS by Nassim Nicholas Taleb. "In the following Edge original essay, Taleb continues his examination of Black Swans, the highly improbable and unpredictable events that have massive impact. He claims that those who are putting society at risk are 'no true statisticians', merely people using statistics either without understanding them, or in a self-serving manner."
Graph your life at MIT's Mycrocosm. Simple interface. Interesting potential. Worrying about. Freelance: No Idea What the Hell Is Going On. Food and Liquid Consumption. Also allows for sharing datasets with other users.
BBC News is running a weekly ongoing series of articles that describe and illustrate common misconceptions (and manipulations) of statistics using examples from the news and ads.
Lesson 1: surveys. Lesson 2: counting. Lesson 3: percentage. Lesson 4: averages. Lesson 5: causation.
Want to know how government spending and taxation levels have gone up or down over the last 20 years, and how they compare with other countries? The Organization for Economic Cooperation and Development has a handy set of tables (Excel, HTML-ized by Google): total spending, total revenues, fiscal surplus or deficit (Norway's surplus is 17% of GDP). Part of the statistical tables for the semi-annual OECD Outlook.
"Hard Numbers: The Economy is Worse than You Know" [full article for Harper's subscribers, a different abridged version] discusses how the Consumer Price Index and other US economic statistics have been manipulated over time. Among other things, the article claims, these changes make Social Security checks 70% lower than they would otherwise be. [more inside]
Statistics compiled by State Senator Eliot Shapleigh in the state's annual ranking, entitled "Texas on the Brink," report dreary news in just about all categories used to characterize standards of living, from education to health to enfranchisement. [more inside]
Steroids, "Other Drugs," and Baseball: a Voice of Scepticism on the Impact of Steroids on Major League Baseball. Eric Walker suggests a "juiced" ball had much more of an effect than PEDs.
Election poll fatigue? Diversify your daily dose of stats with What Japan Thinks. Check out Japan's favorite emoticons, thoughts on drinking vinegar, and of course awwcats. [more inside]
Dutch nurse Lucia De Berk has had her case reopened 5 years after her conviction for multiple counts of murdering her patients. [more inside]
TheDataWeb - a network of online data libraries on topics including census data, economic data, health data, income and unemployment data, population data, labor data, cancer data, crime and transportation data, family dynamics, vital statistics data