FiveThirtyEight.com is no more! Long Live Five Thirty Eight! Independent political statistics blog FiveThirtyEight.com has been absorbed by the New York Times. Nate Silver, the stats genius, baseball freak and predictor of 49 of 50 states in the last presidential election, began his blog on DailyKos. As of this morning, the blog has moved to the New York Times. [more inside]
OK Cupid statistics fun: We collected 552,000 example user pictures. We paired them up and asked people to make snap judgments. Here's what we found.
Interested in teaching yourself some statistics? Here is an excellent online and interactive statistics textbook developed at UC Berkeley, and also used at CUNY, UCSC, SJSU, and Bard. Here is the syllabus for the course at Berkeley. And here are some insightful reflections from the professor on developing Berkeley's first fully approved online course.
Dynamic Linear Models are gaining in popularity. They have applications in economics, biology, and pharmaceuticals, and are rooted in state space modeling, which, with Kalman filtering (paper, breakdown [warning: long]), was used in the Apollo program. There is an R package, along with a short doc and a really great (read: worth buying) book (sorry, not a download, but here's chapter 2) by Giovanni Petris, Sonia Petrone, and Patrizia Campagnoli, with its own little website.
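For readers who want a feel for what a Kalman filter does before clicking through, here is a minimal sketch in Python of the simplest dynamic linear model, the "local level" (random walk plus noise) model. It is not taken from the linked R package or book; the function name `local_level_filter` and its parameters are made up for illustration, and the noise variances are assumed known.

```python
def local_level_filter(observations, obs_var, state_var,
                       init_mean=0.0, init_var=1e6):
    """Kalman filter for the local level model (hypothetical helper).

    Model assumed here:
        y_t  = mu_t + v_t,       v_t ~ N(0, obs_var)
        mu_t = mu_{t-1} + w_t,   w_t ~ N(0, state_var)

    Returns two lists: the filtered means and variances of mu_t.
    """
    mean, var = init_mean, init_var
    means, variances = [], []
    for y in observations:
        # Predict: the state is a random walk, so the predicted mean is
        # unchanged and the uncertainty grows by state_var.
        pred_mean = mean
        pred_var = var + state_var

        # Update: blend prediction and observation using the Kalman gain.
        gain = pred_var / (pred_var + obs_var)
        mean = pred_mean + gain * (y - pred_mean)
        var = (1.0 - gain) * pred_var

        means.append(mean)
        variances.append(var)
    return means, variances


if __name__ == "__main__":
    # Made-up noisy readings of a slowly drifting level.
    readings = [4.8, 5.3, 5.1, 4.7, 5.6, 5.9, 6.2, 5.8]
    means, variances = local_level_filter(readings, obs_var=0.25, state_var=0.05)
    for y, m in zip(readings, means):
        print(f"observed {y:.2f}  filtered {m:.2f}")
```

The large `init_var` is a common way to say "no idea where the level starts", so the first observation is taken almost at face value. Full dynamic linear models generalize this scalar version to vectors and matrices.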
For the past year and a half, Daily Kos has been running weekly polls from the respected polling firm, Research 2000. Earlier this month, former Daily Kos diarist Nate Silver of Five Thirty Eight published a rating of pollsters that placed R2K near the bottom, leading Markos to fire R2K. Today, Markos alleges that R2K committed fraud, publishing a study of their results by independent statisticians. He promises to sue.
R is powerful, but tricky. RKWard is a UI that makes R user-friendly or, at least, makes it more similar to SPSS or Stata. Screenshots!
Veronique de Rugy, NRO contributor and George Mason fellow, says her research indicates that stimulus funding was disproportionately directed towards Democratic congressional districts. Nate Silver begs to disagree. De Rugy responds here; Silver responds here. Others say that this is a model "for the quick, effective peer-review that the internet facilitates." Perhaps this is a new model for peer review?
Significantly what? ...Or how our most common statistical methods really weren't meant to be used that way, and why that study result is likely spurious. Since mefites like to argue about stats, here's some background for us all (and I'm not talking correlation vs. causation)!
DNA’s Dirty Little Secret: A forensic tool renowned for exonerating the innocent may actually be putting them in prison.
R is quickly becoming the programming language for data analysis and statistics. R (an implementation of S) is free, open-source, and has hundreds of packages available. You can use it on the command-line, through a GUI, or in your favorite text editor. Use it with Python, Perl, or Java. Sweave R code into LaTeX documents for reproducible research. [more inside]
Mercenary Epidemiology: Data Reanalysis and Reinterpretation for Sponsors With Financial Interest in the Outcome. (.pdf link) When should scientists be required to release their raw data for (potentially hostile) re-analysis? A letter to the editors of Annals of Epidemiology from David Michaels, Ph.D., MPH, public health blogger, author of the book Doubt Is Their Product, and, as of December 2009, the Assistant Secretary of Labor for OSHA, unanimously confirmed by the Senate despite the dismay of some. Michaels interviewed at Science Progress about Doubt Is Their Product (podcast, with transcript).
Researcher uses data on Facebook connections to map distinct regions of the United States.
"We’ve processed the messaging habits of almost a million people and are about to basically prove that, despite what you might’ve heard from the Obama campaign and organic cereal commercials, racism is alive and well." The people who run the dating site OkCupid continue to analyze the aggregate data of their users, shedding light on preferences and behavior. The most recent OkTrends post takes a look at their compiled racial data: Your Race Affects Whether People Write You Back. (previously 1 2)
How (not) to write an online-dating message, based on a sample of 500,000 "first contact" messages. [more inside]
"Death Risk Rankings calculates your risk of dying in the next year and allows you to compare that risk to others in the world." Fun with mortality data and statistics from Carnegie Mellon University.
C0nc0rdance [sytl] asks: How far should we trust common sense? A video, less than 9 minutes long, on Common Sense as it relates to Science. Enjoy.
What if we condensed the UK into a village of 100 people? The Independent experiment with demographics.
The fine folks at OkCupid, the dating site, have begun to analyze aggregate data from the questions their users answer to form dating profiles, revealing, among other things, that users in Nevada are more open to rape fantasies than those from Michigan. [more inside]
Andrew Gelman recently posted this strange trend in baby naming originally posted on Laura Wattenberg's blog in 2007. Why do so many boys' names now end with the letter "n"?
Mariano's Gonna Cut You, and other stat-and-graph filled baseball analysis from Beyond the Boxscore. [more inside]
Wikirank is an analytical tool that measures the popularity of trending topics on Wikipedia. You can compare up to four topics and generate nifty embeddable graphs.
He predicted a losing season for the White Sox in 2007 and foresaw that the Tampa Bay Rays would be the best team in the American League in 2008, although he wrongly predicted that the Rays would win the World Series. He also predicted Obama's 6-point victory over McCain. Now the stats guru Nate Silver is picking the Oscar winners and predicting an upset win for Taraji P. Henson in the Best Supporting Actress category.
Digital Research Tools (DiRT) is a wiki created by Lisa Spiro, director of Rice University's Digital Media Center. Tons of "snapshot reviews of software that can help researchers" are categorized by what you're trying to accomplish ("Analyze Statistics," "Network With Other Researchers," "Search Visually"), as well as by general topic ("Authoring," "Linguistic Tools," "Text Analysis"). Via
Special 3-page edition of Harper’s Index: A retrospective of the Bush era.
Of all the People in the World "uses grains of rice to bring formally abstract statistics to startling and powerful life". Via
THE FOURTH QUADRANT: A MAP OF THE LIMITS OF STATISTICS by Nassim Nicholas Taleb. "In the following Edge original essay, Taleb continues his examination of Black Swans, the highly improbable and unpredictable events that have massive impact. He claims that those who are putting society at risk are 'no true statisticians', merely people using statistics either without understanding them, or in a self-serving manner."
Graph your life at MIT's Mycrocosm. Simple interface. Interesting potential. Worrying about. Freelance: No Idea What the Hell Is Going On. Food and Liquid Consumption. Also allows for sharing datasets with other users.
BBC News is running a weekly ongoing series of articles that describe and illustrate common misconceptions (and manipulations) of statistics using examples from the news and ads.
Lesson 1: surveys. Lesson 2: counting. Lesson 3: percentage. Lesson 4: averages. Lesson 5: causation.
Want to know how government spending and taxation levels have gone up or down over the last 20 years, and how they compare with other countries? The Organization for Economic Cooperation and Development has a handy set of tables (Excel, HTML-ized by Google): total spending, total revenues, fiscal surplus or deficit (Norway's surplus is 17% of GDP). Part of the statistical tables for the semi-annual OECD Outlook.
"Hard Numbers: The Economy is Worse than You Know" [full article for Harper's subscribers, a different abridged version] discusses how the Consumer Price Index and other US economic statistics have been manipulated over time. Among other things, the article claims, these changes make Social Security checks 70% lower than they would otherwise be. [more inside]
Statistics compiled by State Senator Eliot Shapleigh in the state's annual ranking, entitled "Texas on the Brink," report dreary news in just about all categories used to characterize standards of living, from education to health to enfranchisement. [more inside]
Steroids, "Other Drugs," and Baseball: a Voice of Scepticism on the Impact of Steroids on Major League Baseball. Eric Walker suggests a "juiced" ball had much more of an effect than PEDs.
Election poll fatigue? Diversify your daily dose of stats with What Japan Thinks. Check out Japan's favorite emoticons, thoughts on drinking vinegar, and of course awwcats. [more inside]
Dutch nurse Lucia De Berk has had her case reopened 5 years after her conviction for multiple counts of murdering her patients. [more inside]
TheDataWeb - a network of online data libraries on topics including census data, economic data, health data, income and unemployment data, population data, labor data, cancer data, crime and transportation data, family dynamics, and vital statistics data.
The new UN Human Development Report is out. Lots of interesting stuff on climate change. But for me, nothing beats the Human Development Index, a number that means different things to different people.
How depressing is your job? The Office of Applied Studies, a division of the U.S. Department of Health & Human Services, released a report ranking various occupations in order of the number of depressive episodes experienced by workers. "Personal Care & Service" occupations (defined by the Department of Labor's Bureau of Labor Statistics here) top the list. One wonders if these are the occupations contributing to the growth of the so-called "service economy," and if so, are we heading for a deepening national malaise?
Brad Laidman critiques the findings of the report [pdf] from the Centre for Public Health at Liverpool John Moores University, 'Elvis to Eminem: quantifying the price of fame through early mortality of European and North American rock and pop stars.' [more inside]
The National Center for Health Statistics says the median for heterosexual men is seven partners and for heterosexual women it’s four, and isn't that just plain BS? The New York Times has looked at the NCHS study and found The Myth, the Math, the Sex while Salon explained that Chaste women + promiscuous men = impossible. Now Janet W. Hardy, the author of The Ethical Slut, reminds us either men are rounding up or women are rounding down. Then she does something radical by admitting that she has honestly had, depending on your definition, perhaps hundreds of lovers. Is either gender ready to be honest about sex? For instance, do hookers or blackout sex even count?
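One way to see the arithmetic behind the "impossible" claim, sketched in Python with a made-up toy population (this is an illustration, not a reproduction of what the linked articles compute): in a closed group with equal numbers of men and women, every heterosexual partnership adds one to a man's count and one to a woman's, so the two means must match. The medians, which are what the NCHS figures report, are free to differ.

```python
from collections import Counter
from statistics import mean, median


def partner_counts(partnerships, people):
    """Count partners per person; people with no partnerships get 0."""
    counts = Counter()
    for a, b in partnerships:
        counts[a] += 1
        counts[b] += 1
    return [counts[p] for p in people]


# Hypothetical closed population: 5 men, 5 women.
men = ["m1", "m2", "m3", "m4", "m5"]
women = ["w1", "w2", "w3", "w4", "w5"]

# Each tuple is one (man, woman) partnership; the data are invented.
partnerships = [
    ("m1", "w1"), ("m2", "w1"), ("m3", "w1"), ("m4", "w1"), ("m5", "w1"),
    ("m1", "w2"), ("m2", "w2"),
]

men_counts = partner_counts(partnerships, men)
women_counts = partner_counts(partnerships, women)

# Both totals equal the number of partnerships, so the means agree
# whenever the two groups are the same size.
print("means:  ", mean(men_counts), mean(women_counts))
# The medians depend on how partnerships are spread out, so they can differ.
print("medians:", median(men_counts), median(women_counts))
```

So a gap between medians is not contradictory by itself; a gap between means in a closed, evenly split population would be.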
What's the fewest pitches thrown in a complete game? How many times has a relief pitcher been awarded a win without even facing a batter? How many different pitchers has Julio Franco faced? What's the greatest number of hits in a game where all of them are home runs? Who's hit the most grand slams in the ninth or extra innings? These questions and many (many) more at Baseball-reference.com's fantastic Stat of the Day blog.