Professors' global model forecasts civil unrest against governments - With protests spreading in the Middle East (now Yemen - not on the list) I thought this article and blog on a forecast model predicting "which countries will likely experience an escalation in domestic political violence [within the next five years]" was rather interesting. [more inside]
The United States of Shame. Surprisingly, Florida is not the oldest state. Unsurprisingly, Utah uses the most internet porn.
According to official Chinese stats, make of them what you will, there are now 457 million internet users in China. They are said to include 450m who have broadband, and 303m who use mobile internet. 304m play online games, 140m use online banking, and 63m microblog. These users are estimated to spend an average of 18 hours a week online. As a benchmark, the current US population is estimated at 312m.
Apparently They, of "They say..." fame, have been misusing the statistical significance test. So much of what "They say" might not actually be. (via)
"Nearly half of pregnancies among American women are unintended, and four in 10 of these are terminated by abortion.... At least half of American women will experience an unintended pregnancy by age 45, and, at current rates, about one-third will have had an abortion." Abortion is one of the most common medical procedures in the U.S., but it can be very difficult to get unbiased information about the procedure. From Jezebel: The Girl's Guide to Having an Abortion.
Wins-above-replacement, or WAR, is a Sabermetric term of art for baseball player comparison. Fangraphs, one of the go-to sites for baseball nerdlingers, now offers a way to make WAR grids, an amazingly easily comprehended visual display comparing players based on WAR, sortable by team, position and season, with a default topline of player age. [more inside]
Dataists give their hopes and dreams for data, data tools and data science in 2011. Already, Google has provided Google Refine (previously) to help clean your datasets. Great visualizations can be created with online tools or by combining R (great posts previously) with ggplot2, GGobi, and even Google Motion Charts With R (already built into Google Spreadsheets). Need data? Needlebase helps non-programmers scrape, harvest, and merge data from the web. Or if you’re introspective, Your Flowing Data and Daytum provide tools to measure and chart details of your own life.
Following the Journal of Personality and Social Psychology's decision to publish Daryl Bem's writeup of 8 studies (PDF) purporting to show evidence for precognition (previously), researchers from the University of Amsterdam have written a rebuttal (PDF) which finds methodological flaws not only in Bem's research, but in many other published papers in experimental psychology. Could this prove to be psychology's cold fusion moment? [more inside]
Google is known to ask the following question in job interviews: In a country in which people only want boys, every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country? Think you know the answer? If so, Steve Landsburg may be willing to bet you up to $5000. [more inside]
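For anyone who would rather simulate than bet, here is a minimal sketch of the stopping rule in Python. It assumes each birth is independently a boy with probability 0.5, and the names `simulate_country` and `summarize` are made up for illustration. It reports two different quantities that the question can be read as asking for: the share of boys when all simulated countries are pooled together, and the average of each country's own share of boys. The two need not agree when a country has few families.

```python
import random


def simulate_country(num_families, rng, p_boy=0.5):
    """Return (boys, girls) for one country where every family stops at its first boy."""
    boys = girls = 0
    for _ in range(num_families):
        # Each draw at or above p_boy is a girl, so the family has another child.
        while rng.random() >= p_boy:
            girls += 1
        boys += 1  # the first boy ends this family's childbearing
    return boys, girls


def summarize(num_countries, num_families, seed=0):
    """Return (pooled share of boys, mean of the per-country share of boys)."""
    rng = random.Random(seed)
    total_boys = total_girls = 0
    share_sum = 0.0
    for _ in range(num_countries):
        boys, girls = simulate_country(num_families, rng)
        total_boys += boys
        total_girls += girls
        share_sum += boys / (boys + girls)
    pooled_share = total_boys / (total_boys + total_girls)
    mean_share = share_sum / num_countries
    return pooled_share, mean_share


if __name__ == "__main__":
    for families in (1, 10, 1000):
        pooled, mean = summarize(num_countries=20000, num_families=families, seed=1)
        print(f"{families:>5} families: pooled={pooled:.3f}  per-country mean={mean:.3f}")
```

Under these assumptions the pooled share should sit near one half regardless of country size, while the per-country average should run above one half for small countries and approach it as the number of families grows.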
"Normal" human pregnancies last 40 weeks, right? Well, no; they can vary quite a bit by the mother's race, age, number of previous children, family history of delivering early or late, home state, work habits, and even the fetus' HLA type. So where does that "40 week" thing come from? Oh, dear. So check out this super-nerdy pregnancy statistics website, from an engineer mom who is collecting data from the public (see the raw data and auto-generated graphs, and read the FAQ about the survey, with more cool graphs). Looking for day-by-day probabilities on when that baby's due? This would be your stats table with daily prediction (adjust dates at top of page as needed). Of course, you could always shut up your constantly inquiring relatives and friends another way.
The New York Times presents an interactive map of America's population separated by race, income, and education, according to census data from 2005 to 2009. One dot for every 50 people. (Previously) [more inside]
Measure-theoretic probability: Why it should be learnt and how to get started. The clickable chart of distribution relationships. Just two of the interesting and informative probability resources I've learned about, along with countless other tidbits of information, from statistician John D. Cook's blog and his probability fact-of-the-day Twitter feed ProbFact. John also has daily tip and fact Twitter feeds for Windows keyboard shortcuts, regular expressions, TeX and LaTeX, algebra and number theory, topology and geometry, real and complex analysis, and beginning tomorrow, computer science and statistics.
Hans Rosling [previously, previously] compares the health and wealth of 200 countries over 200 years in 4 minutes using the best infographic ever. Interactive Flash version here.
Kaggle hosts competitions to glean information from massive data sets, a la the Netflix Prize. Competitors can enter free, while companies with vast stores of impenetrable data pay Kaggle to outsource their difficulties to the world population of freelance data-miners. Kaggle contestants have already developed dozens of chess rating systems which outperform the Elo rating currently in use, and identified genetic markers in HIV associated with a rise in viral load. Right now, you can compete to forecast tourism statistics or predict unknown edges in a social network. Teachers who want to pit their students against each other can host a Kaggle contest free of charge.
An examination of the differences between the literary and scientific cultures, by John Allen Paulos.
20.10.2010 is World Statistics Day, so help yourself to a metric (haha sorry) ton of publicly available data at UNdata, ICPSR (registration required to download data sets), and data.gov (previously). You can also explore, visualize and animate a variety of publicly available data sets with Google Labs' Public Data Explorer.
'Much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong.' Dr. John P. A. Ioannidis, adjunct professor at Tufts University School of Medicine, is a meta-researcher. 'He and his team have shown, again and again, and in many different ways, that much of what biomedical researchers conclude in published studies—conclusions that doctors keep in mind when they prescribe antibiotics or blood-pressure medication, or when they advise us to consume more fiber or less meat, or when they recommend surgery for heart disease or back pain—is misleading, exaggerated, and often flat-out wrong. He charges that as much as 90 percent of the published medical information that doctors rely on is flawed. His work has been widely accepted by the medical community; it has been published in the field’s top journals, where it is heavily cited; and he is a big draw at conferences.' [more inside]
The Center for Sexual Health Promotion at Indiana University investigated sexual practices in the USA in 2009. The results are reported in this month's Special Issue of the Journal of Sexual Medicine. (The full text is available behind a short anonymous online survey.) [more inside]
A team of researchers at Iowa State University has found that a murder costs society more than $17.25 million. [via]
A Tour through the Visualization Zoo. A survey of powerful visualization techniques, from the obvious to the obscure.
After five years of number-crunching and methodological controversy, the NRC's rankings of US graduate programs were released today, three years after the target date and fifteen since the previous ranking. Peruse the results at phds.org. Instead of numerical ratings, the NRC released two rankings, the "R-ranking" and the "S-ranking", each one with a wide error bar around it. Confused yet? Brian Leiter thinks the philosophy rankings "qualify as somewhere between 'odd' and 'inexplicable.'" The University of Washington's CS department says their ranking of 15-32 is "clearly erroneous." Obviously, the only appropriate response is to compute asymptotic formulae for the number of possible fuzzy rankings.
You are more likely to be killed by a pig than a shark. You run a greater risk of dying from an asteroid impact than a terrorist attack. You would have to fly an average of 38,000 years in commercial aviation before suffering a fatal crash. The fears parents have for their children have nothing in common with what will actually kill or hurt them. Our perception of risk has very little relation to threat: some helpful visual guides [PDF] and reasons why.
It's a simple concept: Given a choice between two random movies, which one do you like best? That's the driving force behind Flickchart, an addictive review site for movie lovers. Faced with two posters, click the one for the title you prefer (weeding out the ones you haven't seen). Good! Now do it again. And again. And again. With each new face-off, Flickchart perfects a growing list of your favorite films -- and there can be no ties. This leads to some difficult dilemmas: Star Wars or Raiders of the Lost Ark? Citizen Kane or The Godfather? WALL-E or Spirited Away? But you needn't struggle alone -- Flickchart is also social. By drawing on the data of tens of thousands of fellow users, you can create remarkably specific lists: Martin Scorsese's Best Period Films. The Best Road Movies of the 1980s. The Worst Movies of All Time. If you rank enough films, you can generate interesting personalized charts, like "Your Favorite Musicals" or "The Best Movies You Haven't Seen." These filters carry over to the ranking system, letting you judge nothing but Horror movies or 1960s movies or unranked movies or movies from your top 100. You can also comment on popular match-ups, lending your voice to contentious debates like Ghostbusters vs. Back to the Future or Jaws vs. Predator. Not a movie fan? Don't worry. Flickchart will be expanding into books, games, and music soon. Until then, you can give your own data sets the Flickchart treatment using this tool from CNN. [more inside]
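To give your own list the pairwise treatment, here is a minimal Python sketch of one way a ranked list with no ties can grow out of face-offs. This is an assumed update rule for illustration, not Flickchart's documented algorithm, and `record_choice` is a made-up name: if the winner currently sits below the loser, it moves to the spot just above the loser, and otherwise nothing changes.

```python
def record_choice(ranking, winner, loser):
    """Update a best-first list in place after `winner` is preferred over `loser`."""
    # Titles that have not been ranked yet start at the bottom of the list.
    if loser not in ranking:
        ranking.append(loser)
    if winner not in ranking:
        ranking.append(winner)

    winner_pos = ranking.index(winner)
    loser_pos = ranking.index(loser)

    # Assumed rule: a winner ranked below the loser jumps to just above it.
    if winner_pos > loser_pos:
        ranking.pop(winner_pos)
        ranking.insert(loser_pos, winner)
    return ranking


if __name__ == "__main__":
    ranking = []
    record_choice(ranking, "Star Wars", "Raiders of the Lost Ark")
    record_choice(ranking, "The Godfather", "Citizen Kane")
    record_choice(ranking, "The Godfather", "Star Wars")
    for place, title in enumerate(ranking, start=1):
        print(place, title)
```

Because the result is an ordered list, every title gets its own rank, which matches the no-ties behavior described above. Later face-offs can overturn earlier ones, so the order only settles after many comparisons.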
In an attempt to make sense of the 6.4 million words that comprise the more than 573,000 paged lines in the WikiLeaks 9/11 pager intercept data, researchers Mitja Back, Albrecht Kuefner, and Boris Egloff from the Johannes Gutenberg University in Mainz have now conducted a statistical analysis of the emotional content of these pages.
Do we live in a world where there is magic and meaning, or is it all just chance? Radiolab meets two young women who share a nearly unbelievable story of coincidence and fate. Then they consult with statisticians for a very different take on the same story. This short audio documentary is charming and delightful. A Lucky Wind won a Best Documentary: Honorable Mention Award in the 2009 Third Coast / Richard H. Driehaus Foundation Competition as well as the 2009 AAAS Kavli Science Journalism Award (Radio Documentary). [more inside]
FiveThirtyEight.com is no more! Long Live Five Thirty Eight! Independent political statistics blog FiveThirtyEight.com has been absorbed by the New York Times. Nate Silver, the stats genius, baseball freak and predictor of 49 of 50 states in the last presidential election, began his blog on DailyKos. As of this morning, the blog has moved to the New York Times. [more inside]
OK Cupid statistics fun: We collected 552,000 example user pictures. We paired them up and asked people to make snap judgments. Here's what we found.
Interested in teaching yourself some statistics? Here is an excellent online and interactive statistics textbook developed at UC Berkeley, and also used at CUNY, UCSC, SJSU, and Bard. Here is the syllabus for the course at Berkeley. And here are some insightful reflections from the professor on developing Berkeley's first fully approved online course.
It has applications in Economics, Biology, Pharmaceuticals, and is rooted in State Space Modeling, which with Kalman Filtering (paper, breakdown [warning: long]) was used in the Apollo program. Dynamic Linear Models are gaining in popularity. There exists an R package, and both a short doc and a really great (read: worth buying) book (sorry, not a download, but here's chapter 2) by Giovanni Petris, Sonia Petrone, and Patrizia Campagnoli with its own little website.
For the past year and a half, Daily Kos has been running weekly polls from the respected polling firm, Research 2000. Earlier this month, former Daily Kos diarist Nate Silver of Five Thirty Eight published a rating of pollsters that placed R2K near the bottom, leading Markos to fire R2K. Today, Markos alleges that R2K committed fraud, publishing a study of their results by independent statisticians. He promises to sue.
R is powerful, but tricky. RKWard is a UI that makes R user-friendly or, at least, makes it more similar to SPSS or Stata. Screenshots!
Veronique de Rugy, NRO contributor and George Mason fellow, says her research indicates that stimulus funding was disproportionately directed towards Democratic congressional districts. Nate Silver begs to disagree. De Rugy responds here; Silver responds here. Others say that this is a model "for the quick, effective peer-review that the internet facilitates." Perhaps this is a new model for peer review?
Significantly what?...Or how our most common statistical methods really weren't meant to be used that way and why that study result is likely spurious. Since mefites like to argue about stats, here's some background for us all (and I'm not talking correlation vs causation)!
DNA’s Dirty Little Secret: A forensic tool renowned for exonerating the innocent may actually be putting them in prison.
R is quickly becoming the programming language for data analysis and statistics. R (an implementation of S) is free, open-source, and has hundreds of packages available. You can use it on the command-line, through a GUI, or in your favorite text editor. Use it with Python, Perl, or Java. Sweave R code into LaTeX documents for reproducible research. [more inside]
Mercenary Epidemiology: Data Reanalysis and Reinterpretation for Sponsors With Financial Interest in the Outcome. (.pdf link) When should scientists be required to release their raw data for (potentially hostile) re-analysis? A letter to the editors of Annals of Epidemiology from David Michaels, Ph.D., MPH, public health blogger, author of the book Doubt Is Their Product, and, as of December 2009, the Assistant Secretary of Labor for OSHA, unanimously confirmed by the Senate despite the dismay of some. Michaels interviewed at Science Progress about Doubt Is Their Product (podcast, with transcript.)
Researcher uses data regarding connections on Facebook to map distinct regions of the United States.
"We’ve processed the messaging habits of almost a million people and are about to basically prove that, despite what you might’ve heard from the Obama campaign and organic cereal commercials, racism is alive and well." The people who run the dating site OkCupid continue to analyze the aggregate data of their users, shedding light on preferences and behavior. The most recent OkTrends post takes a look at their compiled racial data: Your Race Affects Whether People Write You Back. (previously 1 2)
How (not) to write an online-dating message, based on a sample of 500,000 "first contact" messages. [more inside]
"Death Risk Rankings calculates your risk of dying in the next year and allows you to compare that risk to others in the world." Fun with mortality data and statistics from Carnegie Mellon University.
C0nc0rdance [sytl] asks: How far should we trust common sense? A less than 9 min video on Common Sense as it relates to Science. Enjoy.
What if we condensed the UK into a village of 100 people? The Independent experiment with demographics.
The fine folks at OkCupid, the dating site, have begun to analyze aggregate data from the questions their users answer to form dating profiles, revealing, among other things, that users in Nevada are more open to rape fantasies than those from Michigan. [more inside]