How to tell correlation from causation - "The basic intuition behind the method demonstrated by Prof. Joris Mooij of the University of Amsterdam and his co-authors is surprisingly simple: if one event influences another, then the random noise in the causing event will be reflected in the affected event."
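A toy illustration of that intuition, and explicitly not the procedure from the Mooij et al. paper: the sketch below assumes synthetic data in which x causes y through added uniform noise. It uses a crude binned-mean regression and a hypothetical score, `residual_spread_variation`, which measures how much the leftover scatter changes across the range of the presumed cause. In the true causal direction the scatter should look roughly the same everywhere. In the reverse direction it should not.

```python
import random
import statistics


def residual_spread_variation(cause, effect, bins=20):
    """Regress `effect` on `cause` using per-bin means, then return the
    coefficient of variation of the residual spread across bins.
    A value near 0 means the residuals look independent of `cause`."""
    pairs = sorted(zip(cause, effect))          # order points by the presumed cause
    size = len(pairs) // bins                   # equal-count (quantile) bins
    spreads = []
    for b in range(bins):
        chunk = [e for _, e in pairs[b * size:(b + 1) * size]]
        # pstdev centres on the bin mean, so this is the residual spread
        spreads.append(statistics.pstdev(chunk))
    return statistics.pstdev(spreads) / statistics.mean(spreads)


# Synthetic data (an assumption for illustration): x causes y, plus noise.
rng = random.Random(0)
x = [rng.uniform(0, 1) for _ in range(4000)]
y = [xi + rng.uniform(-0.5, 0.5) for xi in x]

forward = residual_spread_variation(x, y)    # hypothesis: x -> y
backward = residual_spread_variation(y, x)   # hypothesis: y -> x

print(f"x -> y residual spread variation: {forward:.3f}")
print(f"y -> x residual spread variation: {backward:.3f}")
print("inferred direction:", "x -> y" if forward < backward else "y -> x")
```

The asymmetry here relies on the noise being non-Gaussian. With a straight-line relationship and Gaussian noise, both directions would look equally plausible to this kind of check.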
"Facebook actually makes masks out of everyone’s faces." Artist Sterling Crispin creates DATA-MASKS as a way to physically present the abstract data structures that Facebook and biometric surveillance systems use to pull a face from a crowd.
A checklist for those making graphs from Stephanie Evergreen and Ann Emery. This is a useful tool for teaching scientists and others some of the rules of data presentation in graph form.
How Americans Die - a visual tour through surprising trends in mortality among Americans in the last several decades
Ben Goldacre, The Guardian: "Today we found out that Tamiflu doesn't work so well after all. Roche, the drug company behind it, withheld vital information on its clinical trials for half a decade, but the Cochrane Collaboration, a global not-for-profit organisation of 14,000 academics, finally obtained all the information. Putting the evidence together, it has found that Tamiflu has little or no impact on complications of flu infection, such as pneumonia." [more inside]
In a new exhibition titled Beautiful Science: Picturing Data, Inspiring Insight, the British Library pays homage to the important role data visualization plays in the scientific process. The exhibition can be visited from 20 February until 26 May 2014, and contains works ranging from John Snow's plotting of the 1854 London cholera infections on a map to colourful depictions of the Tree of Life. In a Nature Video, curator Johanna Kieniewicz explores some of the beautiful examples of visualizations that are exhibited. [more inside]
PLOS’ New Data Policy: Public Access to Data "PLOS has always required that authors make their data available to other academic researchers who wish to replicate, reanalyze, or build upon the findings published in our journals. In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings." Openscience.org also have a primer on why open science data is important.
"The IPython Notebook is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document". It can be installed fairly easily with anaconda or on Amazon EC2. Various interesting notebooks are to be found at the official Notebook Viewer site, and there is another collection of interesting notebooks on many topics. [more inside]
At the core of good science and engineering is the careful and respectful treatment of data. We calibrate our instruments, scrutinize the algorithms we use to process the data, and study the behavior of the models we use to interpret the data or simulate the phenomena we may be observing. Surprisingly, this careful treatment of data often breaks down when we visualize our data.
Get Data [SLYT]
The data analysis group that used Facebook and set-top TV data to help Barack Obama win the latest election is taking its talents to the private sector. (SL NYTimes)
How The Economic Machine Works by Ray Dalio actually makes a case against austerity and for redistribution, but also for money printing (and, arguably, for bailouts), while stressing the need to keep making productivity-improving public and private investments. However, it could be equally entitled: How The Industrial Age Political-Economy Doesn't Work Anymore, viz. Surviving Progress (2011)... [more inside]
"We live in a world where digital information is exploding. Some 90% of the world’s data was generated in the past two years. The obvious question is: how can we store it all? In Nature Communications today, we, along with Richard Evans from CSIRO, show how we developed a new technique to enable the data capacity of a single DVD to increase from 4.7 gigabytes up to one petabyte (1,000 terabytes). This is equivalent of 10.6 years of compressed high-definition video or 50,000 full high-definition movies."
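A quick back-of-the-envelope check of the figures quoted above, assuming decimal units (1 PB = 1,000,000 GB) and 365-day years; the per-movie size and bitrate are derived here and are not stated in the source.

```python
# Figures quoted in the post
GB_PER_PB = 1_000_000      # decimal units: 1 PB = 1,000 TB = 1,000,000 GB
dvd_gb = 4.7               # capacity of a standard single-layer DVD
movies = 50_000            # "50,000 full high-definition movies"
years_of_video = 10.6      # "10.6 years of compressed high-definition video"

# Derived quantities (assumptions: decimal units, 365-day years)
capacity_gain = GB_PER_PB / dvd_gb                 # petabyte disc vs. one DVD
gb_per_movie = GB_PER_PB / movies                  # implied size of one movie
seconds = years_of_video * 365 * 24 * 3600         # total playing time
mbit_per_s = GB_PER_PB * 8_000 / seconds           # 1 GB = 8,000 megabits

print(f"capacity increase: about {capacity_gain:,.0f}x")
print(f"implied movie size: {gb_per_movie:.0f} GB")
print(f"implied video bitrate: {mbit_per_s:.1f} Mbit/s")
```

Both quoted equivalences imply plausible high-definition figures, so the two comparisons are at least consistent with each other.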
Is Psychometric g a Myth? - "As an online discussion about IQ or general intelligence grows longer, the probability of someone linking to statistician Cosma Shalizi's essay g, a Statistical Myth approaches 1. Usually the link is accompanied by an assertion to the effect that Shalizi offers a definitive refutation of the concept of general mental ability, or psychometric g." [more inside]
291 diseases and injuries + 67 risk factors + 1,160 non-fatal complications = 650 million estimates of how we age, sicken, and die
As humans live longer, what ails us isn't necessarily what kills us: five data visualizations of how we age, sicken, and die. Causes of death by age, sex, region, and year. Heat map of leading causes and risks by region. Changes in leading causes and risks between 1990 and 2010. Healthy years lost to disability vs. life expectancy in 1990 and 2010. Uncertainties of causes and risks. From the team for the massive Institute for Health Metrics and Evaluation Global Burden of Diseases, Injuries, and Risk Factors Study 2010. [more inside]
Have you ever wondered what the water temperature off the Kamchatka Peninsula is? What about the wind speed in the Andaman Sea? Or maybe you’re losing sleep over the chlorophyll levels in the South Pacific. Fortunately, all of that information -- and 450 million other data points collected from oceanographic instruments around the world -- is freely and easily accessible thanks to the Marinexplore project. [more inside]
Peter Turchin is a Professor of Mathematics, and of Ecology and Evolutionary Biology at the University of Connecticut. For the last nine years, he's been taking the mathematical techniques that once allowed him to track predator–prey cycles in forest ecosystems, and using them to model human history -- a pattern identification process he calls cliodynamics. The goal of cliodynamics (or cliometrics) is to turn history into a predictive, analytic science. By analysing historical records on economic activity, demographic trends and outbursts of violence, which reflect some of the broad social forces that shape transformative events in US society, he has come to the conclusion that a new wave of internal strife is already on its way, and should peak around 2020. [more inside]
The First Photo on the Web: A story of crossdressing, particle physics, humorous science-based novelty songs, and terrible photoshop.
In The Geographic Flow of Music (arxiv), researchers Conrad Lee and Pádraig Cunningham propose a method to use data from the last.fm API to track the world's listening habits by location and time, showing where shifts in musical tastes have originated and subsequently migrated. Results show music trends originating in smaller cities and flowing outward in unexpected ways, contradicting some assumptions in social science about larger cities being more efficient engines of (cultural) invention.
A Mismeasured Mismeasurement of Man. Stephen Jay Gould's classic The Mismeasure of Man argues that 19th-century scientist Samuel George Morton inflicted his own racial biases on his data to demonstrate that Caucasians had larger brains than other races. A new paper in the Public Library of Science: Biology debunks Gould's account by remeasuring the same skulls Morton used. Whatever biases Morton may have had, they are not reflected in the data.
“If you display information the right way, anybody can be an analyst,” Tufte once told me. “Anybody can be an investigator.” - The Washington Monthly interviews informaticist Edward Tufte [via]
MeFi's own Elizabeth Pisani, of The Wisdom of Whores, on Big Data and the End of the Scientific Method (PDF).
Mercenary Epidemiology: Data Reanalysis and Reinterpretation for Sponsors With Financial Interest in the Outcome. (.pdf link) When should scientists be required to release their raw data for (potentially hostile) re-analysis? A letter to the editors of Annals of Epidemiology from David Michaels, Ph.D., MPH, public health blogger, author of the book Doubt Is Their Product, and, as of December 2009, the Assistant Secretary of Labor for OSHA, unanimously confirmed by the Senate despite the dismay of some. Michaels is interviewed at Science Progress about Doubt Is Their Product (podcast, with transcript).
One of the great things about Google Earth is how extensible it is using KML. You can use it to show off placemarks, build 3D structures, track wildfires or hurricanes, and much more. Google Earth can be used as a scientific visualization platform. OpenEarth is an open source initiative that archives, hosts and disseminates Data, Models and Tools for marine and coastal scientists and engineers. Their KML data visualizations using Google Earth display some of the possibilities. [via] [more inside]
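For a sense of what a KML placemark looks like, here is a minimal sketch that builds one with Python's standard library. The helper `placemark_kml`, the placemark name, and the coordinates are all hypothetical, chosen only for illustration; this is not taken from OpenEarth's own tooling. Note that KML lists coordinates as longitude, latitude, altitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # KML 2.2 namespace


def placemark_kml(name, lon, lat, description=""):
    """Return a KML document (as a string) holding one point placemark."""
    ET.register_namespace("", KML_NS)  # serialise KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    if description:
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = description
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinate order is longitude,latitude,altitude
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


# Illustrative values only: a made-up measurement station
kml_text = placemark_kml("Example station", 4.9, 52.37, "Hypothetical sample point")
print(kml_text)
```

Saving the printed text to a file with a .kml extension should be enough for Google Earth to open it and show the point.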
How to Talk to a Climate Sceptic: "...a handy one-stop shop for all the material you should need to rebut the more common anti-global warming science arguments constantly echoed across the internet."
Thoreau was into it. Scientists are using it to understand climate change. When Project Budburst starts again on February 15th, you can participate, too. [more inside]