The question is not so much “do you trust the CIA/NSA/MI6/etc?”. It’s “Do you trust every single sysadmin working for these organisations? Every single analyst? Every single middle manager?”
For the very first time, the sales of one million sex toys and 45,000 of their reviews have been analysed to reveal what we do in our most intimate and uninhibited moments. Research by Jon Millward, who also brought us Deep Inside. (Previously) [All links NSFW]
On April 8, 2013, I received an envelope in the mail from a nonexistent return address in Toledo, Ohio. Inside was a blank thank-you note and an Ohio state driver’s license. The ID belonged to a 28-year-old man called Aaron Brown—6 feet tall and 160 pounds with a round face, scruffy brown hair, a thin beard, and green eyes. His most defining feature, however, was that he didn’t exist. I know that because I created him.
"...it's social media that has helped build the public case against Russia" in Ukraine. One example is liveuamap.com, who "gather information from open sources and put it on the [Google] map" using familiar Google Maps markers for a Reds (Pro-Russian) vs Blues (Pro-Ukraine) theater map. Shaded regions indicate the Donetsk People's Republic (DNR; Red), Lugansk People's Republic (LNR; Purple), the MH17 crash site (Yellow), and the MH17 ceasefire zone (green). The posts linked to by each marker include a link to the source via a chain icon at the bottom of the post.
Only 15 million, riiiight. A data experiment out of Florida State University maps the location of 1 million of the 15 million publicly available online images tagged with the word "cat." Using a supercomputer and the map coordinates imbedded in their metadata, I Know Where Your Cat Lives shows where each image was taken, to within an estimated 7.8 meters accuracy. [more inside]
Datashine: Census is a site from UCLs Big Open Data: Mining and Synthesis project which provides an easy interface to map UK population data. [more inside]
Big Data Pictures is a tumblr for visualizations of big data.
"These big collections of personal data are like radioactive waste. It's easy to generate, easy to store in the short term, incredibly toxic, and almost impossible to dispose of. Just when you think you've buried it forever, it comes leaching out somewhere unexpected." A talk by Maciej Ceglowski, founder of Pinboard, about why we have Big Data and why it's frightening. [more inside]
Strava, the bike and run tracking system, is using its database to create Strava Metro, a product that sells commute data to urban planners. But unless you're the Oregon DoT, London, or Alpine Shire, you might find the Strava bike and run heatmaps more useful. [more inside]
"And finally, I'm actually here today to win the 'Most Creative Use of Tor' award," she said, followed by roars of laughter in the audience. "I really couldn't have done it without Tor, because Tor was really the only way to manage totally untraceable browsing. I know it's gotten a bad reputation for Bitcoin trading and buying drugs online, but I used it for BabyCenter.com." -- How Janet Vertesi tried to hide her pregnancy from the internet and big data. (Direct link to her presentation.)
Controversial education tech company InBloom has shut down over student data privacy concerns. Backed with $100 million in grants from the Bill & Melinda Gates Foundation and the Carnegie Corporation of New York, InBloom quickly announced nine states (CO, DE, GA, IL, KY, LA, MA, NC, NY) as partners, with more than 2.7 million students enrolled, with the goal of using big data to direct education emphasis and other decisions. With a recent decision by New York state to halt participation in any project involving storing student data in the way InBloom had planned (and the deletion of any such data already stored), all nine states had either put data sharing plans with InBloom on hold, made them voluntary, or pulled out completely. [more inside]
"The first journalist to attempt reporting on the Wikileaks cables was David Leigh of The Guardian. The material arrived as a single 1.7GB CSV file containing 251,287 U.S. diplomatic cables from 1966 to 2010. If you’ve ever tried to open a 1.7GB file, you know you probably can’t. Microsoft Word and Excel will plain refuse. Windows Notepad and Mac TextEdit will try, but slow to a crawl." At Opennews Source, Jonathan Stray has written a helpful beginners' guide to dealing with large amounts of documents for journalists and interested lay people.
Databall. With an ocean of new statistical information available, the NBA could be on the verge of understanding the value of every single movement on the court.
Sexualitics tries to contribute to the understanding of human sexuality through a Big Data approach. Studies (PDF), Datasets and Porngrams (maps the evolution of word frequencies in the titles of porn videos).
Measuring societal zeitgeist by counting mood words across millions of books correlates with the economic misery index shifted forward a decade. "When are we most miserable, according to literature? Ten to eleven years after an economic downturn." Paper: Books Average Previous Decade of Economic Misery.
The data analysis group that used Facebook and set-top TV data to help Barack Obama win the latest election is taking its talents to the private sector. (SL NYTimes)
47% of US jobs under threat from computerization according to Oxford study. The study reveals a trend of computers taking over many cognitive tasks thanks to the availability of big data. [more inside]
A map of every protest everywhere since 1979 (some caveats are noted in the accompanying article).
An Aura of Familiarity: Visions from the Coming Age of Networked Matter. The Institute for the Future commissioned six science fiction writers to create short stories for their Age of Networked Matter research project. "We asked our collaborators to envision a world where humans have unprecedented control of matter at all scales, and to share with us a glimpse of daily life in that world. It was a process meant to make the future tangible." Three of the stories have appeared so far. [more inside]
"We live in a world where digital information is exploding. Some 90% of the world’s data was generated in the past two years. The obvious question is: how can we store it all? In Nature Communications today, we, along with Richard Evans from CSIRO, show how we developed a new technique to enable the data capacity of a single DVD to increase from 4.7 gigabytes up to one petabyte (1,000 terabytes). This is equivalent of 10.6 years of compressed high-definition video or 50,000 full high-definition movies."
Digital mapping startup MapBox teams up with social data warehouse Gnip to create some stunning visualizations of every geotagged tweet since September 2011. [more inside]
Singer/songwriter Vienna Teng has released a demo for The Hymn of Acxiom, a haunting, vocoded choral composition written from the perspective of a marketing database. [more inside]
"Maybe it's a sore point: your field should have an answer (people think you do) but there isn't one yet. Perhaps it's simple to pose but hard to answer. Or it's a question that belies a deep misunderstanding: the best answer is to question the question."
Why the collision of big data and privacy will require a new realpolitik:
The paper, entitled Unique in the Crowd: The privacy bounds of human mobility, took an anonymized dataset from an unidentified mobile operator containing call information for around 1.5 million users over 14 months. The purpose of the study was to figure out how many data points — based on time and location — were needed to identify individual users. The answer, for 95 percent of the “anonymous” users logged in that database, was just four.
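To make the "just four points" idea concrete, here is a rough sketch of how such a uniqueness measure could be computed on a toy dataset. It is an illustration under stated assumptions, not the paper's code: the trace format (a set of `(hour, antenna)` pairs per user), the function names `is_unique` and `unicity`, and the one-sample-per-user estimate are all hypothetical choices.

```python
from __future__ import annotations

import random

# Assumed encoding: each point is (hour bucket, antenna id).
Point = tuple[int, str]
Traces = dict[str, frozenset[Point]]


def is_unique(traces: Traces, user: str, points: frozenset[Point]) -> bool:
    """True if `user` is the only one whose trace contains all `points`."""
    matches = [u for u, trace in traces.items() if points <= trace]
    return matches == [user]


def unicity(traces: Traces, p: int, rng: random.Random) -> float:
    """Fraction of users singled out by p points drawn from their own trace."""
    eligible = [u for u, trace in traces.items() if len(trace) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        # Draw p known points for this user, as an outside observer might hold.
        sample = frozenset(rng.sample(sorted(traces[user]), p))
        if is_unique(traces, user, sample):
            unique += 1
    return unique / len(eligible)
```

The point of the exercise is that the name is never needed: if a handful of time-and-place observations match only one trace in the dataset, that trace is no longer anonymous.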
Does Big Data Mean The Demise Of The Expert - And Intuition? - "Data-driven decisions are poised to augment or overrule human judgment." What Is Big Data? [more inside]
"The discovery advances UC Berkeley’s mission to make sense of big data and to use new technology to document and maintain endangered languages as critical resources for preserving cultures and knowledge. [...] it can also provide clues to how languages might change years from now."
291 diseases and injuries + 67 risk factors + 1,160 non-fatal complications = 650 million estimates of how we age, sicken, and die
As humans live longer, what ails us isn't necessarily what kills us: five data visualizations of how we age, sicken, and die. Causes of death by age, sex, region, and year. Heat map of leading causes and risks by region. Changes in leading causes and risks between 1990 and 2010. Healthy years lost to disability vs. life expectancy in 1990 and 2010. Uncertainties of causes and risks. From the team for the massive Institute for Health Metrics and Evaluation Global Burden of Diseases, Injuries, and Risk Factors Study 2010. [more inside]
The European Commission is resisting pressure from US firms and public bodies designed to derail its privacy proposals, which include a limited 'right to be forgotten' that would allow users to demand their data be removed from Internet sites. Facebook claims it would actually harm privacy by requiring social media sites to perform extra tracking to remove data which has been copied to other sites. Google says it's unworkable. Others say it would be a threat to the American right to free speech. Big Data hates the idea because privacy is bad. Meanwhile, advertising may soon follow you from one device to the next -- privately. (Via) [more inside]