With kettling becoming a commonly deployed tactic by the London Met, students from University College London are fighting back with Sukey, launched this morning. [more inside]
Dataists give their hopes and dreams for data, data tools and data science in 2011. Already, Google has provided Google Refine (previously) to help clean your datasets, while great visualizations can be created with online tools or by combining R (great posts previously) with ggplot2, GGobi, and even Google Motion Charts With R (already built into Google Spreadsheets). Need data? Needlebase helps non-programmers scrape, harvest, and merge data from the web. Or if you’re introspective, Your Flowing Data and Daytum provide tools to measure and chart details of your own life.
"He might have read the document when he was tired, at the end of a long day of being tied to a whale."
"They're not out to make a quick buck, they're looking to protect the integrity of the franchise and its mythology." 1998's Star Trek Insurrection went through a number of different plots before becoming the film we ultimately saw. Starting out as Star Trek: Stardust, the first take on the idea involved Captain Picard going all Heart of Darkness on a former friend from his Starfleet Academy days in a bid to find the Fountain of Youth. That treatment evolved into a remarkably Avatarish story called simply Star Trek IX in which Picard must go upriver to kill a malfunctioning Data as part of a Federation/Romulan alliance to displace strange alien natives from a planet teeming with a valuable and rare ore (spoiler: Picard actually kills Data in this treatment, and Tom Hanks was supposed to have a major role somewhere). Let the late Michael Piller guide you through the writing of Insurrection in his unpublished book Fade In: The Making of Star Trek: Insurrection (his "last great gift to the fans and to aspiring writers everywhere") in which he presents his original story treatments, story notes from his bosses at Paramount, surprisingly reasonable Trekker-type reactions from actors Patrick Stewart and Brent Spiner, and much more. First made freely available by TrekCore.com, Piller's family has since asked that it be removed, but you'll still find the file roaming the Internet if you boldly go looking for it. [more inside]
MeFi's own Elizabeth Pisani, of The Wisdom of Whores, on Big Data and the End of the Scientific Method (PDF).
The New York Times presents an interactive map of America's population separated by race, income, and education, according to census data from 2005 to 2009. One dot for every 50 people. (Previously) [more inside]
Elizabeth Warren on setting up the Bureau of Consumer Financial Protection - lecture starts here, but really starts getting good here: "I feel like this is a boring speech." Stay for the Q&A.
Kaggle hosts competitions to glean information from massive data sets, a la the Netflix Prize. Competitors can enter free, while companies with vast stores of impenetrable data pay Kaggle to outsource their difficulties to the world population of freelance data-miners. Kaggle contestants have already developed dozens of chess rating systems which outperform the Elo rating currently in use, and identified genetic markers in HIV associated with a rise in viral load. Right now, you can compete to forecast tourism statistics or predict unknown edges in a social network. Teachers who want to pit their students against each other can host a Kaggle contest free of charge.
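For readers unfamiliar with the Elo rating that the Kaggle chess entries are measured against, here is a minimal sketch of the standard Elo formulas (expected score, then rating update). The function names and the K-factor of 32 are illustrative assumptions, not anything taken from Kaggle or a chess federation.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Return the new rating after one game.

    actual is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k (assumed 32 here) controls how fast ratings move.
    """
    return rating + k * (actual - expected)


if __name__ == "__main__":
    # Two equally rated players: each is expected to score 0.5.
    e = expected_score(1500, 1500)
    print(e)
    # The winner gains k * (1 - 0.5) points.
    print(update_rating(1500, e, 1.0))
```

A rating system "outperforms" Elo in this sense when its expected scores predict actual game results more accurately on held-out games.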
20.10.2010 is World Statistics Day, so help yourself to a metric (haha sorry) ton of publicly available data at UNdata, ICPSR (registration required to download data sets), and data.gov (previously). You can also explore, visualize and animate a variety of publicly available data sets with Google Labs' Public Data Explorer.
25 most dangerous neighborhoods 2010. Click through the maps for some more specific data.
visualizing.org, Making sense of complex issues through data and design. About. Visualizing is a place to showcase your work, get feedback, ensure that your work is seen by lots of people and gets used by teachers, journalists, and conference organizers to help educate the public about various world issues.
A Tour through the Visualization Zoo. A survey of powerful visualization techniques, from the obvious to the obscure.
The Spatial History Project at Stanford University creates striking visualizations of historical data, including an 1850 yellow fever epidemic in Rio de Janeiro and prostitution arrests in Philadelphia in the 1910s.
The Tornado History Project: Google Maps meets historical data. Tornado data turned into Google Maps that you can slice and dice any way you want: by state, by date range, by Fujita number. It even records the path of long-track tornadoes. Hours of fun for weather weenies (like me!) and those interested in investigating trends over time. [more inside]
John Billes—whose extracurricular exploits as an undergraduate at UT Austin brought us iPhone-controlled dance floor lights, R/C cars, and yes, even full-size automobiles—has created the KegMate—a keg-mounted, Arduino-controlled data-logging suite with an iPad-based user interface—in his spare time, while working at Yelp.
OK Cupid statistics fun: We collected 552,000 example user pictures. We paired them up and asked people to make snap judgments. Here's what we found.
Our results open a fascinating new direction for position-based security in cryptography where security of protocols is solely based on the laws of physics and proofs of security do not require any pre-existing infrastructure.
If you're a Leftie you like Ellison and Herbert. If you're a Rightie you like Anderson and Heinlein.
New Maps of Science Fiction
The first question that naturally comes to mind about stories and authors is "How much do you like them?" Literary critics try to go far beyond this simple query, but it is the one that people ordinarily care most about, and for us it is the most important sociological question. Using modern techniques of analysis we can recover a tremendous amount of hidden information from statistics of people's likes and dislikes. Analog Yearbook, 1977, pages 277-299. (via)
Hans Rosling, who helped usher in TED talks way back when using stunning visuals, envisions how the world will look in 50 years as global population grows to 9 billion. To check further population growth, which might have disastrous consequences, he exhorts us to raise the living standards of the poorest. [more inside]
CNN.com's 'Home and Away' initiative honors the lives of U.S. and coalition troops who have died in Iraq and Afghanistan. The extensive data visualization project tells the story of where and how the lives of these troops began and ended. The project is a sobering look at the human cost of two wars in the Middle East, and as such is restrained with a sober palette of blacks, whites and greys. [via] [more inside]
The UK Government has published extracts from COINS, the Combined Online Information System used by the Treasury to track all public spending by the Government. Together, the files constitute about 11 GB of data in delimited text format containing consolidated financial information for each department and account type. [more inside]
AT&T Just Killed Unlimited Wireless Data (and Screwed Everybody in the Process). AT&T is likely just the first, since carriers rarely do anything alone (like when everybody launched unlimited voice calling in lockstep), and Verizon's CTO has rumbled that plans with "as much data as you can consume is the big issue that has to change." And so it is.
"The Journalist as Programmer" is an academic, ethnographic case study (pdf), which considers whether the New York Times' Interactive Newsroom Technologies unit, source of the paper's Open Source Developer Network, should be thought of as a template for the future of Web Journalism. Slide Deck. (Previously on MeFi.) NYMag profile of the INT team from '09: The New Journalism: Goosing the Gray Lady. ("What are these renegade cybergeeks doing at the New York Times? Maybe saving it.")
Anti-identity-theft firm Lifelock was fined $12 million by the FTC in March for deceptive business practices. More bad news: its CEO had his identity stolen 13 times after posting his own Social Security number in company ads as proof the company could protect him. [more inside]
The Data-Driven Life. "Ubiquitous self-tracking is a dream of engineers. For all their expertise at figuring out how things work, technical people are often painfully aware how much of human behavior is a mystery. People do things for unfathomable reasons. They are opaque even to themselves. A hundred years ago, a bold researcher fascinated by the riddle of human personality might have grabbed onto new psychoanalytic concepts like repression and the unconscious. These ideas were invented by people who loved language. Even as therapeutic concepts of the self spread widely in simplified, easily accessible form, they retained something of the prolix, literary humanism of their inventors. From the languor of the analyst’s couch to the chatty inquisitiveness of a self-help questionnaire, the dominant forms of self-exploration assume that the road to knowledge lies through words. Trackers are exploring an alternate route. Instead of interrogating their inner worlds through talking and writing, they are using numbers. They are constructing a quantified self."
It's been estimated that the average UK adult is now registered on more than 700 databases and is caught many times each day by nearly five million CCTV cameras. So how hard would it be for an average citizen to disappear completely? That’s the subject of a new documentary film, Erasing David (Trailer: YouTube, Vimeo), which premieres this evening in the UK on More4. It's also now available worldwide online at the iTunes store and through several Video On Demand services, as well as through Good Screenings. [more inside]
Yahoo is releasing a new service: Firehose, a real-time, searchable index of social content aggregated from around the web. Accessible via YQL, Yahoo’s SQL-like query language, the Firehose will gather data from status updates, user ratings and reviews, comment threads, Google Buzz, Flickr, Delicious, Twitter, YouTube, Last.fm and a range of other sites and apps. [via] [more inside]
The Body Snatchers look at a human and see a nice new home. The Visitors look at a human and see a yummy snack. The Smarter Planet people look at a human and see data. Our planet is alive with data. Yummy data.
Jeff Heard, from the Renaissance Computing Institute (a joint project between the University of North Carolina, Duke University, and North Carolina State University, among others), posts gorgeous visualizations of internet traffic to projects hosted by iBiblio.org. [more inside]
What do the Olympic finishing times sound like? It's sometimes hard to grasp the significance of the times or how close it was just by the numbers or even the photo finishes. [more inside]
According to one estimate, mankind created 150 exabytes (billion gigabytes) of data in 2005. This year, it will create 1,200 exabytes. Data data everywhere and possibly too much to drink?
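To put those two figures side by side, here is a small arithmetic sketch. It assumes decimal (SI) units, so that one exabyte is exactly one billion gigabytes as the blurb says; the variable names are illustrative only.

```python
# Assumption: decimal (SI) units, 1 exabyte = 10**9 gigabytes.
GIGABYTES_PER_EXABYTE = 10**9

created_2005_eb = 150        # estimate quoted for 2005
created_this_year_eb = 1200  # estimate quoted for "this year"


def exabytes_to_gigabytes(exabytes: int) -> int:
    """Convert exabytes to gigabytes using decimal units."""
    return exabytes * GIGABYTES_PER_EXABYTE


# How many times more data than in 2005.
growth_factor = created_this_year_eb / created_2005_eb

if __name__ == "__main__":
    print(exabytes_to_gigabytes(created_2005_eb))       # gigabytes in 2005
    print(exabytes_to_gigabytes(created_this_year_eb))  # gigabytes this year
    print(growth_factor)                                # 8.0
```

In other words, the estimate describes an eightfold increase in data created per year.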
R is quickly becoming the programming language for data analysis and statistics. R (an implementation of S) is free, open-source, and has hundreds of packages available. You can use it on the command-line, through a GUI, or in your favorite text editor. Use it with Python, Perl, or Java. Sweave R code into LaTeX documents for reproducible research. [more inside]
Mercenary Epidemiology: Data Reanalysis and Reinterpretation for Sponsors With Financial Interest in the Outcome. (.pdf link) When should scientists be required to release their raw data for (potentially hostile) re-analysis? A letter to the editors of Annals of Epidemiology from David Michaels, Ph.D., MPH, public health blogger, author of the book Doubt Is Their Product, and, as of December 2009, the Assistant Secretary of Labor for OSHA, unanimously confirmed by the Senate despite the dismay of some. Michaels interviewed at Science Progress about Doubt Is Their Product (podcast, with transcript.)
Researcher uses data regarding connections on Facebook to map distinct regions of the United States.
Governments around the globe are opening up their data vaults allowing us to check out the numbers for ourselves. This is the Guardian’s gateway to that information. Search for government data here from the UK, USA, Australia and New Zealand — and look out for new countries and places as they are added. Read more about this on the Datablog. [more inside]
One of the great things about Google Earth is how extensible it is using KML. You can use it to show off placemarks, build 3D structures, track wildfires or hurricanes, and much more. Google Earth can be used as a scientific visualization platform. OpenEarth is an open source initiative that archives, hosts and disseminates Data, Models and Tools for marine and coastal scientists and engineers. Their KML data visualizations using Google Earth display some of the possibilities. [via] [more inside]
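As a rough illustration of what extending Google Earth with KML involves, the sketch below builds a minimal KML document containing one placemark, using only the Python standard library. The function name and the sample coordinates are hypothetical; note that KML lists coordinates as longitude, latitude, altitude.

```python
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2 documents.
KML_NS = "http://www.opengis.net/kml/2.2"


def placemark_kml(name: str, lon: float, lat: float) -> str:
    """Return a KML document (as a string) with a single point placemark."""
    # Serialize KML elements under the default namespace (no prefix).
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinate order is longitude,latitude,altitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample point; save the output as a .kml file to open it.
    print(placemark_kml("Sample placemark", -0.1337, 51.5246))
```

Saving the printed text to a file with a .kml extension should give something Google Earth can open; richer features such as 3D structures or time-based tracks use additional KML elements not shown here.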
Peacay of BibliOdyssey highlights some stunning examples of Victorian Infographics from the Rumsey Map Collection (previously). (Direct Flickr link)
The Confessions of an NBA Scorekeeper: Gawker's Tommy Craggs talks with an ex-scorekeeper for the Vancouver Grizzlies, and reveals the subjectivity of stat keeping in the NBA. This guy once gave Nick Van Exel 23 assists just because he felt like it.
How to Talk to a Climate Sceptic: "...a handy one-stop shop for all the material you should need to rebut the more common anti-global warming science arguments constantly echoed across the internet."
Beautiful data visualisations of the original Choose Your Own Adventure stories. A project by Christian Swinehart.
This morning, Google launched a new feature called "Google Dashboard" that lets users view (and in some cases control) what data is being stored on a range of more than 20 Google services, including Gmail, Calendar, Docs, Web History, Orkut, YouTube, Picasa, Talk, Reader, Alerts and Latitude. [more inside]