"Perhaps the most important news of our day is that datasets — not algorithms — might be the key limiting factor to development of human-level artificial intelligence". Alexander Wissner-Gross responding to Edge. Found here, with some links and a table.
GDELT data is now publicly available. GDELT stands for Global Data on Events, Location and Tone, and is a dataset that contains information on over 200 million geolocated events.
A Study of 10,000 Porn Stars and Their Careers: For the first time, a massive data set of 10,000 porn stars has been extracted from the world’s largest database of adult films and performers. I’ve spent the last six months analyzing it to discover the truth about what the average performer looks like, what they do on film, and how their role has evolved over the last forty years.
Data.gov.au is a site giving public access to datasets from the Australian federal, state and territory governments. It was created in response to the Declaration of Open Government, which aims to increase citizen collaboration in the design of policy and service delivery. People are encouraged to use these datasets to build apps or conduct research. So far the little-publicised site has produced apps such as Dunny Directory, Convict Records of Australia and Transhub, a public transport planner for the nation's capital. If you're interested in more online government participation in Australia, Craig Thomler is tracking developments on his eGov AU blog.
♫ "The first and only freely-available, industrial-scale dataset for research on popular music and audio analysis" ♬
"The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks." It's about 288 GB, but you can download a smaller subset of 10,000 songs selected at random to get a taste. Curious what you'll get? Check out this example track description.
This information was supposed to be private, wasn't it?