Mining the Mother of All Data Dumps
We now have a relatively massive haul of digital data from the OBL strike. Several forensic toolkits are in use: some commercially available in the private sector, some used in the public sector, and some open source. Best practices include inventorying all the sources, cloning them so the pristine originals are not damaged, recovering any partial or damaged content, making the clones read-only, using tools that meet standards for legal admissibility, and documenting everything. An excellent overview is Digital Forensics and Born-Digital Content from the Council on Library and Information Resources (PDF, via Resource Shelf). But what to do next?
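Three of the practices above (cloning, making the clone read-only, and documenting everything) can be illustrated with a small Python sketch. The post names no specific tool, so this is an illustration under stated assumptions, not a legally admissible workflow. It assumes each source is already available as a single image file, uses SHA-256 as the integrity check, and writes a plain tab-separated log. The function names and file names are hypothetical. Real casework would rely on hardware write blockers and validated toolkits.

```python
import hashlib
import os
import shutil
import stat
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def clone_and_verify(source, clone, log_path):
    """Hypothetical helper: copy an image, verify the copy, lock it, and log the result.

    Returns True if the source and clone digests match.
    """
    # Work from a copy so the original image is never modified.
    shutil.copyfile(source, clone)
    src_hash = sha256_of(source)
    clone_hash = sha256_of(clone)
    match = src_hash == clone_hash
    # Remove write permission so the working copy is not altered by accident.
    os.chmod(clone, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    # Append a timestamped, tab-separated record of what was done.
    stamp = datetime.now(timezone.utc).isoformat()
    status = "MATCH" if match else "MISMATCH"
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{source}\t{src_hash}\t{clone}\t{clone_hash}\t{status}\n")
    return match
```

Usage would look like `clone_and_verify("source.img", "clone.img", "custody.log")`. A `MISMATCH` entry in the log means the copy should not be trusted for analysis.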
AOL releases three months of queries from 500k users. AOL, fairly or unfairly, is sometimes considered the internet with training wheels, so keep that in mind while parsing this data. Some of these queries read like spam email subject lines, don't they? Don't forget, this is the same demographic that brought you the September that never ended. AOL tried to retract the data, but it's no use: it's out there, on the web.
Google's Crystal Ball (NYTimes). Quite interesting. Via TechDirt:
Google has created a predictive market system, basically a way for its employees to bet on the likelihood of possible events. Such markets have long been used to predict world events, like election results. Intrade, part of the Trade Exchange Network, allows people to bet on elections, stock market indexes and even the weather, for example.
I wonder how accurately the aggregated content of blogs could measure the likelihood of prospective real-world events. The economist they consulted, Hal R. Varian, has some interesting links on his web page as well. I think the internet had better get its anti-spam technology up to par before people start "gaming" the future through blogspam. For an explanation of futures markets (with charts), see this page at the US Commodity Futures Trading Commission.