Datamining Shakespeare --- Othello is a Shakespearean tragedy: when the hero makes a terrible mistake of judgment, his once-promising world is led into ruin. Computer analysis of the play, however, suggests that the play is a comedy or, at least, that it does the same things with words that comedies usually do. On October 26, 2011, Folger Shakespeare Library Director Michael Witmore discussed his recent work in Shakespeare studies, which combines computer analysis of texts, linguistics, and traditional literary history. Taking the case of Shakespeare's genres as a starting point, Witmore shows how subtle human judgments about the kinds of plays Shakespeare wrote (were they comedies, histories, or tragedies?) are connected to frequent, widely distributed features in the playwright's syntax, vocabulary, and diction. (Approx. 30-minute lecture.)
On not reading books. Franco Moretti, author of the controversial Graphs, Maps, Trees: Abstract Models for a Literary History, proposes that literary study needs to abandon "close reading" for "distant reading": "understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data." He is co-founder of the Stanford Literary Lab, where he and like-minded colleagues have published studies on programming computers to use statistical analysis to identify a novel's genre (PDF) and on analyzing plots as networks (PDF). Similar projects are on the way.
Data mining has applications in health care, pharmaceuticals, facial recognition, economics and related areas, and much more. Previously, MeFi discussed controversial homeland security applications, and the nexus between social networking and mobile devices that further contributes to the pool of data. With plenty to dig into, let's talk about data mining in more detail.
Exploring Enron -- A breathtaking web of conspiratorial email messages. How often did Jeff Skilling email Ken Lay? How often were those emails about company business? Internal alliances? The company's allegiance? The California energy crisis? Who else was talking about it? Who wasn't? Temptingly complete with software download and MySQL tables for your own tinfoil hat explorations.