corpora is a GitHub repository containing machine-readable lists of interesting words and phrases that "are potentially useful in the creation of weird internet stuff." The corpora range from the mundane (common English words, animals, corporations, pizza toppings) to the obscure (types of knot, wrestling moves, Lovecraftian deities) to the absurd (states of drunkenness, deceased Spinal Tap drummers, unrhymable words).
"The internationalized art world relies on a unique language. Its purest articulation is found in the digital press release. This language has everything to do with English, but it is emphatically not English. It is largely an export of the Anglophone world and can thank the global dominance of English for its current reach. But what really matters for this language—what ultimately makes it a language—is the pointed distance from English that it has always cultivated." - Triple Canopy magazine on why artists' statements and press releases sound so utterly odd and confusing.
The Corpus Inscriptionum Latinarum is a massive, 17-volume catalog of 180,000 inscriptions and graffiti found across the Roman Empire in classical times. It's available for free online now, starting with the parts published before 1940. I'm fond of volume 4, which covers Pompeii and Herculaneum. (Pompeii graffiti prev)
Online Corpora. In linguistics, a corpus is a collection of 'real world' writing and speech designed to facilitate research into language. These 6 searchable corpora together contain more than a billion words. The Corpus of Historical American English allows you to track changes in word use from 1810 to present; the Corpus del Español goes back to the 1200s.
Google's new Ngram Viewer lets you track the history of words in six languages, including several flavo(u)rs of English. Whether it's the rise and fall of a single word, the evolution of technology, or the mysterious seventeenth-century proliferation of fart jokes, there's a lot to play with. More at the Official Google Blog.
Anglo-Saxon Aloud: Daily readings (and podcasts) from the Complete Corpus of Anglo-Saxon Poetry, presented by Prof. Michael Drout, Wheaton College. For those who like to read along, the Corpus is also presented in text (no translation, though).
You may have seen Newt Gingrich this past Tuesday on The Daily Show describing Obama's decision to try the Underpants Bomber in the courts as "radical." He pointed out an incident in 1942 when Franklin Roosevelt suspended habeas corpus for Nazi saboteurs dropped off on Long Island by submarine to wreak havoc on America. While "Nazi Terrorists" might be almost comic-book-class villains, Newt probably would prefer people not to recall the true story and villains of Operation Pastorius.
The Michigan Corpus of Academic Spoken English is a searchable collection of almost 2 million words of transcribed spoken English from the University of Michigan, including student study groups, office hours, dissertation defenses, and campus tours. Researchers use the Michigan corpus to investigate questions about usage, like "less or fewer?" (cf. this contentious Ask Meta thread) and more general topics, like "Vague Language in Academia." Browse or search MICASE yourself.
Performance artists Corpus do Sheep. They also do Le Grand Peep Show. (links safe for peeps, safe for sheep and safe for work. Via Nutritional Plastic)