LOC amassing tweets at breakneck pace, needs help to make it accessible
January 29, 2013 10:53 PM
posted by sundog (20 comments total)
The Library of Congress posted a January 2013 update on its mission to archive public tweets, announced back in April 2010 (previously). 170 billion tweets so far, adding more than half a billion per day. Search for a term? Prepare to wait ~24 hours.
The PDF report includes details of the agreement with Twitter such as, "The Library cannot provide a substantial portion of the collection on its web site in a form that can be easily downloaded." They have met 2 out of 3 mission goals for the Twitter archive (acquire and preserve, but not yet provide access): "to acquire, preserve and provide access to a universal collection of knowledge and the record of America’s creativity for Congress and the American people."
In the 2+ years since, the LOC and Gnip still have not implemented a way for researchers to search Twitter's gift in a usable manner. Storing 133TB of compressed data on redundant tapes helps with archiving, but it certainly contributes to search response times measured in hours (if not days).
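For a rough sense of scale, here is a back-of-envelope sketch using only the figures quoted above. The read rate of 150 MB/s per reader, the reader counts, and the function names are assumptions for illustration, not numbers from the LOC report, and the sketch treats the 133TB as a single copy of the whole collection measured in decimal terabytes.

```python
# Figures quoted in the post
TOTAL_TWEETS = 170e9     # tweets archived so far
ARCHIVE_BYTES = 133e12   # 133TB compressed (decimal terabytes assumed)
DAILY_TWEETS = 0.5e9     # "more than half a billion per day", used as a lower bound

# Assumption, NOT from the post: sustained sequential read rate per reader
ASSUMED_READ_MB_PER_S = 150.0


def bytes_per_tweet(total_bytes, total_tweets):
    """Average compressed size of one archived tweet, metadata included."""
    return total_bytes / total_tweets


def full_scan_hours(total_bytes, mb_per_s, parallel_readers=1):
    """Hours needed to read the whole archive once with no index."""
    seconds = total_bytes / (mb_per_s * 1e6 * parallel_readers)
    return seconds / 3600


def daily_growth_gb(daily_tweets, per_tweet_bytes):
    """Compressed gigabytes added per day at the current intake rate."""
    return daily_tweets * per_tweet_bytes / 1e9


if __name__ == "__main__":
    per_tweet = bytes_per_tweet(ARCHIVE_BYTES, TOTAL_TWEETS)
    print(f"average compressed bytes per tweet: {per_tweet:.0f}")
    print(f"growth per day: {daily_growth_gb(DAILY_TWEETS, per_tweet):.0f} GB")
    for readers in (1, 10):
        hours = full_scan_hours(ARCHIVE_BYTES, ASSUMED_READ_MB_PER_S, readers)
        print(f"full scan with {readers} reader(s): {hours:.1f} hours")
```

Under those assumptions that works out to roughly 780 compressed bytes per tweet, close to 400 GB of new data a day, and about ten days for a single reader to scan everything once. Ten readers in parallel would land near the ~24 hour search time mentioned above, though that match depends entirely on the assumed read rate.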
Entrepreneurs, big data scientists: Contact LOC Director of Communications @GayleOsterberg via (what else?) Twitter.