Giant database of current TV news transcripts and video
September 17, 2012 7:19 PM

You know how Jon Stewart shows politicians contradicting themselves on news clips? Do it yourself by searching a giant database of TV transcripts and video on Internet Archive. The collection now contains 350,000 news programs collected since 2009 from national U.S. networks and stations. The archive is updated with new broadcasts 24 hours after they are aired. Older materials are being added.

There's a lot more to the site than text search: you can add comments to video at certain points, view statistical graphs, etc. See the "observations and uses" video and statements. Related is the Vanderbilt Television News Archive (subscription).

In other press:
*All the TV News Since 2009, on One Web Site (New York Times)
*Let's Go to the Videotape: Nonprofit Offers News Clips (subscription workaround: paste the article title into Google, click through from there).
posted by stbalbach (22 comments total) 66 users marked this as a favorite
 
Also of note - CSPAN archives
posted by XMLicious at 7:27 PM on September 17, 2012 [1 favorite]


Cow tipping!
posted by zippy at 7:29 PM on September 17, 2012


The WSJ article is the most investigative, suggesting Internet Archive may have overstepped copyright and that most news outlets have not given their explicit permission, but IA appears to be claiming Fair Use since only 30 seconds of video are shown at a time, sort of a Google Books move.
posted by stbalbach at 7:33 PM on September 17, 2012


It just got a lot easier to make the next Rocked by Rape or Dispepsi.
posted by filthy light thief at 7:40 PM on September 17, 2012


Pancakes!
posted by blue_beetle at 7:43 PM on September 17, 2012


In addition to fair use, the Archive can rely on 17 U.S.C. § 108 (f)(3), which grants certain additional protections for this purpose:
Nothing in this section - ... shall be construed to limit the reproduction and distribution by lending of a limited number of copies and excerpts by a library or archives of an audiovisual news program, subject to clauses (1), (2), and (3) of subsection (a); or
Certainly, broadcasters might make the argument that what archive.org is doing goes beyond "a limited number of copies and excerpts," but the Internet Archive has a decent position here. Their policy appears to be to provide 30-second clips and caption search online, and to lend DVDs of individual programs upon request and payment of a fee. I suspect a court would be more inclined to see that as similar to a traditional library service (as Vanderbilt has been doing for years) than as a lawless filesharing hooligan pirate site.

Also telling is the quote they have lined up from Newton Minow, former FCC chair (and Obama's former boss): "The Internet Archive's TV news research service builds upon broadcasters' public interest obligations. This new service offers citizens exceptional opportunities to assess political campaigns and issues, and to hold powerful public institutions accountable." The implicit threat there, as I read it, is that broadcasters should think twice before picking a fight over this one, lest someone start asking them how they've served the public today.
posted by zachlipton at 7:52 PM on September 17, 2012 [7 favorites]


Also of note - CSPAN archives

Use with caution.
posted by AElfwine Evenstar at 7:53 PM on September 17, 2012 [1 favorite]


For a brief moment I misread that as CPAN archives. In either case, as AElfwine Evenstar says, use with caution.

But yes, this is epic. Oh the uses this could be put to could be amazing. Let us all gather in unholy confluence and decide what to do.
posted by JHarris at 8:11 PM on September 17, 2012 [1 favorite]


The act of copying all this news material is protected under a federal copyright agreement signed in 1976. That was in reaction to a challenge to a news assembly project started by Vanderbilt University in 1968.

Congress carving out a new limitation on copyright for the benefit of researchers and librarians seems unimaginable today.
posted by grouse at 8:26 PM on September 17, 2012 [9 favorites]


Three words, seen in most threads
posted by zippy at 10:01 PM on September 17, 2012


"When the facts change, I change my mind."

A pity these databases make it difficult for politicians to do that.
posted by ocschwar at 10:15 PM on September 17, 2012 [2 favorites]


Congress carving out a new limitation on copyright for the benefit of researchers and librarians seems unimaginable today

I may be wrong, but I think the Internet Archive, in a library role, lobbied Congress to get an exemption in the DMCA (Digital Millennium Copyright Act) to allow reverse engineering of copy-protected software.
posted by zippy at 10:43 PM on September 17, 2012


Ah yes, here it is.
posted by zippy at 10:47 PM on September 17, 2012


The most colorful language seems to be on CSPAN Book TV.
posted by RobotVoodooPower at 10:53 PM on September 17, 2012


The DMCA isn't copyright, zippy— it outlaws a whole bunch of things that are not copyright infringement, not just things which are fair use but things which are not copyright use in the first place. (Which is why places like libraries need to lobby for exemptions.)
posted by hattifattener at 12:00 AM on September 18, 2012 [2 favorites]


I was pleasantly surprised to learn that Newt Minow was still alive and opining. In 1961, he famously called (commercial) television a "vast wasteland" as he advocated for programming in the public interest. The S.S. Minnow in Gilligan's Island was purportedly named after him.

Without knowing all the details, The Internet Archive is likely on solid legal/political ground here. Brewster Kahle is a smart guy with lots of friends and resources, and if broadcasters give him a hard time legally, they might ultimately find themselves in a worse position than if they just let this go.
posted by Hello Dad, I'm in Jail at 1:04 AM on September 18, 2012 [1 favorite]


And with that, I decided to send a donation to the Internet Archive.

I think they might be the most important innovation in libraries in a long time.
posted by grudgebgon at 5:18 AM on September 18, 2012 [5 favorites]


The DMCA is mostly about copyright, but you're right: I now see it also goes beyond copyright to grant additional protections. I mentioned the DMCA because I misunderstood this, but also because any time people and libraries win vs media producers' lobbyists, I think that's a) really damn close, and b) pretty awesome.

That said, from the ever-authoritative Wikipedia:

"In addition to the safe harbors and exemptions the statute explicitly provides, 17 U.S.C. 1201(a)(1) requires that the Librarian of Congress issue exemptions from the prohibition against circumvention of access-control technology. Exemptions are granted when it is shown that access-control technology has had a substantial adverse effect on the ability of people to make non-infringing uses of copyrighted works."

Here they're talking about DRM and copy-protection, and an exemption from penalties attached to breaking these. While that may not be, in a precise law-talking sense, a copyright issue, as a non-lawyer I'd call it close enough.
posted by zippy at 10:55 AM on September 18, 2012


I may be wrong, but I think the Internet Archive, in a library role, lobbied Congress to get an exemption in the DMCA (Digital Millennium Copyright Act) to allow reverse engineering of copy-protected software.

This seems a bit like you are well-actualling here. In any case, I see a big difference between Congress creating a new specific exception to existing copyright as they did in 1976, and Congress dramatically expanding the reach of copyright laws with the DMCA but setting possible limitations on that expansion (but only when the Copyright Office later agrees to it).
posted by grouse at 11:19 AM on September 18, 2012


One factoid not really making it into the press reports: The TV News archive goes back to 2009, but in fact there are recordings going back to the end of 2000. It takes significant machine processing to ingest the items into the form the archive uses, so it's a slow backfill, but it's happening - so expect to be able to browse news going back to the start of the W presidency, as well as everything in between.

I work for a great place.
posted by jscott at 11:40 AM on September 18, 2012 [6 favorites]


Grouse, I added that info on the archive with the best of intentions. Occasionally, the good guys win rights back for us.
posted by zippy at 4:03 PM on September 18, 2012


You know how Jon Stewart shows politicians contradicting themselves on news clips?

Actual Democalypse 2012 - Conservatives Rethink Middle Eastern Democracy
posted by homunculus at 7:11 PM on September 18, 2012



