Most books published 1923-63 in public domain
June 25, 2008 8:23 AM

"For U.S. books published between 1923 and 1963, the rights holder needed to submit a form to the U.S. Copyright Office renewing the copyright 28 years after publication. In most cases, books that were never renewed are now in the public domain. Estimates of how many books were renewed vary, but everyone agrees that most books weren't renewed. If true, that means that the majority of U.S. books published between 1923 and 1963 are freely usable." How do you know? The renewal copyright records have traditionally been scattered and hard to access, but Google - with the help of Project Gutenberg and the Distributed Proofreaders painstakingly typed in every word - has just released a single database as a freely downloadable XML file.
posted by stbalbach (53 comments total) 36 users marked this as a favorite
 
That's amazing and wonderful news. Thanks, stbalbach.
posted by ardgedee at 8:28 AM on June 25, 2008 [1 favorite]


56 megs.
posted by cashman at 8:31 AM on June 25, 2008


56 megs

of unbelievable awesomeness.

Thanks, stbalbach.
posted by cog_nate at 8:33 AM on June 25, 2008 [1 favorite]


oh nice! very good to know.
posted by fuzzypantalones at 8:35 AM on June 25, 2008


...and, wait, the announcement was written by Jon Orwant? Google really is getting all the alpha geeks, aren't they?
posted by ardgedee at 8:36 AM on June 25, 2008


This is fantastic on several levels. Hooray for the awesome.

I'm rather amused that I can't just search through the file on Google's site, though.
posted by phooky at 8:40 AM on June 25, 2008


Thanks for this. I hope this leads to more free ebooks!
posted by Monochrome at 8:42 AM on June 25, 2008


the README:
VERSION: 1.01 (June 24, 2008)

This is a set of the U.S. copyright renewal records for books.

The United States Copyright Office maintains the authoritative list of copyright registrations and renewals. The recording of those registrations is called the Catalog of Copyright Entries, and the renewal records are part of the catalog. Up until 1977, they are hardbound volumes, and very few copies exist. From 1978 onward, they are all online at http://www.copyright.gov/records/.

The tireless members of the Distributed Proofreaders and Project Gutenberg corrected the OCR from the pre-1978 hardbound volumes, with page images supplied by the Universal Library Project at Carnegie Mellon. They are available in their original form at http://www.gutenberg.org.

The 1978-onward data has been downloaded from the U.S. Copyright Office. These two collections of records -- the pre-1978 and the 1978-onward -- are available as a single XML file: google-renewals-all-20080516.xml. We believe we have compiled the only complete set of monograph renewal records outside of the U.S. Copyright Office. This is not a perfect set of renewal records and may contain inaccuracies.

The attached letter (letter_from_marybeth_peters.pdf) from Register of Copyrights Marybeth Peters declares the records to be in the public domain.
posted by stbalbach at 8:45 AM on June 25, 2008


Outstanding.
posted by penduluum at 8:46 AM on June 25, 2008


Holy goddam. It's xmas in June.
posted by cortex at 8:46 AM on June 25, 2008


Take *that*, non digital archives of things!
posted by Jofus at 8:49 AM on June 25, 2008 [3 favorites]


So now I can legally read some mashups?
posted by anthill at 8:50 AM on June 25, 2008


56 MB of just book titles has got to contain some gold.
posted by DU at 8:51 AM on June 25, 2008


Yay for the commons!!

The perpetual extension of copyright makes it extremely difficult for there to be any more Disneys, people or companies who take ideas from the global pool and remix them into something with a more modern appeal. This is not, I think, accidental; the existing media companies are trying very hard to own everything, forever.

This is very, very good news, a treasure trove of material that can now be reused and remixed freely. You may not see the benefits for a while, and may never realize it when it happens, but this list will probably make your life a little better.

Heck, for the instant-gratification crowd, the potential pool of free e-books just grew a whole bunch, so maybe you will notice it. :)
posted by Malor at 8:52 AM on June 25, 2008 [1 favorite]


Revealing my own ignorance of the subject here: is this good news from the standpoint of creating online copies of all these books without the hassle of copyright permission? Is that the sum total of the goodness or are there more nuances that I'm missing?
posted by Osrinith at 8:52 AM on June 25, 2008


Ahhhh... Thanks, Malor.
posted by Osrinith at 8:53 AM on June 25, 2008


Looks to be about half a million titles, a small fraction (by orders of magnitude) of the number of books published during that period.

$ grep "Title" google-renewals-all-20080624.xml | wc -l
436214
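
A line count depends on how the file happens to be laid out, so a rough cross-check could count the records themselves. The following is only a sketch: it assumes every entry is a `<Record>` element (as in the sample record posted further down the thread), that the file is well-formed XML without namespaces, and the function name and the two-record sample are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

def count_records(source):
    """Count <Record> elements while streaming, so the file is never fully loaded."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record":
            count += 1
            elem.clear()  # drop the record's children once it has been counted
    return count

# Hypothetical two-record sample standing in for the real download.
sample = (
    b"<Records>"
    b"<Record><Title>A</Title></Record>"
    b"<Record><Title>B</Title></Record>"
    b"</Records>"
)
print(count_records(io.BytesIO(sample)))
```

For the real data, the path to the XML file can be passed in place of the `io.BytesIO` object.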
posted by stbalbach at 8:53 AM on June 25, 2008


(actually, to be more specific, this is a list of what ISN'T available.... so if you have an old book, and it's not on the list, it's most likely really and truly yours, in all senses of the word. You can do anything you want with it.)
posted by Malor at 8:54 AM on June 25, 2008


Heck, for the instant-gratification crowd, the potential pool of free e-books just grew a whole bunch, so maybe you will notice it. :)

Indeed, and if you think Google's only aim was to produce this list, you've got another (extremely amazing) thing coming.
posted by cog_nate at 8:55 AM on June 25, 2008


you've got another...thing coming.

banhammer
posted by DU at 8:57 AM on June 25, 2008


banhammer

?
posted by cog_nate at 8:59 AM on June 25, 2008


cog_nate: I refer you to this thread, which devolved into a (ahem) discussion of whether the proper phrase is "another thing coming" or "another think coming."
posted by jedicus at 9:01 AM on June 25, 2008


Okay, seriously. Is this for real? I mean, for real, real? As in, isn't Congress just going to pass a bill tomorrow that says whoops, sorry about that, here's your copyright back?
posted by Faint of Butt at 9:07 AM on June 25, 2008


Awesome. What program/language should I learn to best explore these data?
posted by thrako at 9:12 AM on June 25, 2008


Stanford has a searchable version of the copyright renewals database (also based off the Project Gutenberg copy of the renewals). Probably easier than hacking an XML file yourself.
posted by fings at 9:22 AM on June 25, 2008


Most books published 1923-63 in public domain

Implies that books published before 1923 aren't public domain, which is technically incorrect.

Most books published before 1963 that are not on this list are in the public domain.
posted by blue_beetle at 9:30 AM on June 25, 2008


So, looking at a few famous titles published in 1962, I'm 3 for 4 so far (meaning 3 out of the 4 are on the list and thus still copyrighted).

Island, by Aldous Huxley - Not Copyrighted (grepped for <Title>Island</Title>)

A Wrinkle In Time, by Madeleine L'Engle - Copyrighted

The Man in the High Castle, by Philip K. Dick - Copyrighted

One Flew Over the Cuckoo's Nest, by Ken Kesey - Copyrighted
posted by symbollocks at 9:33 AM on June 25, 2008


> Stanford has a searchable version of the copyright renewals database (also based off the Project Gutenberg copy of the renewals).

There's room for more work because Stanford's only providing access to book titles (Class A registrations), and there are periodicals in the full list (for example, The New York Herald is the third record, with years' worth of daily publications).

Google's XML file is also more recent than Stanford's if their datestamps can be trusted, which may or may not indicate that more records were added. I can't be bothered to check.
posted by ardgedee at 9:45 AM on June 25, 2008


Holy cow, what a terrific post. Thanks!
posted by blucevalo at 9:49 AM on June 25, 2008


Stanford's dataset is available for download; however, it appears to be very incomplete. There is no (zero) data for years 1923-49.

Stanford: 12.2 MB zip
Google: 56.5 MB zip

I did a rough count of Stanford's records for 1950-1963 and came up with about 50,000. Compare this with Google's 430,000+ records (1923-63). In addition, Google's records have a lot more fields of data. I would not rely on Stanford. As Google (Jon Orwant) says, "we believe this is the best and most comprehensive set of renewal records available today."
posted by stbalbach at 10:01 AM on June 25, 2008


What apps are you all using to peek in at the XML? It's a humongo 370MB+ file, and I'm crashing out every time I try to use something like BBEdit to view it.
posted by avoision at 10:02 AM on June 25, 2008


Sweet Mother of Google.
posted by anotherpanacea at 10:06 AM on June 25, 2008


Grep it from the command line like this:

grep -i "<title>Some Title</title>" google-renewals-all-20080624.xml

If it returns results then it's still copyrighted.
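
If grep isn't handy, or the title text varies a little (capitalization, a trailing period), a rough Python sketch along these lines might do the same lookup while streaming the file instead of opening it in an editor. It assumes each entry is a `<Record>` element with `<Title>`, `<CopyrightYear>`, and `<RenewalYear>` children and that the file parses as well-formed XML; the function names and the tiny sample are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def normalize(title):
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(title.lower().split()).rstrip(".")

def find_renewals(source, wanted):
    """Yield (title, copyright_year, renewal_year) for records whose title matches."""
    target = normalize(wanted)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Record":
            continue
        title = elem.findtext("Title", default="")
        if normalize(title) == target:
            yield (title, elem.findtext("CopyrightYear"), elem.findtext("RenewalYear"))
        elem.clear()  # release the record's children before moving on

# Hypothetical sample standing in for the real download.
sample = b"""<Records>
<Record><CopyrightYear>1962</CopyrightYear><RenewalYear>1990</RenewalYear><Title>Island</Title></Record>
<Record><CopyrightYear>1950</CopyrightYear><RenewalYear>1978</RenewalYear><Title>Some Other Book</Title></Record>
</Records>"""

for hit in find_renewals(io.BytesIO(sample), "island."):
    print(hit)  # ('Island', '1962', '1990')
```

A hit only says that some work with that title was renewed, so the years and names are worth checking against the book in hand; a miss can also just mean the title is recorded differently.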
posted by symbollocks at 10:07 AM on June 25, 2008


Someone should really make a web interface to this data, right? Like, I dunno, Google?
posted by Asparagirl at 10:23 AM on June 25, 2008


Great news—thanks!
posted by languagehat at 10:32 AM on June 25, 2008


Hot dog!!!!
posted by the dief at 10:35 AM on June 25, 2008


Okay, seriously. Is this for real? I mean, for real, real? As in, isn't Congress just going to pass a bill tomorrow that says whoops, sorry about that, here's your copyright back?

Oh, good lord no!

(Just don't expect them to make this mistake again.)
posted by absalom at 10:47 AM on June 25, 2008


I bet within 24 hours of this FPP, the US Congress is going to pass a law making all works published from 4000 BC to 1963 AD retroactively copyrighted to the Disney corp.

We need to protect Hammurabi's Code from pirates!
posted by Avenger at 11:05 AM on June 25, 2008 [1 favorite]


most books weren't renewed.

Is this one of those things where it's "most books" just because there are so many damn books that no one cares about, and any book you actually want was renewed?
posted by smackfu at 11:10 AM on June 25, 2008


Thanks for posting this. I have some automotive guides from the 1920s that could probably be used to build a vintage car from scratch, and I was wondering whether or not it would be legal to make them available to other people.
posted by drezdn at 11:11 AM on June 25, 2008


any book you actually want was renewed

It depends on what type of books you're looking for. There's a publishing company (actually several depending on the subject) that specializes in reprinting out-of-print books on foundry techniques, metalworking books, etc. from the late 1800s and early 1900s. The information is only useful/interesting to a very small segment of the population, but to those people it's incredibly useful.

For the stuff the average person would find interesting (say popular literature), it's probably already pretty easy to track down a very inexpensive version. The exception would probably be lesser known works by major authors.
posted by drezdn at 11:16 AM on June 25, 2008


Well, it's not just personal consumption. Public domain works can be reinterpreted by new artists, so if for example Huxley's Island is available then anyone can make a comic or a movie out of it.
posted by clockworkjoe at 12:29 PM on June 25, 2008


> Public domain works can be reinterpreted by new artists, so if for example Huxley's Island is available then anyone can make a comic or a movie out of it.

Or, more importantly, somebody can take thematic swipes from Island and use them in their own work for the benefit of interpretation, rebuttal, revision, or other artistic filtering that helps further progress in the arts, and the swipes can remain matters of aesthetic judgement rather than law.

This goes far beyond the ability to make a movie without paying license fees.
posted by ardgedee at 1:04 PM on June 25, 2008


Maybe someone wants to hook this up with these folks:

http://publicdomainreprints.org/

C~
posted by MrChowWow at 1:16 PM on June 25, 2008


Woo hoo! Thanks, stbalbach!
posted by rtha at 2:12 PM on June 25, 2008


For the uneducated in these matters, (me), what would I do with this xml file and why is it so great? Not even sure how to read an xml file.

/self ban
posted by Senator at 5:26 PM on June 25, 2008


Senator: If you have a desire to do something with an old book, like make a movie, sell copies, include portions in some other work etc. you have to be sure of the copyright status of said book. If it is copyright, you must seek a release or license from the holder, but if it is in the public domain you can do what you wish.
Up to now, it could prove very difficult to establish the copyright holder for more obscure works, so you could not get a license, or be sure the work was out of copyright. This uncertainty usually meant that derivative works were close to impossible to get published, as publication of a copyrighted work (even in ignorance) could lead to legal action.
This list eliminates some of this confusion. If the work you wish to use is included, you will need a license or release. If it is not, then it is no longer protected, and is in the public domain.
This means there are potentially millions of public domain works that can now be used.
Examples of the possible uses would be films (e.g., the way Disney makes films of fairy tales), uses in other works (e.g., a collection of Depression-era photos or recipes) and straight reprints, making information like old car manuals etc. available again. Even things like that site with gross recipe photos.
posted by bystander at 12:42 AM on June 26, 2008 [1 favorite]


Just a quick reminder that the USA is not the world. Huxley's Island may be in the public domain in the US, but here in the UK it remains in copyright until 70 years after the death of the author, i.e. 1963 + 70 = 2033.

I take the point about remixes, mashups, etc, but isn't this a bit beside the point? I mean, this is part of the Google plan for mass digitization, right? I don't think Google is publishing this information out of the kindness of its heart in order to encourage personal artistic endeavour. By digitizing so many out-of-copyright titles, Google hopes to be in a stronger position to force publishers to agree licensing terms for in-copyright titles.

Incidentally, Microsoft recently cancelled its digitization programme and took down its Live Search Books service, thus leaving GoogleBooks without a serious competitor in this field.
posted by verstegan at 2:24 AM on June 26, 2008


Seconding Verstegan. For example, if you want to make a movie out of a book you're gonna need worldwide rights.
posted by unSane at 4:38 AM on June 26, 2008


I don't think Google is publishing this information out of the kindness of its heart in order to encourage personal artistic endeavour.

Uh-no:

1. Google didn't digitize the data. The 'Distributed Proofreaders' did - it's entirely an 'open source' project done by the same type of volunteers who made Wikipedia, Linux and other projects, people working in their spare time for the love of helping others.

2. The only role that Google played is that a single (1) programmer spent perhaps one or two days massaging the data into XML format to make it easier for others to access it. If Google hadn't done it first someone else would have eventually.

leaving GoogleBooks without a serious competitor in this field.

Not at all. The Open Content Alliance is still going strong, and the Internet Archive has been adding thousands of new books every day, 5 days a week. I'm actually glad Microsoft pulled out as they added irritating watermarks to the PDFs (every page). Plus it puts the OCA in a stronger position as a fully publicly funded venture.

force publishers to agree licensing terms for in-copyright titles.

I don't follow the logic. How is this XML file going to force publishers to agree to re-license in-copyright titles? It's just a copyright renewal list that has always existed, now in digital format for end-users to access.
posted by stbalbach at 5:26 AM on June 26, 2008


Huxley's Island does appear in the file:
<Record>
  <CopyrightYear>1962</CopyrightYear>
  <RenewalYear>1990</RenewalYear>
  <File>web-1990-10.txt</File>
  <Recno>387204</Recno>
  <Lines>15520-15529</Lines>
  <MD5Sum>3e69a64108b11482a4d8b236d8e0ab34</MD5Sum>
  <Title>Island</Title>
  <Copyright>
    <Date>1962-03-28</Date>
    <Id>A00000552929</Id>
  </Copyright>
  <Renewal>
    <Date>1990-10-16</Date>
    <Id>RE0000494475</Id>
  </Renewal>
  <Contrib>
    <Name>Aldous Leonard Huxley</Name>
    <Role></Role>
  </Contrib>
  <Holder>
    <Name>Laura Huxley</Name>
    <Type>W</Type>
  </Holder>
  <Source>usco</Source>
<Snippet>
ID: RE0000494475
OCLS: Text
DREG: 1990-10-16
OREG: A00000552929
ODAT: 1962-03-28
CLNA: Laura Huxley (W)
TITL: Island.
Name: Aldous Leonard Huxley
Name: Laura Huxley
AUTH: Aldous Leonard Huxley.
</Snippet>
</Record>
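
Read programmatically, the main fields of the record above could be pulled out roughly as follows. This is a sketch rather than anything official: the element names are taken from the record as pasted, the trimmed copy leaves out the `<Snippet>` and bookkeeping fields, and the `summarize` helper is hypothetical. The 28-year gap between the two years lines up with the renewal term described in the post.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above (Snippet and bookkeeping fields omitted).
RECORD = """<Record>
  <CopyrightYear>1962</CopyrightYear>
  <RenewalYear>1990</RenewalYear>
  <Title>Island</Title>
  <Copyright><Date>1962-03-28</Date><Id>A00000552929</Id></Copyright>
  <Renewal><Date>1990-10-16</Date><Id>RE0000494475</Id></Renewal>
  <Contrib><Name>Aldous Leonard Huxley</Name><Role></Role></Contrib>
  <Holder><Name>Laura Huxley</Name><Type>W</Type></Holder>
</Record>"""

def summarize(record_xml):
    """Return the main fields of one renewal record as a plain dict."""
    rec = ET.fromstring(record_xml)
    copyright_year = int(rec.findtext("CopyrightYear"))
    renewal_year = int(rec.findtext("RenewalYear"))
    return {
        "title": rec.findtext("Title"),
        "author": rec.findtext("Contrib/Name"),
        "holder": rec.findtext("Holder/Name"),
        "renewal_id": rec.findtext("Renewal/Id"),
        "copyright_year": copyright_year,
        "renewal_year": renewal_year,
        "years_to_renewal": renewal_year - copyright_year,
    }

info = summarize(RECORD)
print(info["title"], info["years_to_renewal"])  # Island 28
```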
posted by yz at 7:39 AM on June 26, 2008


Well, so much for publishing my Aldous Huxley's Island slash-fic.
posted by drezdn at 10:42 AM on June 26, 2008


A response to a Creative Commons note about the file announces that a rudimentary, preliminary interface to the data does exist (possibly unreliable, at least insofar as searches seem to return a maximum of fifty results):
Elizabeth Townsend Gard
July 3rd, 2008 at 3:16 am

My research assistant, Matt Miller, a 3L student at Tulane Law School, put together this simple program to search the records. We are working on a more sophisticated tool for copyright duration that we should have completed in the Fall, but we thought we would make this available for others now.

http://renewalrecords.urbanpug.com
posted by yz at 7:23 PM on July 4, 2008

