And 40 million pages of justice for all
October 29, 2018 5:29 PM

"The Caselaw Access Project (“CAP”) expands public access to U.S. law. Our goal is to make all published U.S. court decisions freely available to the public online, in a consistent format, digitized from the collection of the Harvard Law Library. Our scope includes all state courts, federal courts, and territorial courts for American Samoa, Dakota Territory, Guam, Native American Courts, Navajo Nation, and the Northern Mariana Islands. Our earliest case is from 1658, and our most recent cases are from 2018." posted by MonkeyToes (22 comments total) 57 users marked this as a favorite
 
I'm excited about this. I've idly wondered why something like it hasn't existed before, but had no idea how someone would try to build it, or with whose help.

I saw it hit legal Twitter etc. this morning, and it was a nice bit of news in an exceedingly grim week.
posted by snuffleupagus at 5:59 PM on October 29, 2018 [1 favorite]


All the better to feed it into a big neural network AI looking for monetizable loopholes.
posted by sammyo at 6:12 PM on October 29, 2018 [3 favorites]


This will make a great addition to the archive of important texts I’ve been working on (the “Darkstarchive”) for the reconstruction effort in our post-apocalyptic world.*



*Only partly in jest.
posted by darkstar at 6:16 PM on October 29, 2018


Centuries ago, when Germanic nations, especially Iceland, held a Thing, they would gather and at the start of the proceedings the speaker would recite the entire law from memory. Anything he omitted, if not reminded by the audience, would no longer be law.

Case law works similarly. Rulings that deserve to be cited for precedent get cited and remembered, while others are consigned to be forgotten. So I'm not sure digitizing our whole corpus of case law is an unalloyed good.
posted by ocschwar at 7:08 PM on October 29, 2018 [6 favorites]


If you need proof that automation still can’t quite do everything, look no further than their limerick generator.

a pair of shoes, she said yes, a new pair.
This is not the correct standard of care.
chronological feat.
The Agreement Complete.
Prosecutions for perjury are rare.

posted by armeowda at 7:10 PM on October 29, 2018 [5 favorites]


Our scope includes all state courts, federal courts, and territorial courts for American Samoa, Dakota Territory, Guam, Native American Courts, Navajo Nation, and the Northern Mariana Islands.

And this is my pre-posting-edited comment after I wrote a snarky comment about this being only Federal stuff and not all the other stuff, but then I clicked on links and that's what I found.

#TeamClick
posted by hippybear at 7:21 PM on October 29, 2018 [3 favorites]


the archive of important texts I’ve been working on (the “Darkstarchive”)

2250, around a campfire on a vacation from vast global civilization rebuilt from the ashes:

"And that's how the Heads of the Sacred Dead did show their Gratefulness toward future generations..."
posted by hippybear at 7:24 PM on October 29, 2018 [4 favorites]


+1 for the post title
posted by slater at 7:28 PM on October 29, 2018 [1 favorite]


So I'm not sure digitizing our whole corpus of case law is an unalloyed good.

That already happened. Not having access to that aggregated corpus gated behind around $500/mo in fees (for a subselection of jurisdictions) and revocable at the pleasure of ThomsonReuters or Lexis seems to me rather unalloyed.

It's probably better for old cases to actually be explicitly overturned (and flagged as such by citation management) than merely allowed to fall into disuse, anyway. We've been working towards that since Shepard's Citations.
posted by snuffleupagus at 7:50 PM on October 29, 2018 [11 favorites]


omg I luuurrrrve the limericks. But I think I grin at AI trying to create art in the same way I 'aww' at pets who think they're people.

Has anyone given much thought to the choice of CAP as the name? i.e. switching domains from law into distributed systems, where CAP stands for Consistency, Availability, Partition Tolerance, and there is a theorem that you can have at most two of the three...

(also, it looks like Jessamyn worked on this?)
posted by batter_my_heart at 12:05 AM on October 30, 2018 [1 favorite]


*stands, doffs cap, and casts eyes downward at the final sign of West Publishing’s passing*
posted by wenestvedt at 3:02 AM on October 30, 2018 [2 favorites]


So it's rate-limited to 500 cases per person per day (except for cases from Illinois and Arkansas), except for researchers or those who pay for commercial data. So if no new cases were added, it would only take 34.6 years to download all of these as an individual, not including the exceptions above.
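The arithmetic behind an estimate like that can be sketched roughly as follows. This is only an illustration: the function names are made up here, the 500-per-day cap is the one quoted above, the 365.25-day year is an assumption, and the 6.5 million total is the collection-wide figure mentioned elsewhere in this thread, so it still includes the unrestricted Illinois and Arkansas cases.

```python
import math

DAILY_LIMIT = 500        # per-person daily cap quoted in the comment
DAYS_PER_YEAR = 365.25   # assumption: average year length, leap days included


def years_to_download(total_cases, daily_limit=DAILY_LIMIT):
    """Years needed to fetch total_cases at daily_limit cases per day."""
    days = math.ceil(total_cases / daily_limit)  # a partial day still costs a day
    return days / DAYS_PER_YEAR


def cases_for_years(years, daily_limit=DAILY_LIMIT):
    """Inverse: how many cases fit into the given number of years."""
    return years * DAYS_PER_YEAR * daily_limit


# Whole collection (hypothetical input: the 6.5 million figure from the thread).
print(round(years_to_download(6_500_000), 1))
# Number of cases implied by the 34.6-year estimate.
print(round(cases_for_years(34.6)))
```

Under those assumptions the whole collection comes out to about 35.6 years, and 34.6 years corresponds to roughly 6.3 million rate-limited cases, which is at least consistent with leaving out the two exempt states.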

Case law isn't copyrightable though, at least such is my understanding (IANAL so correct me if I'm wrong). There's nothing in the API terms of use that prevents you from aggregating those 500 cases with other people.

This is much less onerous than I would expect.
posted by gryftir at 4:16 AM on October 30, 2018 [1 favorite]


After playing with it some, it would seem the underlying case database is coming in part from Ravel, which is a commercial service (with ties to academia IIRC). That probably explains some of the restrictions.

The casebook feature is cool too. Check out the torts “playlist” by mefi’s own Zittrain.
posted by snuffleupagus at 5:26 AM on October 30, 2018


There is also RECAP "an online archive and free extension for Firefox and Chrome that improves the experience of using PACER, the electronic public access system for the U.S. Federal District and Bankruptcy Courts. .... Once installed, every docket or PDF you purchase on PACER will be added to the RECAP Archive. Anything somebody else has added to the archive will be available to you for free — right in PACER itself."

I haven't yet RTFAed but perhaps these two groups can work together.
posted by exogenous at 5:30 AM on October 30, 2018 [2 favorites]




snuffleupagus, people DID have this idea previously and I worked a bit on it -- AltLaw was an effort out of Columbia University's Law School, the brainchild (as I understand it) of Paul Ohm and Tim Wu, spearheaded on a programming level by Stuart Sierra. I met Stuart at a Lisp meetup in 2006 and got to learn more about his work on AltLaw, and contributed some product management advice -- I suggested that we put together online casebooks for the common cases covered in first-year law courses.

AltLaw decided to close up shop when Google Scholar started adding case law to its coverage in late 2009. AltLaw's tiny effort couldn't compete.

I'm glad to see Harvard step up here, and I hope it keeps the site going, and I hope it's very easy for researchers to get exemptions from the download limits. And, from the About page:
Access limitations on full text and bulk data are a component of Harvard’s collaboration agreement with Ravel Law, Inc. (now part of Lexis-Nexis). These limitations will end, at the latest, in March of 2024. In addition, these limitations apply only to cases from jurisdictions that continue to publish their official case law in print form. Once a jurisdiction transitions from print-first publishing to digital-first publishing, these limitations cease. Thus far, Illinois and Arkansas have made this important and positive shift and, as a result, all historical cases from these jurisdictions are freely available to the public without restriction. We hope many other jurisdictions will follow their example soon.
So I also hope other jurisdictions switch to digital-first, as long as I am not missing something that makes digital-first way worse on other dimensions.
posted by brainwane at 6:02 AM on October 30, 2018 [1 favorite]


Hi, all! I'm a developer on this project -- happy to answer questions if I can.

After playing with it some, it would seem the underlying case database is coming in part from Ravel, which is a commercial service (with ties to academia IIRC).

As a thumbnail sketch: Ravel funded this project; we did the scanning at the Harvard Law Library; Ravel got a copy of the data and we got a copy of the data, with some time-limited commercial restrictions in exchange for Ravel's funding; we're sharing our copy of the data at case.law, and building stuff to encourage interesting non-startup-motivated uses of it. The deal also requires Ravel to share free access on its own site. Ravel was later purchased by LexisNexis, which doesn't affect things much in practice since the agreement applies regardless.

I haven't yet RTFAed but perhaps these two groups [case.law and free.law] can work together.

We're big fans of the Free Law Project and talk with them from time to time -- will definitely work together if we can. Right now we're focused on polishing the data set we have, which leaves tons and tons of room for other open law work.

Rulings that deserve to be cited for precedent get cited and remembered, while others are consigned to be forgotten. So I'm not sure digitizing our whole corpus of case law is an unalloyed good.

This is a fun idea -- and it's 2018 on the internet, so I broadly agree with this point and can't wait to see the terrible downsides of what we've done. But just as another perspective on the value of old caselaw, I leaned on cases that were over a hundred years old when I was arguing a juvenile grand jury case in Massachusetts -- that was the last time that the court really recognized the protective function of grand juries. And the existence of those old cases gave the modern court the foundation it needed to revive the idea that grand juries should serve a protective function again. I'm a little skeptical of the general idea that rulings that deserve to be remembered necessarily will be by a given generation of judges.

(also, it looks like Jessamyn worked on this?)

Jessamyn didn't directly work on this project, but was a fellow of the Harvard Library Innovation Lab for a while, as a generally cool library person we wanted to support. I'm not sure we succeeded in that, but A+ would share fellowship again.

I grin at AI trying to create art in the same way I 'aww' at pets who think they're people.

Oh, for sure -- and this isn't even AI in the loosest sense. I just indexed all sentences of American caselaw by rhythm and end rhyme so I could randomly flow them into poetry forms, as one does.
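For the curious, the general shape of that idea (index sentences by end rhyme and length, then flow them at random into an AABBA form) might look something like the sketch below. It is not the project's actual code: every function name is hypothetical, the "rhyme" is just the last three letters of the final word, and the "rhythm" is a crude syllable estimate from vowel runs, where a real version would presumably use a pronunciation dictionary.

```python
import random
import re
from collections import defaultdict

WORD = re.compile(r"[A-Za-z']+")


def syllable_count(sentence):
    """Crude rhythm proxy: count runs of vowels in each word (at least 1 per word)."""
    return sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
               for w in WORD.findall(sentence))


def rhyme_key(sentence):
    """Crude end-rhyme proxy: the last three letters of the final word."""
    words = WORD.findall(sentence.lower())
    return words[-1][-3:] if words else ""


def build_index(sentences):
    """Group sentences by rhyme key so rhyming candidates can be looked up together."""
    index = defaultdict(list)
    for s in sentences:
        key = rhyme_key(s)
        if key:
            index[key].append(s)
    return index


def limerick(index, rng, long_range=(7, 11), short_range=(3, 6)):
    """Pick three long rhyming lines (A) and two short ones (B); return them as AABBA."""
    def fitting(group, bounds):
        return [s for s in group if bounds[0] <= syllable_count(s) <= bounds[1]]

    a_groups = [g for g in (fitting(grp, long_range) for grp in index.values())
                if len(g) >= 3]
    b_groups = [g for g in (fitting(grp, short_range) for grp in index.values())
                if len(g) >= 2]
    if not a_groups or not b_groups:
        return None  # not enough rhyming material for the form
    a = rng.sample(rng.choice(a_groups), 3)
    b = rng.sample(rng.choice(b_groups), 2)
    return [a[0], a[1], b[0], b[1], a[2]]
```

Called as `limerick(build_index(sentences), random.Random())`, it returns five lines, or `None` when the index has no usable rhyme groups. Because nothing in it looks at meaning, the output is grammatical line by line and nonsense overall, much like the example quoted upthread.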

What I'm hoping will come out of projects like that is broader ways to think about this collection of 6.5 million cases that no one person has ever read -- how do courts talk about dollar figures over time, how does treatment of typically male or female names change over time, what are the most or least common words, when do words start and stop being used, what weird patterns show up in the citation graph, and all that digital humanities stuff.

I suggested that we put together online casebooks for the common cases covered in first-year law courses.

Nice! A related project we're building, opencasebook.org / H2O, was actually one of the inspirations for starting this scanning project, years ago.
posted by john hadron collider at 6:47 AM on October 30, 2018 [15 favorites]


*stands, doffs cap, and casts eyes downward at the final sign of West Publishing’s passing*

Without wishing to in any way denigrate the value of this project, because I strongly believe that caselaw should be available to everybody, as a tool for working lawyers (i.e., the people who pay for Westlaw Next or Lexis Advance) it will be limited. You do legal research using sophisticated search tools or, if you're lazy, following up the links Westlaw or Lexis/Nexis generated between cases that cite each other or touch upon the same topic. Not just by looking up a case, or just searching a text string. (Basically, they've seen this coming for a while now and doubled down on adding value.)

case.law is actually blocked by my office Internet filter, oddly enough, so I can't look at their tools (looking forward to doing so tonight), but I doubt they're the full equivalent. Again, not at all meant as a slam on the project. It was big enough to begin with!

I suggested that we put together online casebooks for the common cases covered in first-year law courses.

Anyone could do this, though (and some people have gone to inexpensive self-published casebooks already), because, except for a few edge cases, copyright isn't the real issue. Selection and commentary are the value in a casebook. You need a particular ideological commitment to do that for free.
posted by praemunire at 10:15 AM on October 30, 2018


What I'm hoping will come out of projects like that is broader ways to think about this collection of 6.5 million cases that no one person has ever read -- how do courts talk about dollar figures over time, how does treatment of typically male or female names change over time, what are the most or least common words, when do words start and stop being used, what weird patterns show up in the citation graph, and all that digital humanities stuff.

To be honest--and I'm an ex-historian, more or less, so this is also not a slam--this may well end up in some ways being of more use to historians than to lawyers. Which, good. Legal history is a mess and needs all the help it can get.
posted by praemunire at 10:16 AM on October 30, 2018


I would think that one of the things someone looking to disrupt might do is pilot a citation service built on top of it, to launch commercially once the restrictions are lifted?

Some lawyers make do with FastCase and whatnot -- because they have to -- even though it does not claim to be an authoritative citation service.
posted by snuffleupagus at 10:18 AM on October 30, 2018


Yes, "more use to historians [who can program]" is definitely a good way to understand this thing so far. In the long run the raw data can power all sorts of stuff, but as a handful of programmers in a law library we can't really compete with commercial databases and don't want to -- we're focused on building the opposite of whatever is going to be built by legal tech startups or paid for by lawyers.

Honestly I'd love for this to drive broader interest in caselaw outside of law. It's such a rich source of historical stories about the moral choices of groups of people.

snuffleupagus, CaseText took a stab at a crowdsourced citator with their WeCite project, though it isn't currently operating. Clever new ways like that to try to overcome the moats around the big databases are definitely intriguing.
posted by john hadron collider at 11:35 AM on October 30, 2018 [1 favorite]


Grr, argh. The Darkstarchive has been thwarted.

So, I’m trying to find a relatively straightforward way to simply download all of the court decisions from a jurisdiction. Say, for example, all SCOTUS cases. Or all 9th Circuit cases. It’s not yet obvious to me how I can do that.

It doesn’t appear that SCOTUS cases are available through bulk download unless I’m...a legal scholar? Does that mean I’d have to download them one at a time, and only after I set up an account?

And I confess I don’t even know how to use the API...is that something I’d have to incorporate into my own program to be able to run?

I was hoping for something like a Gutenberg Project of downloadable case files in txt format for all of the cases I’m interested in, but this just feels like the information is just as locked away to me now as it was before. What am I missing?
posted by darkstar at 1:06 PM on October 30, 2018



