Where do bad links go when they die?
June 30, 2021 9:25 AM

Quarantine, but for information. Harvard Law and The Blue's Jonathan Zittrain makes the long, strong case that everything on this mutable, decaying, glorious mess of an Internet is worth a heroic effort to save, even the things that are wrong and bad and should largely be, but not completely, forgotten. Sub-point: tip your librarians.
posted by leppert (18 comments total) 16 users marked this as a favorite
 
Rebuttal:
goat---
posted by Nanukthedog at 10:44 AM on June 30, 2021 [3 favorites]


I, too, often regret entropy, but information storage is not free. The environmental impacts alone of “storing everything” would be significant, to say nothing of the impacts that “I shot a fox in Skyrim and it made me sad” discusses.
posted by cupcakeninja at 10:48 AM on June 30, 2021 [3 favorites]


The first study, with Kendra Albert and Larry Lessig, focused on documents meant to endure indefinitely: links within scholarly papers, as found in the Harvard Law Review, and judicial opinions of the Supreme Court. We found that 50 percent of the links embedded in Court opinions since 1996, when the first hyperlink was used, no longer worked. And 75 percent of the links in the Harvard Law Review no longer worked.

I remember hearing this statistic in the Harper's Index in 2014 about a huge number of links in Supreme Court decisions no longer functioning, and it has always stuck with me since then. Thanks for sharing this article!
posted by Corduroy at 11:05 AM on June 30, 2021 [5 favorites]


A good intro to the link rot problem.

Bravo for the Internet Archive!
posted by doctornemo at 11:19 AM on June 30, 2021 [3 favorites]


I remember hearing this statistic in the Harper's Index in 2014 about a huge number of links in Supreme Court decisions no longer functioning, and it has always stuck with me since then. Thanks for sharing this article!

I feel like the "link rot" problem is kind of similar to that in more ways than one. Many laws and precedents on the books are so out of date and so completely from a different era and a different way of thinking. Frankly, a lot of law and precedent needs to be "forgotten" so maybe the country can start fresh and dump the systemic racism/ableism/classism inherent in the legal system.

There's also that whole thing where it's actually really important and healthy for humans to be able to forget things.

Holding on to things forever is damaging and a hallmark of conservatism.

On the other hand, preservation of art, history, math and science is of the utmost importance, because everything we know is built on those. Modern history stands on the shoulders of giants, all those who came before and compiled all this information for us to use now, and as such, keeping current information safe for the future is paramount.
That was me then, it is not me now.
I think this is one of the most important takeaways here, when it comes to the endless permanent cataloging of personal information. People can grow and change, and we shouldn't be punishing them for things they did decades ago. This is also the problem with so-called "social credit" systems where it haunts you the rest of your life and you're never really "forgiven" for past transgressions. Much like our criminal justice system, it's treating life like a black and white slate, you're either a criminal or you're not. We don't really give people truly meaningful opportunity to change within the US incarceration system, and a world where we can't forget any of our past transgressions is one where we effectively can't move on.

The other day I was thinking about South Park and suddenly I remembered that Trey Parker had named numerous characters who are shown in a negative light "Leane/Liane." I recalled an interview where Parker had mentioned this was a former lover who had cheated on him. Leane, whoever she was, has effectively been harassed her entire adult life and been compared to a horse and literally the biggest whore in South Park (Cartman's mom, Liane) because she upset the wrong fucking guy and he got a TV show and decided to make fun of her as long as he could with his platform. He refused to let her forget, he did not allow her to move forward and change, instead always trying to remind her of her past transgressions against him. (Of course, she could just not watch, but the point is, it's still pointed harassment from a jilted lover.) I can't even imagine being harassed by a TV show, a cartoon no less. It used to just be shitty powerful men like Parker who could do this, but now any group with an internet connection can, and those groups can dig up things that most people forgot about you.

There's a reason the phrase is "Forgive and forget." Not letting someone forget can often be abusive and harassing behavior.
posted by deadaluspark at 1:00 PM on June 30, 2021 [3 favorites]


Holding on to things forever is damaging and a hallmark of conservatism.

Digging up things from the past to use against your enemies is the hallmark of liberalism. It's the same playbook both ways, like a mad-lib. Just changing the blank spaces for different nouns or verbs or adjectives. You can't blame people for their past and even their ancestors' past in the same breath as positing that holding on to things forever is the hallmark of conservatism.
posted by zengargoyle at 2:12 PM on June 30, 2021 [1 favorite]


I have my Twitter account set up to auto-delete Tweets when they hit 14 days old. I have never in the several years I've been doing this wished I had an old Tweet. On the odd occasion I go viral or get retweeted by a celebrity or whatever, I screenshot the tweet and throw it on my blog for posterity.

And the blog has no online CMS - it's created offline with a static site generator and rsynced to a web hosting account that doesn't even have PHP available. It's just a bunch of HTML files - so as secure as any blog can be.
posted by COD at 3:04 PM on June 30, 2021


Hmm. Interesting. But I think this bundles together a bunch of topics that I'd want to separate when I'm thinking about it.

There's the permanence of "published" writing--that is, someone writes something that they would have wanted to put in a library back in the day. Books and high quality or journal articles. There's some stuff I hadn't thought was included in this, and basically yes: it might be nice to guarantee an archive. But I know journals partially address this with DOIs, which I was surprised to see not even mentioned. Clean up citing practices for these works, if they are being linked poorly; maybe some Library of Congress-type rules that centrally preserve them.

Then there's midrange stuff out of content farms. There's a lot written that everyone involved (including the author) would agree isn't "worth" saving. Stuff on sports, the stock market, and TV springs to mind. Not that there's not a lot of good writing on these topics--I'm specifically referring to by-the-numbers pieces driven by the need for clicks. How many versions of "Here's what's leaving Netflix in July" do we need eternally preserved?

Then there's ephemera: most tweets, blog posts, Facebook entries, dating profiles. I note that's what I first thought of from the headline, and it seems to be top of mind for commenters here too.

I know people who are historians, sociologists, linguists, etc. have professional incentives to want all this forever saved, but I think it's fine to let them go. Even if they are referenced in Supreme Court decisions.

If we focus on keeping the non-controversial first lot we are doing so much better than ever before in human history.
posted by mark k at 3:33 PM on June 30, 2021 [2 favorites]


Hello! Thanks for these thoughtful comments on the piece. (I'm the author!)

And I think you're right, mark k, that it makes sense to unbundle the kinds of things worth saving. But part of my worry is that those divisions are becoming fuzzy. The prime category for long-term saving that you mention -- books and high quality and journal articles -- are themselves being only "rented" rather than owned by libraries and readers, so that they can vanish if the publishers go out of business or change business models, and they can also be changed in place, with no easy way to guarantee their integrity. Allowing libraries to once again store what they collect, rather than merely arrange for evanescent access rights on behalf of patrons, seems vital. And that's as much a political and policy question (and I guess economic one) as it is a technological one.

DOIs are great for what they do, but they don't end up storing a copy of anything anywhere -- they're instead a labeling and disambiguation scheme so that when you tell me you mean to refer to "that article" you can convey exactly which article you mean. But the DOI alone doesn't mean the article can be found (or will stay) anywhere in particular.
posted by zittrain at 6:30 PM on June 30, 2021 [12 favorites]


When it comes to the question of what to save and what not to save, all I know for certain is that corporations (who typically make decisions regarding the latter) cannot be trusted with that decision, ever.
posted by BiggerJ at 7:22 PM on June 30, 2021 [2 favorites]


The prime category for long-term saving that you mention -- books and high quality and journal articles -- are themselves being only "rented" rather than owned by libraries and readers

I garbled some of my comment, but for sure this is a key point I had not thought through carefully until I read your piece.

My lived experience is that it's much easier to trace down sources now. We don't get much link rot in the sciences (in part due to DOIs), and I can look up almost any footnote with a click; 30 years ago it was insanely expensive to get anything not in hardcopy form in our tiny corporate library.

But that access is illusory if conditions change.
posted by mark k at 10:58 PM on June 30, 2021 [2 favorites]


I have my Twitter account set up to auto-delete Tweets when they hit 14 days old. I have never in the several years I've been doing this wished I had an old Tweet.

I am the opposite. I tweet a lot, and sometimes write short essays, and find it's very difficult to use Twitter's search tools to go back and find stuff I've written before.

I have often heard it said that, once it's on the internet, it will live forever. I have known this is not true for many years. Link rot is my eternal bane.

The Internet Archive, while great, is not good for searching through its massive archive. Google doesn't crawl the Wayback Machine, so if you don't know where it was before it went dark, it is very challenging to find it again. I am encountering this right now pretty hard, since some classic gaming websites, especially those to do with old arcade games, have gone dark.

Like, some years ago I linked The Pac-Man Dossier from these pages, but the site itself is dead now. (Fortunately, the content is preserved in a Gamasutra article. I believe I was the one who pointed the Dossier out to Simon Carless, who was working at Gamasutra at the time, so there is a chance if I had not noticed it then, it would just be gone now.) There is also a set of alternate levels for the arcade game Gauntlet II, which I think were made by a former Atari employee, Glenn Mandelkern. They used to be available on his website, and they're mentioned in Ed Logg's GDC talk from 2012, but that site is dead now, and as far as I know they survive nowhere on the internet.

This is only going to get worse over time.
posted by JHarris at 1:12 AM on July 1, 2021 [2 favorites]


Ephemera is a complicated category. I think in many fields of historical study individual pieces of ephemera are incredibly valuable precisely because they're the sort of thing that was not often saved. The few pieces that made it to us rise in value accordingly. It's hard to know what the value of large-scale amounts of ephemera will be to researchers in the future, because we don't have a good precedent for it except in studies of the very recent past.

We already have much more of this kind of thing saved by people like the Internet Archive than was ever saved in historical eras, and good reason to expect it will be maintained. I'd love to see what researchers in a few hundred years get up to with it.

IA has a lot of holdings beyond what they make publicly available, and I think there's a great case for privately archiving, say, the ephemera of social media, but only making it available once sufficient time has passed. As pointed out above, "sufficient" may need to be quite long. But that doesn't mean it's not worth doing.

We could also stand to get better, as a society, at how we interpret social media posts, especially older ones. That hasn't shown much sign of happening yet, but I live in hope.

But this is an altogether different issue than what the article talks about, and I think mark k's schema is reasonable.
posted by vibratory manner of working at 3:22 AM on July 1, 2021 [1 favorite]


My personal archiving recommendation, which is not link based: whenever you download some piece of software, save the installer. Digital distribution means that large amounts of our software landscape will be lost. Games are probably safe, there's a lot of effort going on in that space. Other kinds of software? Not so much. You will probably end up with the only copy of a specific version of *something*. Upload it to IA eventually.
posted by vibratory manner of working at 3:28 AM on July 1, 2021 [2 favorites]


There's a lot written that everyone involved (including the author) would agree isn't "worth" saving. Stuff on sports, the stock market, and TV springs to mind.

One man's trash....

That said, the question of too much and not enough is all but impossible to balance. There is plenty of lost Greek and Roman material we would love to have, and far more Renaissance (never mind modern) material than anyone can ever hope to use. It seems inevitable that, with the advent of wood-based paper ca. 1840 and the fragility of electronic media, there will be a huge knowledge gap on Our Times for the curious in the 2500s.

There's a reason the phrase is "Forgive and forget." Not letting someone forget can often be abusive and harassing behavior.

Words to live by.
posted by BWA at 5:53 AM on July 1, 2021 [1 favorite]


I work in a special collections and rare books department of a public library, and almost every day I make "keep or discard?"-type decisions about various types of materials, so I'm a direct agent of entropy. It can be difficult in a "one person's trash" way and melancholy in the sense that much of what I decide not to keep is effectively disappearing from the historical record, but ultimately decisions have to be made based on various criteria (not the least of which is the amount of storage space available) and I try to make the best decisions I can based on the information I have. I have colleagues who get very anxious about the prospect of discarding anything (my predecessor was definitely towards the "keep everything" end of the scale), but I've come to think of my role at the library as someone who continually has sand running through his fingers and scoops as much of it as he can into a bucket, and I don't get too upset about what I miss.

Anyway, I don't have anything to do with digital collections (the archiving of which seems like a real hedge maze, and I'm glad I don't have to make those decisions), but I think about entropy a lot at work.
posted by The Card Cheat at 8:50 AM on July 1, 2021 [2 favorites]


Card Cheat, would it be possible that in certain cases when you prune an item, you are *decreasing* entropy? If we think of entropy as disorder. There's a subjective evaluation of the disorder in a collection, but aren't there items whose contents are the same or very similar to other things? And "never" is a long time, but if we decide something will "never" be useful and we happen to be right, then entropy is once again decreased?!
posted by storybored at 10:57 AM on July 4, 2021 [1 favorite]


That’s what I like to think, yes.
posted by The Card Cheat at 8:36 PM on July 5, 2021



