The Unbearable Lightness of Web Pages
February 24, 2016 7:00 AM

Web pages are ghosts: they’re like images projected onto a wall. They aren’t durable. Contrast this with hard-copies—things written on paper or printed in books. We can still read books and pamphlets printed five hundred years ago, even though the presses that made them have long since been destroyed. How can we give the average independent web writer that kind of permanence? Joel Dueck on building a website with Matthew Butterick's Pollen, allowing it to also be published as a printed book.
posted by Cash4Lead (45 comments total) 19 users marked this as a favorite
 
The second picture on that page is an animated gif. The third thing on the Pollen page is a youtube video.

Try buying a bestseller from the 80's. Can you even find a copy? How about the 30's? This idea that paper is some wellspring of permanence is complete nonsense. Books endure because of libraries and librarians, not because of paper.

Why not offer to just carve your site in a monolith, if you're going to genuflect to old media like that?
posted by mhoye at 7:07 AM on February 24, 2016 [19 favorites]


What if we print web pages out onto canvas?
posted by Mr.Encyclopedia at 7:11 AM on February 24, 2016 [1 favorite]


Books endure because of libraries and librarians, not because of paper.

Seriously. I know a couple of people who do historical preservation work, and they do downright Herculean labor to make sure that artifacts - especially paper artifacts - stay in some sort of condition where they can still be studied or appreciated. I see this "hypertext is fleeting! Paper is forever!" mentality pop up every now and again and I always wonder if the people espousing it have ever spoken to any of the people whose job is to make paper last as close to forever as possible.
posted by Itaxpica at 7:22 AM on February 24, 2016 [9 favorites]


"Try buying a bestseller from the 80's. Can you even find a copy? How about the 30's? T"

Well, from the 1980s? Yeah, quite easily. I do it all the time. I buy William Gibson's 1980s works like lone gunmen buy "The Catcher in the Rye".
posted by I-baLL at 7:22 AM on February 24, 2016 [5 favorites]


And, just as an FYI, there are used book search engines that make old books easy to find and buy (though the price may not always be cheap): http://used.addall.com/
posted by I-baLL at 7:24 AM on February 24, 2016 [1 favorite]


How much paper would it take to legibly print out everything that has ever been on the internet?
posted by ernielundquist at 7:30 AM on February 24, 2016


Look at how many ancient texts have survived only due to the tireless efforts of generations of monks copying old manuscripts. If it's longevity you're after you're better off sealing an SD card in lucite and burying it.
posted by Mr.Encyclopedia at 7:31 AM on February 24, 2016


The Internet is not the medium of permanency. It is disposable, which is why nonfiction does very well: it is need-to-know and discard, the way a student studies for an exam. Fictional stories, for instance, don't do well on this medium because they require one to immerse oneself in a different world. This medium is too wound up and frenetic for that.

It is something writers/authors/journalists et al have to accept. Web pages, no matter how pretty, thrive on novelty, and then what else do you have?

And as the poster above mentioned, paper does not guarantee permanency, either, but it does remind me of the fact that we know more about older societies that recorded on stone tablets than the newer ones that recorded on papyrus and had their records rot on us. We are now the most chronicled generation in the history of mankind, and the chance that everything online will be wiped out is very great. There will be people whose entire life history will be erased out of existence.

Every medium has its upsides and downsides. The Internet gives everyone a chance to be seen, but not so much to be remembered. Know your medium and act accordingly...
posted by Alexandra Kitty at 7:31 AM on February 24, 2016 [4 favorites]


And none of this is to say that there aren't interesting questions worth exploring about the permanence of text in a world where it's generally published on blogs whose software you didn't write, running on hardware you don't own, in a world where internet access can be restricted or data encodings can change. But it seems to me like a better way to encourage your internet writing's survival is to buy a cheap computer, set up a solid-state RAID array, and run it as a lightweight FTP server with plaintext versions of your writing. Back it up nightly to several external drives, at least a few of which are in other locations than the machine itself. Boom, you've just ensured the continuity of your work way better than writing it on dead trees would.
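The "back it up nightly to several drives" step could be sketched roughly like this. It is only an illustration, not a tested backup tool: the function names and the manifest format are made up for the example, and it assumes the writing lives in one folder of plain files that fit comfortably in memory.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, Iterable


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(source: Path, destinations: Iterable[Path]) -> Dict[str, str]:
    """Copy every file under `source` into each destination and verify each copy.

    Returns a manifest mapping relative paths to SHA-256 digests.
    Raises IOError if a copy does not match its original.
    (Hypothetical helper for illustration only.)
    """
    destinations = list(destinations)
    manifest: Dict[str, str] = {}
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        digest = sha256_of(src_file)
        manifest[rel.as_posix()] = digest
        for dest_root in destinations:
            target = dest_root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, target)  # copies contents and timestamps
            if sha256_of(target) != digest:
                raise IOError(f"checksum mismatch for {target}")
    return manifest
```

Something like `back_up(Path("writing"), [Path("/mnt/drive_a"), Path("/mnt/drive_b")])` run from a nightly scheduled job would cover the local drives (those paths are placeholders); getting copies to other locations is a separate problem.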
posted by Itaxpica at 7:32 AM on February 24, 2016 [1 favorite]


Why not offer to just carve your site in a monolith, if you're going to genuflect to old media like that?

Dumb Cuneiform is there for you!
posted by leotrotsky at 7:34 AM on February 24, 2016 [2 favorites]


Try buying a bestseller from the 80's. Can you even find a copy? How about the 30's? This idea that paper is some wellspring of permanence is complete nonsense. Books endure because of libraries and librarians, not because of paper.

I took this challenge! I looked up the Publishers Weekly bestseller lists for the 1930s, and chose 1931 because it had the most books on it I hadn't heard of. Here are the results for the top ten bestselling books of 1931:

The Good Earth by Pearl S. Buck: Available in many editions; still in print

Shadows on the Rock by Willa Cather. Available in many editions; still in print

A White Bird Flying by Bess Streeter Aldrich. Some used copies available. Most recent edition, 1988, is still in print by University of Nebraska Press.

Grand Hotel by Vicki Baum. Used copies available for under $10. New edition forthcoming in June 2016.

Years of Grace by Margaret Ayer Barnes. No longer in print. Very few used copies available, with prices inflated into the hundreds of dollars by booksellers' algorithms. Least expensive option at $42 is a fancy commemorative edition published in 2007.

Bridge of Desire by Warwick Deeping. Used hardcovers available at $2.95.

Back Street by Fannie Hurst. Vintage Movie Classics paperback published in 2014 available for $9.84.

Finch's Fortune by Mazo de la Roche. Copies of a 2007 re-issue available for $24.95. Older copies available used for less.

Maid in Waiting by John Galsworthy. Available used for a penny plus shipping.



I admit that it's ironic that internet bookselling makes old books easy to find, when you'd otherwise have to comb through many dusty used bookstores to find a copy. But the endurance of these paper copies in those dusty bookstores and on great-grandparents' bookshelves is what makes that possible.
posted by not that girl at 7:35 AM on February 24, 2016 [22 favorites]


If it's longevity you're after you're better off sealing an SD card in lucite and burying it.

make sure and bury something that reads SD cards, too.....and whatever infrastructure that device needs to work
posted by thelonius at 7:39 AM on February 24, 2016 [12 favorites]


But it seems to me like a better way to encourage your internet writing's survival is to buy a cheap computer, set up a solid-state RAID array, and run it as a lightweight FTP server with plaintext versions of your writing. Back it up nightly to several external drives, at least a few of which are in other locations than the machine itself. Boom, you've just ensured the continuity of your work way better than writing it on dead trees would.

Printing it once on dead trees is more realistic for a person like me.

My own desire for my writing's longevity is that some of it be available to my children, and maybe my grandchildren, someday, without them having to wade through the millions of words of blog posts, letters, journal entries, and more, that are stored on my computer. Going through it would be a life's work, and I'm not the kind of writer who would make that a worthy undertaking. I'd like to print selections of the good parts for them, hoping that would be less overwhelming so that they might actually look into them.
posted by not that girl at 7:40 AM on February 24, 2016 [3 favorites]


If it's longevity you're after you're better off sealing an SD card in lucite and burying it.


And the SD card reader, and a computer that connects to the reader and interprets the contents, a screen that can hook up to that computer, instructions on how to use these devices, etc.

Digital archiving is hard.

Preserving paper is hard, but you only have to preserve the item itself, not an entire technology stack needed in order to use that item. Anyone coming along later can generally figure out how to use a book on their own without special assistance.*

* Given that they know the language of the content, but that's true of the SD card as well.
posted by tofu_crouton at 7:41 AM on February 24, 2016 [10 favorites]


Printing it once on dead trees is more realistic for a person like me.


Printing one copy isn't preservation, though - what happens when you lose it in a flood? The point of the system I set out is that it makes it as hard as possible for the material to be lost forever while (despite all the acronyms) being fairly easy for a normal person to set up: if you know enough about computers to successfully post on metafilter you know enough to establish the setup I described with a few hundred bucks and a few hours of Googling.

To get into actual preservation of the work on paper you need to print a bunch of copies, store them safely in different places, and so on; and at that point you're pushing more complexity than the comparable digital setup.
posted by Itaxpica at 7:46 AM on February 24, 2016


> I see this "hypertext is fleeting! Paper is forever!" mentality pop up every now and again and I always wonder if the people espousing it have ever spoken to any of the people whose job is to make paper last as close to forever as possible.

I see the contrarian "paper sucks! hypertext rules!" mentality a lot, especially on MetaFilter, and it's bullshit. I have a book printed in 1507 and it's in fine shape (aside from the margins, see below), and it wasn't carefully preserved by archivists, I bought it in a used book store and it had obviously been in private hands all along (some asshole cut it down so that you can't read most of the marginal comments). Tell me how much of today's online material you expect to be available in 500 years. Be honest, now.
posted by languagehat at 7:46 AM on February 24, 2016 [11 favorites]


People have a misconception that making a digital copy ensures permanence. Sometimes this is just because they are ignorant, like they think that CD-RW discs are usable for long-term storage (they really aren't). Even if the media is durable, there are very often issues with reading it. At my work we already see people who have great difficulty recovering data from media like Zip drives. While it seems unlikely that something like the RAW format will be replaced soon, when trying to preserve things like family pictures for generations, is that good enough?

Digital archiving is indeed something that people are going to need to work at very consciously.
posted by thelonius at 7:48 AM on February 24, 2016 [2 favorites]


In the '80s they transferred the Domesday Book to optical disk to preserve it forever, paper being so fragile and all. Within 20 years nothing could read the disk, while of course the thousand year old book survived.

I get the point on actual preservation being a problem, but at the same time I have eighty year old books floating around various bookshelves in my house. Not "permanent" but certainly durable in a way web pages historically haven't been.
posted by mark k at 7:48 AM on February 24, 2016 [4 favorites]


Yeah I regularly buy monographs from the 50s-90s that survived not by librarians' commendable efforts but merely sitting on some shelf in someone's collection.

That said, I also have plenty of digital material from at least the 80s-90s; digital format shifting is bad, but not that bad.

The problem with the web isn't just digital format shifting. Old web pages would be viewable if they were still available. The problem is sites aren't distributed in full to the reader in a stable aggregate form, only a page at a time and ephemerally. So full backups don't get made casually, and then service of the page ends and oops, nobody made a copy. It's more like trying to retrieve a telephone conversation from the 50s than a book.
posted by ead at 7:56 AM on February 24, 2016 [4 favorites]


How can we give the average independent web writer that kind of permanence?

Everything is relative.
posted by BWA at 8:01 AM on February 24, 2016


I have a book printed in 1507 and it's in fine shape

OK, great, but putting aside your one anecdotal book, what percentage of works from 1507 still exist today in a state that is usable by anyone?

The thing that always gets me about the "digital will all suddenly disappear!" crowd is that they seem to ignore the fact that the difficulty of keeping something usable has been just as true of non-digital copies historically. For example, there are majorly important philosophers, people who we know wrote dozens (or hundreds) of important works (from various evidence), and who were massively important in their times, and yet all we have now of their entire life's work a thousand years later is a handful of quotes from later sources, or maybe a tiny sliver of a percent of what they created.

Is digital preservation effortless? No, but that doesn't make it impossible (or necessarily worse than what has come before).

Similarly, it's all well and good to wonder how many family photo collections will truly be saved by digital formats for the centuries. However, it's rather one-sided to not also reflect on the fact that in the grand scheme of history, practically nothing of any given family's history (statistically) has been really saved, other than often a couple scraps out of context. You probably don't have your great great great great grandfather's journal currently. That's not that much different from your great great great great grandchildren maybe not having the archive of every selfie you took.

Anyways, the point is: preservation in general takes work. The fact that digital preservation takes work doesn't make it particularly different from non-digital.
posted by tocts at 8:07 AM on February 24, 2016 [3 favorites]


My grandfather was a carpenter. He built houses with his hands, houses that are still standing decades later, houses that families grew in.

I'm a web developer. I move ones and zeros around. I build web sites and apps that people might use and might last a couple of years, but they'll never be anyone's home.
posted by kirkaracha at 8:13 AM on February 24, 2016


> For example, there are majorly important philosophers, people who we know wrote dozens (or hundreds) of important works (from various evidence), and who were massively important in their times, and yet all we have now of their entire life's work a thousand years later is a handful of quotes from later sources, or maybe a tiny sliver of a percent of what they created.

Yes, because they were only available in a handful of manuscripts. That's the whole point of the Gutenberg revolution: suddenly the same book was available in thousands of copies, and it would take a whole lot of bad luck rather than one library fire to make them all vanish. My copy of Quintus Curtius is far more likely to be still around in another 500 years than your favorite web page.
posted by languagehat at 8:15 AM on February 24, 2016 [2 favorites]


Twee paper fetishism aside, there does seem to be a long-standing problem of designing systems for media-agnostic writing. I'm stuck with a thesis in PDF and LaTeX source largely because the macros used don't work with any of the SGML-based export systems. (For that matter, I suspect that some of the code used to build vanilla letter-size PDF from my source isn't even available anymore.)

Digging behind the Pollen link to the actual project page has a different focus:
The core idea of Pollen is an argument:
  1. Digital books should be the best books we’ve ever had. So far, they’re not even close.
  2. Because digital books are software, an author shouldn’t think of a book as merely data. The book is a program.
  3. The way we make digital books better than their predecessors is by exploiting this programmability.
Which strikes me as a more reasonable set of goals than paper as permanence. It's better characterized as paper as an alternate build target. The last couple big WordPress snafus have got me interested in static site generation as an alternative, so I see this as in that same line. Definitely worth a look.
posted by CBrachyrhynchos at 8:16 AM on February 24, 2016 [8 favorites]


OK, great, but putting aside your one anecdotal book, what percentage of works from 1507 still exist today in a state that is usable by anyone?

Impossible to know for certain, but I'd guess surprisingly high. (Pamphlets and such, not so much because they were recycled to wrap fish and such.) The real problem arises in the 19th century when the wood pulp became more acidic and the paper began to crumble after a few years. Sort of like silent movie stock, it burns in slow motion. Fast forward a few hundred years and we're talking a serious gap in any kind of material not deemed worthy of reprint or digitization.
posted by BWA at 8:18 AM on February 24, 2016 [5 favorites]


The core idea of Pollen is an argument:
Digital books should be the best books we’ve ever had. So far, they’re not even close.
Because digital books are software, an author shouldn’t think of a book as merely data. The book is a program.
The way we make digital books better than their predecessors is by exploiting this programmability.


That seems like a pretty weak argument to me. It's more like a chain of weakly related assertions. The first premise in particular sounds dubious. Why would being "digital" make books "the best we've ever had"? Because, yay digital?
posted by thelonius at 8:29 AM on February 24, 2016


Well, it depends. I'm doing deep reading of a couple of print texts, and having to maintain bookmarks in three different places so that I can get to translation, original (give or take a few centuries), and commentary adds a fair bit of work. Usually I just think that digital books are different rather than "best." Partly I'm sympathetic because one of those texts was poorly converted from print to digital with a broken ToC and man-u-al-ly spec-i-fied hy-phen-at-ion rendered as I've just typed it, so better media-agnostic systems are welcome.

And since almost everyone except for maybe William Gibson and Lynda Barry starts in digital anyway, the print book is almost always a rendering of a digital book.

Part of my interest is that MS Word is broken in this area with the equivalent of <font> markup pretending to be structural <h1-4>. With part of my day job centered on cleaning up MS Word's shit in preparation for in-house, somewhat platform-agnostic publication, WYSIWYG creates as many problems as it solves.
posted by CBrachyrhynchos at 8:55 AM on February 24, 2016


I should say that one of those texts appears to have been poorly translated from a digital system targeting print publication to a digital system for handheld screens.
posted by CBrachyrhynchos at 8:57 AM on February 24, 2016


Matthew Butterick does great work, and Pollen is a fascinating project. Thanks for sharing it; I think it's something that's gonna help me out quite a bit.
posted by Ian A.T. at 9:01 AM on February 24, 2016 [1 favorite]


I do a webcomic, and I'm a fan of a lot of other webcomics, so I love the genre - even so, I find myself thinking that when I get to page x and have y number of readers, it'll be time to do a Kickstarter and try to get it put in print like Tom Siddell (who is double plus awesome 'cause he does his in hardback).

I can't really argue for the permanence of print (451 degrees and you're done), but as a lifelong lover of books, I find I'm moved by their tangibility. Not sure if that's a human thing or a generational thing.
posted by Mooski at 9:27 AM on February 24, 2016 [1 favorite]


So, OK, paper gives us --

* something you can read without a goddamned technology stack -- it's self-contained
* something that will survive an EMP or the digipocalypse or whatever, as long as the paper survives

Digital gives us:
* something that we can make a bazillion perfect copies of, so we can give something many many chances to survive, if we go to the trouble
* something that does not in and of itself decay; information is pristine forever; but it's very vulnerable to substrate decay and obsolescence

Paper or no paper, reading this, I was interested in the idea of web pages which are in some way self-contained and able to be copied and archived, as opposed to web pages which exist only as the manifestation of many moving parts working together in a particular way, e.g. a running CMS with the right version of the CMS running on the right version of PHP and the right database and javascript so you can parallax scroll it and so that some content only shows up after the ads load, yadda yadda.

The idea of still having a copy of a web page I care about, and having that copy to carry around with me, is attractive, even if it's not paper.

I'm kind of getting interested in turn-a-web-page-into-an-ebook-for-your-kindle type technology because of things like this.

It seems like even if we don't go paperless we can get less ephemeral if we want to.
posted by edheil at 10:06 AM on February 24, 2016 [3 favorites]


Part of the divide is when a book was printed: Books of a certain age will last quite well if they are kept dry, since they were made on acid-free paper. Books from the industrial revolution will not last, since they were printed on acid paper that is eating the pages from the inside out. Books today are a mixed bag, but as I understand it they are mostly printed on non-acid paper, particularly by academic presses (a number of academic books I have state they are printed on acid-free archival paper). I'm betting a lot of POD setups use acid paper though, to keep costs down, so I'd investigate the paper type before you try using that route to preserve things.

The other problem I see with internet based archiving is what happens when you die and stop paying the bills?
posted by Canageek at 10:06 AM on February 24, 2016 [1 favorite]


I think the "lasting" part is only somewhat concerned with the durability of paper and more to do with the entire ontology of "webness". I went looking for a website that captured my interest a few years ago and it was gone, with a few drips and dabs remaining on the Internet Archive. I don't have an original copy of the King James version of the Bible, but I have a 6 dollar paperback copy, and if I spilled coffee on it it would only take a few dollars more to replace it. For books with less of a long tail than the Bible, I'm pretty sure enough demand could bring anything once in print back.
posted by Chitownfats at 10:33 AM on February 24, 2016 [2 favorites]


Two other relevant links:

Wikipedia Historiography of the Iraq War -- sometimes what's relevant about print is that it is relatively static.

Mefi2Book
posted by tofu_crouton at 10:37 AM on February 24, 2016 [1 favorite]


make sure and bury something that reads SD cards, too.....and whatever infrastructure that device needs to work

SD cards are solid state, and support for basic 4-wire serial is mandatory. As long as we have electrons, and the card hasn't fallen apart for other reasons, reading it is trivial.
posted by effbot at 11:43 AM on February 24, 2016


"I believe it because it’s absurd: the entire Britannica - not to mention the stacks of my old branch as well as the entire Library of Congress - can in theory be encoded by a single notch on a rod… Any text, however long and complex, is a linear stream of characters. Letters, punctuation marks, typographic symbols: fewer than a hundred types. Each of these hundred can be replaced by a unique three digit number… Now the remarkable twist. If I run the triplets together and put a decimal point in front of the number, the result is a rational fraction running to millions of decimal places. But a fraction represents, and is represented by, a portion of any distance, say from one end of a stick to the other… I can incorporate my box of sayings in a discrete point a little farther than 13/100 of the way from tip to base. I might look at the mark from time to time, delighting in knowing that it encased the cycle of every famous saying ever to flesh out my calendar."

-Richard Powers, The Gold Bug Variations
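The arithmetic in the passage can be tried out directly. Here is a small sketch using exact fractions. One assumption that is not in the novel: it uses each character's Unicode code point, zero-padded to three digits, as the "unique three digit number", so it only works for characters with code points below 1000.

```python
from fractions import Fraction


def encode(text: str) -> Fraction:
    """Turn text into the fraction 0.ddd..., three decimal digits per character."""
    if not text:
        return Fraction(0)
    if any(ord(ch) > 999 for ch in text):
        raise ValueError("only characters with code points below 1000 are supported")
    digits = "".join(f"{ord(ch):03d}" for ch in text)
    return Fraction(int(digits), 10 ** len(digits))


def decode(mark: Fraction, length: int) -> str:
    """Recover `length` characters from a fraction produced by encode().

    The length is needed because leading and trailing zeros are not
    visible in the fraction itself.
    """
    digits = str(int(mark * 10 ** (3 * length))).zfill(3 * length)
    return "".join(chr(int(digits[i:i + 3])) for i in range(0, len(digits), 3))


mark = encode("Hi")           # "H" is 072 and "i" is 105, so this is 0.072105
print(mark, float(mark))
print(decode(mark, 2))
```

The catch the passage happily waves away is precision: every extra character needs the notch placed a thousand times more exactly.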
posted by superelastic at 11:44 AM on February 24, 2016 [3 favorites]


"Moving parts," is an important consideration. Linux Mint was hacked over the weekend, apparently because breaking one set of moving parts (the bbs) allowed the attacker to get into another set of moving parts (the CMS). As much as some people are putting the blame on the fact that Mint is a popular distribution run by a small group, the Gawker hack happened almost the same way.

Static site generators help to mitigate that risk a bit. And CMS systems themselves are overkill for writers looking for a way to put a few hundred thousand words up on the web.
posted by CBrachyrhynchos at 11:48 AM on February 24, 2016 [1 favorite]


I enjoyed the article not really because of the argument about printing out web pages as books, but because of the use of Pollen. I have played with Pollen but I only got so far because I couldn't really understand why one would use it over anything else.

If I wanted a markup language, I would use markdown (or even just write the HTML).
If I wanted a static site generator, I would use something popular, well-documented and supported like Jekyll.
If I wanted a really pretty site, there are nice CSS templates out there (like Tufte CSS, as used in the artcile) or web frameworks (like Bootstrap) which would do 99% of the work.

The article emphasises a different way of looking at Pollen, which piques my interest. As someone with fond memories of LaTeX (despite its pains), projects like Butterick's Practical Typography and Pollen are things that will catch my attention.
posted by milkb0at at 2:09 PM on February 24, 2016 [3 favorites]


> How much paper would it take to legibly print out everything that has ever been on the internet?

Here's one estimate of 1.36 x 10^11 pages.

I suspect that's undershooting a bit, but it's a difficult thing to estimate. I found some numbers extrapolating from the size of Google's index, but it turns out even that number is based on statistical extrapolation from search results and word frequency. If we're placing bets, I'd say 10^15 pages.
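For a sense of scale, here is a back-of-envelope calculation, reading the two figures as 1.36 x 10^11 and 10^15 pages. The sheet thickness (about 0.1 mm) and one page per sheet are my assumptions, not part of either estimate.

```python
SHEET_THICKNESS_UM = 100      # assumed: about 0.1 mm per sheet of office paper
SHEETS_PER_REAM = 500         # standard ream size
UM_PER_KM = 10 ** 9           # micrometres in a kilometre


def stack_height_km(pages: int) -> int:
    """Height in whole kilometres of `pages` sheets stacked flat, one page per sheet."""
    return pages * SHEET_THICKNESS_UM // UM_PER_KM


def reams(pages: int) -> int:
    """Number of 500-sheet reams needed, rounded up."""
    return -(-pages // SHEETS_PER_REAM)


for label, pages in [("1.36 x 10^11", 136_000_000_000), ("10^15", 10 ** 15)]:
    print(f"{label} pages: {reams(pages):,} reams, stack about {stack_height_km(pages):,} km tall")
```

Under those assumptions the first figure is a stack roughly 13,600 km tall, and my bet comes to about 100 million km.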
posted by lucidium at 5:07 PM on February 24, 2016 [1 favorite]


There was that bit of news about nano "5d" glass storage a few days ago with a ridiculous number of years of life attached.

Right. This thing.

13.8 billion years @ < 350F. (if we can maintain the machines for reading it that long, i guess)
posted by Sallysings at 6:20 PM on February 24, 2016


I've lost some good web pages that are incompletely archived at archive.org, and I'm trying to figure out how to recover what I can. I'm also worried about the life of another site that blocks archive.org and whose owner is mostly absent, so we're not sure how long it will last. A third lost site came up in a discussion with a friend about lost web data: archive.org only had about 3000 cached copies, and the forum contained 80,000 posts.

And these are just the sites I care about because they were important resources to me. There are so many more that have vanished without a trace.

Count me in the web pages are not nearly as long lasting as books camp.
posted by [insert clever name here] at 1:29 AM on February 25, 2016 [2 favorites]


Pollenpub.com is now redirecting to the pollen docs on racket.org. Did we stress it out?
posted by edheil at 11:39 AM on February 25, 2016


er, racket-lang.org
posted by edheil at 11:45 AM on February 25, 2016


In the '80s they transferred the Domesday Book to optical disk to preserve it forever, paper being so fragile and all.

Ha ha near-instant digital obsolescence except no, that wasn't actually what the Domesday Project was about.
posted by holgate at 1:30 PM on February 27, 2016


A little late to the party on this one, but this looks pretty cool. There are arguments for both print and digital - both do things the other can't, but the idea is interesting. Especially if your site is reference-oriented, like the ones I asked about in AskMe. I would probably buy a site book from some of those.
posted by kevinbelt at 12:38 PM on March 2, 2016



