"By this art you may contemplate the variation of the 23 letters."
May 24, 2015 2:19 PM

http://libraryofbabel.info/
The Library of Babel is a place for scholars to do research, for artists and writers to seek inspiration, for anyone with curiosity or a sense of humor to reflect on the weirdness of existence - in short, it’s just like any other library. If completed, it would contain every possible combination of 1,312,000 characters, including lower case letters, space, comma, and period. Thus, it would contain every book that ever has been written, and every book that ever could be - including every play, every song, every scientific paper, every legal decision, every constitution, every piece of scripture, and so on. At present it contains all possible pages of 3200 characters, about 10^4677 books.
posted by andoatnp (58 comments total) 60 users marked this as a favorite
 
Every possible short-short story, then, along with every possible Interoffice Memo.
posted by Mogur at 2:47 PM on May 24, 2015


Type in something your love said to you once, in the strictest confidence, which you swore never to reveal. And yet, there it is.
posted by Mogur at 2:49 PM on May 24, 2015


I was ready to call BS due to the impossible storage requirements, but this person is smarter than me:
The site doesn't store books on disk, and it doesn't create them as they're requested then store those pages. But, it does always place the same page of text at the same "location" in the library.
It does this by using a pseudo-random number generating algorithm called a linear congruential generator. In order to be able to produce every possible page of 3200 characters, the PRNG requires a seed of about 16000 bits - in base ten, that's a number with ~5000 digits!
When you request a page, the CGI does the following calculations:
1) book location -> base ten random seed
2) random seed -> output of PRNG
3) output of PRNG -> page of text
The search function inverts each of these calculations:
1) page of text -> base ten output of PRNG
2) output of PRNG -> random seed
3) random seed -> book "location"
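As a rough illustration of those three steps and their inverses, here is a minimal Python sketch. It is not the site's actual code: the 29-character alphabet comes from the post (lowercase letters, space, comma, period), but the tiny page length, the multiplier `A`, the increment `C`, and all function names are assumptions made up for this example. It applies a single linear congruential step, which can be undone whenever the multiplier shares no factor with the modulus.

```python
import string

# Alphabet from the post: 26 lowercase letters, space, comma, period (29 symbols).
ALPHABET = string.ascii_lowercase + " ,."
PAGE_LEN = 40                     # assumption: real pages are 3200 characters
M = len(ALPHABET) ** PAGE_LEN     # number of distinct pages of this length

# Hypothetical LCG parameters. A must not be a multiple of 29,
# otherwise it has no inverse modulo M and pow() below raises ValueError.
A = 2**100 + 1
C = 12345
A_INV = pow(A, -1, M)             # modular inverse (Python 3.8+)


def location_to_page(location: int) -> str:
    """location -> scrambled number -> page text."""
    n = (A * location + C) % M
    chars = []
    for _ in range(PAGE_LEN):
        n, digit = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def page_to_location(page: str) -> int:
    """page text -> scrambled number -> location (the 'search' direction)."""
    n = 0
    for ch in page:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return ((n - C) * A_INV) % M


def pad(text: str) -> str:
    """Pad a short phrase with spaces so it fills exactly one page."""
    return text.ljust(PAGE_LEN)[:PAGE_LEN]


wanted = pad("hello, world.")
where = page_to_location(wanted)          # "search" for the page
print(location_to_page(where) == wanted)  # asking for that location returns the same page
```

Nothing is stored anywhere in this sketch: the "library" is just the pair of functions, and the location of a page is a number about as large as the page itself.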

posted by double block and bleed at 2:53 PM on May 24, 2015 [7 favorites]


I'm surprised it took so long to automate the Infinite Monkeys.
posted by oneswellfoop at 2:55 PM on May 24, 2015 [6 favorites]


Your name and the details of the most terrible and wonderful things you ever did are in there. So are the details of terrible and wonderful things you never did.
posted by double block and bleed at 2:57 PM on May 24, 2015


Can someone explain this to me
posted by bleep at 3:01 PM on May 24, 2015


There's a full explanation in the library itself.
posted by Segundus at 3:08 PM on May 24, 2015 [35 favorites]


bleep: "Can someone explain this to me"

Every possible book of a certain number of pages, the vast majority of which are random gibberish. Since it's every possible combination, everything that can be written in a book of that size is there. Shakespeare's plays, the story of your life, tomorrow's winning lottery numbers. But it's such a huge collection of information that finding things that are coherent is essentially impossible. It would be impossible to store all of that information because it would take up the entire universe and then some. This person seems to have figured out a clever way to get around that limitation.
posted by double block and bleed at 3:11 PM on May 24, 2015


At some point I'd like to dig into how exactly this was implemented...I suspect it would take a fair amount of effort to understand, at best, and more likely wind up being completely over my head. I mean, "click here to generate a book full of random gibberish" is something I could write myself. But the *search* function...that blows my mind.
posted by uosuaq at 3:16 PM on May 24, 2015


A title and a page number turns into a 3200-character tweet. And it's already there, waiting for you to tweet it.
posted by rifflesby at 3:23 PM on May 24, 2015 [5 favorites]


One of my favorite things about this concept is that there's a kind of symmetry about the search for coherence in randomness, in that you can search the library for a book you can read and will tell you all the secrets of the universe and the library, but also for every randomly generated text one could perhaps concoct an elaborate, unlikely language which would make that text into a revelatory explanation of all things and complete index of the library. So every gibberish book contains all books, including non-gibberish ones, because there are as many potential codes and languages to interpret it as there are possible texts -- if not many, many more.
posted by Rinku at 3:24 PM on May 24, 2015 [4 favorites]


1) Create algorithmically generated library containing near-infinite combinations of text, then make it available via a website.

2) Wait for the next literary blockbuster to sell 10 million copies, perform a search using a unique and finely crafted paragraph contained within, then file a lawsuit for plagiarism.

3) ? ? ?

4) Profit!!!
posted by 1367 at 3:34 PM on May 24, 2015 [5 favorites]


(Wow. I just received a takedown notice* for my above comment... it seems it already existed in the Babel Library, and therefore is a copyright violation. Bummer!)

* Not really...
posted by 1367 at 3:37 PM on May 24, 2015 [3 favorites]


The Library of Babel would make a brilliant sci-fi tv series in the spirit of The Prisoner.
posted by Foci for Analysis at 3:39 PM on May 24, 2015 [3 favorites]


The book wrote all the comments in this thread before we did.
posted by andoatnp at 3:41 PM on May 24, 2015 [9 favorites]


The clever bit here is not that the library "contains" all these texts. It contains no texts at all.

The clever bit is that it has a reversible index. If you generate a random page, the index is the URL, which contains a 2000-character seed. If you ask for that URL again, you will get the same page.

Even neater, you can give a text and, by running the reverse version of the generation program, find out the index (URL) that generates it.

That is: there is no actual library content, and there is no search function. But the reversible index gives you the same results as if they really existed.
posted by zompist at 3:42 PM on May 24, 2015 [22 favorites]


If you're confused about this, it's all explained at the start of the complete and completely accurate index to the library that you can find in the library itself. Careful, though, don't pick up any of the spurious indexes that are to some degree or another fake that are also scattered throughout the library.
posted by You Can't Tip a Buick at 3:45 PM on May 24, 2015 [7 favorites]


So the answer is no, got it.
posted by bleep at 3:46 PM on May 24, 2015 [1 favorite]


actually the answer is yes. always yes. too much yes to be meaningful.
posted by You Can't Tip a Buick at 3:47 PM on May 24, 2015


A title and a page number turns into a 3200-character tweet. And it's already there, waiting for you to tweet it.

Not just that, but seems like it would be great for secret messages. Or for keeping a private blog or diary without storing anything beyond the link, or the book title with page number.

There was an interesting post on Hacker News about the Library. The creator shows up to answer questions (as jonotrain).
posted by honestcoyote at 3:48 PM on May 24, 2015 [3 favorites]


There is a related Twitter account: Permudat

It is tweeting all possible strings up to 140 characters. I think.
posted by chavenet at 3:54 PM on May 24, 2015


The "problem" with this library is that the index is arbitrary. It's not an alphabetical index, or by subject, or author, or chronological, or similarity/distance between two texts. It's a pseudorandom enumeration.

A nonrandom enumeration is trivially reversible: for example, the library of all 3200-character long texts, arranged in alphabetical order. Even this library, without the pseudorandom stuff, could have some neat uses.
posted by polymodus at 4:16 PM on May 24, 2015


Strictly speaking, only books written in languages that have (at a minimum) a transliteration into latin characters.

Not just that, but seems like it would be great for secret messages. Or for keeping a private blog or diary without storing anything beyond the link, or the book title with page number.

It isn't really. You need to store the entire random seed (location, volume, shelf, page) to be able to retrieve a page. That random seed is roughly as long as the page itself. You can "bookmark" pages, but that means whoever runs the site is storing the seed for you, so they can read every page you bookmark.
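A back-of-the-envelope check of "roughly as long as the page itself", using only the alphabet size and page length given in the post. This is a counting argument, not a description of how the site actually encodes locations; the choice of base 36 for the address characters is an assumption made for the comparison.

```python
import math

ALPHABET_SIZE = 29   # lowercase letters, space, comma, period (from the post)
PAGE_LEN = 3200      # characters per page (from the post)

# Every page is one of 29**3200 possibilities, so an address able to name
# any of them needs at least log2(29**3200) bits.
bits_needed = PAGE_LEN * math.log2(ALPHABET_SIZE)

# The same quantity measured in decimal digits and in base-36 characters
# (base 36 is an assumption: digits plus lowercase letters).
decimal_digits = PAGE_LEN * math.log10(ALPHABET_SIZE)
base36_chars = PAGE_LEN * math.log(ALPHABET_SIZE, 36)

print(f"bits needed:    {bits_needed:,.0f}")
print(f"decimal digits: {decimal_digits:,.0f}")
print(f"base-36 chars:  {base36_chars:,.0f} (page is {PAGE_LEN} chars)")
```

However the address is written, it cannot be much shorter than the page, because there are exactly as many addresses needed as there are pages.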
posted by BungaDunga at 4:17 PM on May 24, 2015




So when I search on phrases, I find them, but then when I view the page it only shows me that one sentence. How do I read the whole page? Yes, I know it's likely all gibberish.

Also, I would love to see someone put together something (is there an API?) that could look for whole pages/short stories. I'm imagining it would search one sentence and create a database of matches. Search the next sentence, keep the overlapping matches. Search the next sentence, keep the overlapping matches, and so on. Once you had every sentence it would be a matter of finding the one with the phrases in the right order. Presumably by the same method, but searching for the words at the end of one sentence and beginning of the next.

Also, it would be great to be able to limit the search somehow to "only pages where everything is a word." That might be too computationally difficult, though.
posted by If only I had a penguin... at 4:19 PM on May 24, 2015


Retrieving a particular page without a bookmark is a very long link

That's what I was wondering about when I saw this yesterday--the intuition is that random data is basically incompressible. That would explain why the links/references are also very long [thus limiting the ways we would want to use it].
posted by polymodus at 4:21 PM on May 24, 2015


so does this mean they're working on a new Myst game or something then
posted by DoctorFedora at 4:24 PM on May 24, 2015 [1 favorite]


So when I search on phrases, I find them, but then when I view the page it only shows me that one sentence. How do I read the whole page? Yes, I know it's likely all gibberish... Also, it would be great to be able to limit the search somehow to "only pages where everything is a word."

The search results page has four different kinds of results. The first group of results are the search phrase only. The second set is the phrase with random characters. The third set is with random English words.

Is that what you are asking to see?
posted by andoatnp at 4:27 PM on May 24, 2015


According to algorithmic information theory, most links will be as long or longer than the text itself. And since the links themselves are in the library, we may wonder about the sublibrary which contains all those texts which don't link to themselves. Russell's library.
posted by Obscure Reference at 4:28 PM on May 24, 2015 [5 favorites]


If you didn't do a pseudorandom shuffle, and just went "a", "b"... "z" "aa", "ab" then the "page number" would be really obviously encoding the page contents. Page 26 is a single z, page 27 is aa, page 702 is zz and so on: the page contents would just be a base-26 encoding of the page number.
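That unshuffled ordering is what is usually called bijective base 26 (the same scheme spreadsheets use for column names). A small Python sketch of the mapping in both directions; the function names are made up for this example.

```python
import string

LETTERS = string.ascii_lowercase


def page_number_to_text(n: int) -> str:
    """Map 1 -> 'a', 26 -> 'z', 27 -> 'aa', ... (bijective base 26)."""
    if n < 1:
        raise ValueError("page numbers start at 1")
    chars = []
    while n > 0:
        # Subtracting 1 first shifts the digits from 1..26 to 0..25.
        n, rem = divmod(n - 1, 26)
        chars.append(LETTERS[rem])
    return "".join(reversed(chars))


def text_to_page_number(text: str) -> int:
    """Inverse mapping: read the text as a bijective base-26 numeral."""
    n = 0
    for ch in text:
        n = n * 26 + LETTERS.index(ch) + 1
    return n


for page in (1, 26, 27, 702, 703):
    print(page, page_number_to_text(page))
```

Because the two functions are exact inverses, "searching" this version of the library is just reading the text as a number.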

The pseudorandom shuffle is a lot more fun though.
posted by BungaDunga at 4:32 PM on May 24, 2015


so does this mean they're working on a new Myst game or something then

They'd already be done, if only they knew which pages the code was on.
posted by rifflesby at 4:34 PM on May 24, 2015 [3 favorites]


The original source.
posted by Wet Spot at 4:46 PM on May 24, 2015




I remember seeing a similar Library of Babel on the Hyperdiscordia in the late '90s. It used to have similar library browsing functionality but that seems to have been lost over the years, with only the text remaining. I wonder if this new one is by the same person or if they just both like Borges.
posted by foobaz at 4:48 PM on May 24, 2015 [1 favorite]


Doesn't work without gangs of monks perusing the stacks, tossing books of gibberish (or what they think is gibberish?!) into the abyss.
posted by supercres at 4:52 PM on May 24, 2015 [1 favorite]


Russell's library

It won't exist. If you try to construct it, you will eventually get stuck with an incomplete sublibrary. And the reason goes back to the compressibility of random information.

This is a neat experiment, because this website provides a finite case of the paradox and an operational definition of what is a link—just take a substring and see if it's a valid book URL. In contrast, with the original Russell paradox and related paradoxes my impression is that there are also subtle mathematical issues having to do with infinite sets and definable references.
posted by polymodus at 5:04 PM on May 24, 2015






Those infinite monkeys enjoy their ascii art too

Metafilter: macrostructure labdas synthetiser bawdries
posted by solarion at 5:16 PM on May 24, 2015


I searched for "shitblasting quakers" and found "everlastingly vacant moorcocks" on the same page. So that's my next mefi account name taken care of.
posted by infinitewindow at 5:17 PM on May 24, 2015 [6 favorites]


If you didn't do a pseudorandom shuffle, and just went "a", "b"... "z" "aa", "ab" then the "page number" would be really obviously encoding the page contents. Page 26 is a single z, page 27 is aa, page 702 is zz and so on: the page contents would just be a base-26 encoding of the page number.

That's the rub isn't it: ultimately the set of all strings of bit length k is not a terribly interesting set, anymore than the set of all natural numbers less than 2^k is an interesting set. It's just because most people aren't used to thinking of strings as numbers that it seems like something more meaningful than counting up is going on.
posted by Pyry at 5:23 PM on May 24, 2015 [3 favorites]


Title: x,kgomdpps Page: 262
posted by Doroteo Arango II at 5:41 PM on May 24, 2015


It's just because most people aren't used to thinking of strings as numbers that it seems like something more meaningful than counting up is going on.

But to be fair, it's a really cute implementation of counting.
posted by PMdixon at 5:48 PM on May 24, 2015


If someone starts spidering the site, does that invoke copyright on each subsequent work?
posted by ryoshu at 6:42 PM on May 24, 2015 [1 favorite]


What the hell do you think you're doing, posting all the dox of everybody in the world? That's completely unacceptable!
posted by Pope Guilty at 7:26 PM on May 24, 2015 [2 favorites]


Doroteo Arango II: There are 29^5 (20511149) possible books for every title. It's possible to grab full reference numbers from the pages using right-click -> view page source. Do this on a search page, copy the long string after a 'Location:', and don't forget the w, s and v numbers. Then you can return to that book via the Browse option.
posted by BiggerJ at 7:40 PM on May 24, 2015


Oh, and when browsing, after you've entered a hex's name, press return while the cursor is visible in the text box to go there.
posted by BiggerJ at 7:41 PM on May 24, 2015






That's what I was wondering about when I saw this yesterday--the intuition is that random data is basically incompressible. That would explain why the links/references are also very long [thus limiting the ways we would want to use it].

Yeah, non-compressibility is an intrinsic trait of random information as we understand it now, and compression adds information (metadata, trees, "lists of abbreviations and what they mean," etc) in the process of finding a repeatable underlying structure to the information you're trying to compress, and truly random data has no structure, so you essentially need an algorithm that knows when to give up (there's no way I can compress this random data, all I'm going to do is make a bigger chunk of data) or you need a compression algorithm that deals in random number generation itself. Luckily we don't deal with that much random information that is all that useful to us. Most of the information that we value can be described with blocks of non-random information like functions, words, sentences, and there are many ways to represent this information with smaller bits of information (like a simple Algebra function lets you describe a line more efficiently and accurately than storing 1,000 pixels' worth of data to render that line).

As a young teenager I fantasized about discovering a PRNG compression algorithm -- this "seed" (a-hyuck) of an idea that you could come up with a small block of numbers that yields the complete works of Shakespeare. For those not familiar with pseudo-random number generators, the idea is that if you pick a number "between 1 and X" (and these are arbitrary boundaries), you can feed that number into a "generator," and every time you feed in that same number and say "give me 1,000 random numbers" it will spit out the exact same sequence of 1,000 numbers, determined entirely by your initial seed. So to simulate randomness, your seed itself must come from a very random-like source (like timing "random key strokes" in certain encryption software to create a key).
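The "same seed, same sequence" behaviour is easy to see with Python's standard `random` module. This is only a small demonstration of that property; the helper name and the seed values are arbitrary choices for the example.

```python
import random


def first_numbers(seed: int, count: int = 5) -> list[int]:
    """Return the first `count` values drawn from a generator seeded with `seed`."""
    rng = random.Random(seed)   # private generator, independent of global state
    return [rng.randint(1, 1000) for _ in range(count)]


run_a = first_numbers(2015)
run_b = first_numbers(2015)
run_c = first_numbers(2016)

print(run_a == run_b)   # same seed: the sequences are identical
print(run_a == run_c)   # different seed: almost certainly a different sequence
```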

You need a smart algorithm that realizes maybe "2891ha2" is only going to get 50 characters that you need before it starts giving you gibberish. So now your algorithm has to say "generate this many permutations of pseudo-random numbers from this seed before stopping, and then find another seed." The problem is you need enormous seeds, likely to be just as big if not bigger than your data set, in much the same way that you can't just take an MD5 hash and generate the original data set, or you need near infinite time to calculate the seeds because you essentially need to procedurally generate all of the data you need to know if the seed is even usable.

But I thought you would solve all of that by doing something "simple," like "I'll come up with small seeds that generate small blobs of information, and then re-compress all of my compressed data over and over again!"

If you want a single pseudo-random-number generator seed that can be used to generate a specific block of text, the seed is probably going to be huge because it's . It's easy to make all of the pseudo-digits you want, but if it's pi you're looking for, it's going to take you awhile to find it.
posted by aydeejones at 8:21 PM on May 24, 2015 [2 favorites]


Reader, I was no mathematician, but it was a good lesson in why you can't just out-think math. When I would try to explain it to fellow nerds who might already be on a CS path, they could see why it was laughable and impossible, but everyone kind of wanted to break their brain thinking it through, like "middle out compression" in Silicon Valley. Of course you can't just keep re-compressing the same data over and over again, because compressed data itself looks like random data, and a compression algorithm is only as efficient as how useless the output is to another compression algorithm :D
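A quick way to see this for yourself, sketched with Python's standard `zlib` module. The sample data is made up for the example: pseudo-random bytes stand in for "random data", and a repeated phrase stands in for structured text.

```python
import random
import zlib

rng = random.Random(0)   # fixed seed so the experiment is repeatable
random_data = bytes(rng.getrandbits(8) for _ in range(100_000))
text_data = b"it was the best of times, it was the blurst of times. " * 2_000

# Structured text shrinks a lot on the first pass.
text_once = zlib.compress(text_data, 9)

# Random-looking bytes do not shrink, and a second pass over the
# already-compressed output does not help either.
random_once = zlib.compress(random_data, 9)
random_twice = zlib.compress(random_once, 9)

print(f"text:   {len(text_data)} -> {len(text_once)}")
print(f"random: {len(random_data)} -> {len(random_once)} -> {len(random_twice)}")
```

The random input comes out slightly larger each time, since the compressor still has to add its own framing around data it cannot shorten.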
posted by aydeejones at 8:25 PM on May 24, 2015 [2 favorites]


"What Colour are your bits?" should probably be dragged up once again; it seems relevant here.
posted by vogon_poet at 8:31 PM on May 24, 2015 [1 favorite]


That's the rub isn't it: ultimately the set of all strings of bit length k is not a terribly interesting set, anymore than the set of all natural numbers less than 2^k is an interesting set. It's just because most people aren't used to thinking of strings as numbers that it seems like something more meaningful than counting up is going on.

I think this is one of those things where you can say "it's trivial!", intellectually, but the actual reality is a little more unsettling.
posted by vogon_poet at 8:35 PM on May 24, 2015


I'm just going to put all my secrets in there and see if it has any insights. LOL NSA
posted by aydeejones at 8:38 PM on May 24, 2015


nubs: "It was the best of times, it was the blurst of times."

You know when I saw that episode I thought Monty Burns was being a bit too harsh on that monkey, but now that I know it takes a dadaist turn after that with the challenging phrase "ydyzk.zlejlbjl,,xzlih,dd,llgohlbus bj.kich,mpdsumopqhjxlgdmfbbkljegwtawcrjuibzyyolym.
ltsbyuaqffcqgfemhxyysemgcemzqqhqvjvs mk ." I have to agree: that's dreck.
posted by traveler_ at 9:08 PM on May 24, 2015


I have to agree: that's dreck.

It suffers in translation.
posted by nubs at 9:14 PM on May 24, 2015


This would be an excellent data gathering tool.

As for me, next time a client asks if the DB is backed up, I'll tell 'em it's in the babel cloud.
posted by maxwelton at 12:10 AM on May 25, 2015 [2 favorites]


The librarian stepped into the hexagon that, according to its inscrutable, untarnishable plaque, was proudly named (pastebinned for length) and, perhaps out of some misguided belief that he would be genuinely thorough in his examination of it, walked over to the first shelf on its first wall. It was then that his resolve vanished, and so he took the book third from the right and began to page through it.

Page one. Did the order in which he read the books matter? Within a shelf, a wall, a hexagon, or the entire library for that matter? Page two. There was the mythical Hexagon 0, of course - or Hexagon a. 0 was, as far as he was aware, currently the most popular theory. In the past, people had been killed over the debate. And for what? Page three. What made the First Hexagon, with its storied-yet-simple plaque, so special? Page four. For was each book - and indeed, each page, and each individual character - not the result of the roll of some metaphorical cosmic die? Page five. What meaning had they ever had besides that which the librarians had given it? Page six. There were those librarians who had supposedly found words of decent length - and, so it was said, even sentences and fragments thereof, and there would always be even more fanciful legends of those who'd found entirely coherent pages, volumes or even hexagons - and, thanks to their language, one completely arbitrary under the metaphorical gaze of the Library, those would be the volumes, and the people, worthy of note.

Page seven.

After an uncounted amount of time, he closed the book, looked around furtively for other librarians (whom he was thankful to not find), returned the volume to its place among the equally uncounted myriads and briskly walked away, deliberately complicating his path so that he could not hope to retrace it.
posted by BiggerJ at 1:33 AM on May 25, 2015 [3 favorites]




This thread has been archived and is closed to new comments