The soul of a library is something really complex
December 20, 2023 2:15 AM

Here in the Manuscripts Room, the space itself looks the same, but it does not sound the same; depopulated, it is oddly quiet. Loudly quiet! This quiet is completely different from the constant rustle of ambient noise that counts as what we could call “library quiet.” Today, the distinctive energy of the Manuscripts Room is nowhere to be found: on a typical day, staff and readers alike are focused, on the clock, working swiftly and deeply, using fragile materials that are, by definition, unique and irreplaceable. This distinctive energy is the product of a thrilling alchemy of two forms of raw materials: readers, and the works in their hands. Absent readers, absent works, the reading room is just a room.
from How to Lose a Library
posted by chavenet (22 comments total) 27 users marked this as a favorite
 
I was just about to post an AskMe about this. The BL's online services have been out of action for nearly two months, and the Library is now saying that it could take from 'several months' to 'over 12 months' to restore operations. Can anyone help me understand why it's taking so long to get the online catalogue up and running again?

Another question for the tech-savvy among you. Rhysida have dumped a huge cache of data on their website, which is easy to find via Tor, and as a former BL employee I'd like to know if any of my personal data is there. Am I taking any risk in clicking the download link on the Rhysida website? Part of me thinks this is a really bad idea (downloading an unknown file from a group of Russian hackers on the dark web .. what could possibly go wrong?) but I'd also really like to know if any of my data has been compromised.

From the New Yorker: The Disturbing Impact of the Cyberattack at the British Library (archive.org link).
posted by verstegan at 2:34 AM on December 20, 2023 [5 favorites]


Mod note: Removed one comment that was probably meant for the Miss Universe thread.
posted by Brandon Blatcher (staff) at 5:01 AM on December 20, 2023 [3 favorites]


As someone who works for a library system that I would imagine is also an attractive target for cybercriminals, seeing the damage this has wrought on BL has certainly cured any grumbling I might have had about the security hoops we have to jump through at work. I know it's hard, for both security and reputational reasons, for a big institution that gets hacked to talk about what made an attack of such severity possible (nor can it be their top priority right now), but I hope there's a chance for the community of IT folks to learn from this disaster.

Fuck Rhysida.
posted by Horace Rumpole at 6:41 AM on December 20, 2023 [5 favorites]


Can anyone help me understand why it's taking so long to get the online catalogue up and running again?
The British Library holds more than 170 million items, including over 13 million printed and electronic books as well as hundreds of thousands of periodicals, microfilms, and rare manuscripts
They appear to have had an extremely spotty backup system and if they even have a backup of the full catalog it is significantly out of date. I suspect they’re pulling together a mishmash of old data that happened to be lying around, and the first pass will be whatever they can weld together from it.

Honestly, when you’ve lost that much data a few months to be functional and twelve months to be back on your feet is pretty impressive.
posted by Tell Me No Lies at 7:22 AM on December 20, 2023 [3 favorites]


Am I taking any risk in clicking the download link on the Rhysida website? Part of me thinks this is a really bad idea

Trust your instincts.
posted by Tell Me No Lies at 7:26 AM on December 20, 2023 [4 favorites]


as a former BL employee I'd like to know if any of my personal data is there.

From their temporary front page, just in case you hadn’t seen it….
Q: I'm a former member of staff and worried about my personal data - what should I do?
A: Please email customer@bl.uk and we'll come back to you as soon as we can
posted by Tell Me No Lies at 7:42 AM on December 20, 2023 [2 favorites]


Honestly, when you’ve lost that much data a few months to be functional and twelve months to be back on your feet is pretty impressive.

This. So impressive that I'm kind of wondering if they're going to pay the ransom and the wait is for authorization of funds.

The British Library is HUGE, larger than the Library of Congress even. Their print collection alone is larger than the entire collections at many R1 university libraries, and they have a bespoke classification system for it, which means they can't just download new records from a cooperative catalog like many libraries could do. And they have SO MANY digital collections of rare items to boot. They must have fairly complete backups that they're restoring from, because having to recreate all their data from scratch would be literally impossible, unless they had HUNDREDS of people working around the clock for a full year (and at that point, they may as well just pay the ransom). Even if they have everything backed up, those backups likely need a lot of manual evaluation and massaging to get them usable, which is why even being partially functional after only a couple of months is hella impressive.

To illustrate the scope of this, I've dealt with data loss for just one part of one collection and it took more time to restore than what the BL is saying it will take to fully restore their whole catalog. Which is why I'm saying they've either got everything backed up or they're gonna pay the ransom.

(I am a systems and cataloging librarian but I am not your systems and cataloging librarian.)
posted by rabbitrabbit at 8:03 AM on December 20, 2023 [17 favorites]


I suppose that, in the land of the Brexit, it should come as no surprise that one of the jewels of the common culture of humankind didn't have backups, or a disaster-recovery plan.
posted by Aardvark Cheeselog at 8:09 AM on December 20, 2023 [3 favorites]


The hackers may have overplayed their hand here. If they had restricted themselves to just the HR or customer databases then BL might be willing to take the risk of paying for the data and believing that they could secure that system. Knowing the full breadth of the infection it seems like the only responsible solution is to burn their entire digital infrastructure to the ground and start over.
posted by Tell Me No Lies at 8:10 AM on December 20, 2023 [1 favorite]


They appear to have had an extremely spotty backup system and if they even have a backup of the full catalog it is significantly out of date

Yes, that's my impression too, and that's what was behind my question. Are they taking a long time to restore their systems because they're treating it as a crime scene and doing some slow and painstaking digital forensics? Or is it because they've lost crucial data and don't have a backup?

The Library says that 'the vast datasets held in our Digital Library System, including the digital legal deposit content that it is our statutory duty to collect and preserve, are intact and safe from harm'. Which is, obviously, good news, but they don't say anything about the data held outside the DLS, which suggests that some of it may have been permanently lost. (It seems extraordinary to me that they wouldn't have a full backup of the catalogue in case of data loss, but what do I know?)

From their temporary front page, just in case you hadn’t seen it

Yes, I had seen that, but I assume that 'we'll come back to you as soon as we can' means they have no useful information or advice other than 'er, change your passwords'. Checking the Rhysida data dump for the personal details of former staff members is, understandably, going to be very low on their priority list at present.

Meanwhile, in other news: 'The Ministry of Justice is consulting on digitising and then throwing away about 100m paper originals of the last wills and testaments of British people dating back more than 150 years'. Because digital copies are safe for ever, right?
posted by verstegan at 8:17 AM on December 20, 2023 [6 favorites]


Toronto Public Library and London-not-England-Ontario Public Library have both had cyberattacks in the last little while.

This is an interesting article about some of the whys and hows. Something I didn't realize is that the organization doesn't always know how the attackers got in because it is a lot of money to have a forensic IT team investigate.
posted by eekernohan at 8:20 AM on December 20, 2023 [1 favorite]


I've been following along with the updates, and reading between the lines, they don't appear to have held onto their pre-electronic paging methods for manuscript material. (My medievalist colleague was planning on a trip there early next year to finish up a project, and I'm guessing that he is now not doing that.) My first go at the British Library was in the mid-90s, when it was still in the British Museum, and everything had to be looked up in bound catalogs and paged on slips; it sounds like you can still do that for books, to a limited extent, but the manuscripts are all FUBAR.

My own little SUNY campus is now prioritizing electronic purchases over hardcopy, and I pointed to this disaster as exactly one of the reasons this is Not a Good Idea. (That's before we get to expired subscriptions that eliminate your book holdings...)
posted by thomas j wise at 8:44 AM on December 20, 2023 [3 favorites]


Are they taking a long time to restore their systems because they're treating it as a crime scene and doing some slow and painstaking digital forensics? Or is it because they've lost crucial data and don't have a backup?

As with most ransomware attacks there are really two things being held hostage -- a big chunk of data (passwords, billing info, book catalog, etc) and the digital infrastructure itself. It's pretty easy to detect when you can't access your data, but the infrastructure -- every PC, every network device, everything attached to your internal network is now a potential Trojan Horse. If your most trusted devices have been compromised to this extent then it is a good assumption that everything else has too. For reference, the British Library has about 1300 employees, which would translate to ... how many desktops and laptops?

Every single device that has been attached to your internal network needs to be inspected and thoroughly cleansed before it can be reattached. Letting even one infected machine back in is a problem -- and remember that you're trying to cleanse a virus that got through your security the first time. The only secure fix is a complete wipe and reinstallation from trusted sources.

So I imagine it's a combination of things. They don't want to wipe computers that may still yield information on the attack, and at the same time bringing up new computers with all the right software and security setup is taking time. Hopefully their bespoke software has clean copies at the developer sites.

At the same time it sounds like what backups they have are fragmentary and dated, so there's a lot of work to be done bringing them into a useful form.

From the sound of it a lot of people aren't going to have Christmas with their families this year.
posted by Tell Me No Lies at 8:59 AM on December 20, 2023 [3 favorites]


If it happened at my library, we'd have access to the microfiche of the card catalog that was made before it was discarded ca. 1995, and there are some areas where you could with some effort reverse engineer the classification from the specifics of the book. But most more recent acquisitions would be very difficult, and anything sent to offsite storage would be unfindable--it's shelved by size for maximum efficiency, and the electronic records of where anything is are the only ones.

From time to time I'll still see a little pile of the two-part handwritten paging slips we stopped using in 2010 when we got Aeon for circulation, but they wouldn't last long.
posted by Horace Rumpole at 9:04 AM on December 20, 2023 [4 favorites]


I suspect-without-proof a few non-obvious factors behind the delay:

* In-house bespoke software. At the BL's scale, a lot of library-vendor and open-source stuff simply won't fly. So the BL can't dump a lot of its problems on a DFIR consultant or vendor or whatever; they've got to audit a crapton of in-house code for vulnerabilities or signs of attack.

* Similar to the above, experimental software and services. The BL, as one of the world's foremost research libraries, does research-and-development work. That means newly-created software, and newly-created software often means bugs, including vulnerabilities.

* It's ridiculously difficult -- anywhere -- to get sufficient funding to sustain responsible digital records-management and digital preservation practices. The Blue Ribbon Task Force report from two-thousand-freakin'-EIGHT spells it out beautifully. Nothing's changed. (Except, as verstegan notes above, that some yahoos don't even want to fund analog preservation, thinking digitization plus digital preservation cheaper. Those yahoos are utterly deluded, but they're legion. Ask your favorite Canadian researcher about the damage Daniel "digitize everything!" Caron did to Library and Archives Canada.)

* There's a long history of librarian resistance to computer-based technologies. (Don't argue with me about this or I will be forced to point you to librarian crapweasels Ellsworth Mason, Michael Gorman, or Jeffrey Beall... also non-librarian crapweasel James Billington, who ran the Library of Congress damn near into the ground, technologically.) The more invested a place and its people are in its analog collection -- and the BL is justly invested in its analog collection! -- the more resistance, commonly. This can lead to suboptimal digital practices, to put it mildly.
posted by humbug at 9:19 AM on December 20, 2023 [11 favorites]


It seems extraordinary to me that they wouldn't have a full backup

I once joined a small team at an Apple subsidiary. Because it began as just a few people doing a special project they never got around to plugging themselves into the company's full IT system. When I arrived the team had grown to ten experienced developers who had been working on a project for over a year with no backups.

The point being that even if there was a detailed recovery plan in place there may have been entire departments that were partially or fully non-compliant.

Also, the most commonly left out element of backup plans is regularly testing them. It is a very common story for systems to change and the backup system to be left behind. They could have been doing hourly backups of everything but if no one was testing the recovery process it could be for naught.
posted by Tell Me No Lies at 9:21 AM on December 20, 2023 [8 favorites]


Systems librarian turned municipal civil servant. I am responsible for only one application instead of four for the same salary, and there are three times as many staff supporting that one digital service.

Both my past and present institutions were still hit with a massive cyberattack this year.
posted by avocet at 11:41 AM on December 20, 2023 [2 favorites]


Honestly, when you’ve lost that much data a few months to be functional and twelve months to be back on your feet is pretty impressive.

The amount of data is actually pretty middling by modern standards. I would guess a few TB of metadata and then a few hundred TB for, e.g., scans of manuscripts and the like.

In the US, many financial institutions are subject to Financial Industry Regulatory Authority (FINRA) Rule 4370, which requires publication of their business continuity plans. Here's one for JP Morgan Securities [pdf], which notes
The recovery plans, policies, procedures, and practices address events ranging from small events to regional crises. Such events would include damage to or loss of single floors within our facilities, individual computer systems, entire facilities or data centers, and wide scale disruptions which affect both our staff and facilities/systems. JPMS will endeavor to continue business on behalf of its clients on that same business day during any and all events, recognizing that service may be impacted for longer periods depending upon the seriousness of the event.
(emphasis added). JPMS has ~7,600 employees (the BL has ~1500) and manages ~850,000 accounts (there were ~1.1 million visits to the BL in 2022). It is likely that their data is of a broadly similar scale to the British Library's.

Now, they're also working in a highly regulated industry and probably have a larger IT budget than the BL, but the point remains that there's nothing special about that amount of data that would necessitate a multi-month recovery. A properly designed and implemented disaster recovery plan (which would include cyberattacks) should take hours to days to play out, not months. I suspect the institutional and professional factors described by others above probably had more to do with it.
posted by jedicus at 11:41 AM on December 20, 2023 [1 favorite]


(At least I feel remotely on top of my testing and upgrades now.)
posted by avocet at 11:44 AM on December 20, 2023 [2 favorites]


Fingers x-ed they can get things back online. This Wikipedia article indicates some degree of resiliency (& checking) but doesn't drill into the data vs the systems that deliver the data. I guess their team will need to perform some serious forensics to find out how/when they got in, plug the holes and then start restoring systems and functionality from a particular point in time. They may also need to purchase all new kit (depending on the level of trust in identifying and eliminating the holes). If an old PC under a desk is networked and still has remnants of the attackers' code on it, ready to re-activate, then presumably there is a high risk the scenario will play out again (& again). I don't envy their IT/Security team working through this (not to mention the people whose personal information was also compromised).
posted by phigmov at 12:59 PM on December 20, 2023 [2 favorites]


The amount of data is actually pretty middling by modern standards. I would guess a few TB of metadata and then a few hundred TB for, e.g., scans of manuscripts and the like.

Well, definitely if they had a fully working recovery plan I would expect a quicker response, but in terms of a lesser system I suspect a library catalog is a much different beast than an archive of business transactions. Opening ten million accounts is a matter of having ten million clients filling out forms with information they already know; creating 10 million library resource records has to be done one at a time by a specialist.
posted by Tell Me No Lies at 1:34 PM on December 20, 2023 [2 favorites]


A few years ago the IT genius at my library decided that updating Drupal, which our site ran on, was not a priority. Then of course, we had a cyberattack and were locked out of our site. For a year. We ran the most important features on satellite servers and lived without. For a year you could go to our library's address and see a note from the hacker reminding you that there was a reason to follow security procedures and update your software. It was hilarious and depressing. Yes, it might take a long time for the site to go back up.
posted by evilDoug at 4:17 PM on December 20, 2023 [2 favorites]



