patron records and circulation privacy in libraries
June 6, 2021 8:33 AM

Librarian and researcher Dorothea Salo teaches an information security and privacy class that "asks students to investigate various aspects of the privacy/security situation surrounding their choice of campus-related data." Based on what they dug up, Salo requested records of her own library usage data at the University of Wisconsin, and published the dataset. It's big and detailed, goes back to 2002, and violates traditional library-patron privacy expectations. Librarian Kendra K. Levine: "The circulation data should not exist. I know it’s valuable for collection assessment but to the level of granularity tied to an individual?" Salo wrote a follow-up to "give you some idea where to go looking if you’re curious about a library’s stated practice".
posted by brainwane (29 comments total) 46 users marked this as a favorite
The author tweets: "Here’s a fun thing you can try:

Use these records to make up a story about how I am the ABSOLUTE WORST.

Seriously, go ahead. That’s part of the point here."
posted by brainwane at 9:15 AM on June 6, 2021

Dorothea is a treasure. (she's also MeFi's Own but I'll let her come in here and say hi if she wants). Thanks for posting this, will dig in.
posted by jessamyn at 9:31 AM on June 6, 2021 [9 favorites]

I also work for the University of Wisconsin, and thus for the state of Wisconsin. It's a given here that any use we make of state resources (e.g. any email we send from a university address) is a public record. I didn't know this applied to my EZProxy requests and checkouts, but now that I know, it makes sense. If students' library records are available too, that's a different story.
posted by escabeche at 9:51 AM on June 6, 2021 [3 favorites]

That's interesting, escabeche. I guess that I would draw a distinction between my expectation of privacy as a university employee using university resources, which is basically that I have no expectation of privacy, and my expectation as a library patron who happens to be using an academic library. I think those are kind of distinct. If I saw a doctor through an on-campus medical clinic, I would still expect to have medical privacy, because my right to medical privacy supersedes my lack of right to workplace privacy. This is, I think, about how libraries function, not about how academic workplaces function.
posted by ArbitraryAndCapricious at 10:35 AM on June 6, 2021 [11 favorites]

Dorothea (and her students) did us all a service by doing this work, along with writing up the "how to" information to do the same at other state institutions. While libraries' commitment to privacy is strained (to breaking point) when the databases they license rat you out to publishers, aggregators (e.g. Lexis/Nexis, which has ICE contracts, too) and to the data-companies-formerly-known-as-publishers, there is no legit reason libraries shouldn't protect privacy within any databases that are under their control - like catalogs. (Though even if you scrub patron records, the vendors of catalog systems gather data; it just would be harder for local law enforcement to subpoena records from a particular institution when they can walk into the door of a library and threaten people.)
posted by zenzenobia at 11:04 AM on June 6, 2021 [2 favorites]

Appropriately, one of the titles checked out recently is Habeas data: privacy vs the rise of surveillance tech...
posted by doctornemo at 11:57 AM on June 6, 2021

Hi, all -- yes, this is me and the mentioned students were in my class this spring.

A note on the sunshine-law request (which is something I should make clear on Tattle Tape, so thank you for that): this is only going to work if you're asking for YOUR OWN records. You can't ask for Random Person Who Is Not You's records, never mind everybody's records, because no sunshine law will allow that level of confidentiality violation. (The Patriot Act, on the other hand... *rimshot*)

The question of how long it's reasonable to keep identified circ records doesn't have a pat answer. I'll be interested to see the discussion, which is liable to get a bit heated! But I'm pretty sure that nearly twenty years is too long, even for special collections.

I have no idea why I checked out that Michael Moorcock book... and no memory of it. (To be clear, I don't think the records are in error! I just forgot.)
posted by humbug at 12:07 PM on June 6, 2021 [41 favorites]

But my work email CAN be asked for by any Random Person, right? (Not "give me every email this prof has ever sent" but "every email containing the following keywords and here's why I'm asking.") That's how I've always understood the rule. Thanks for the clarification; I didn't realize the circ records had a different status.
posted by escabeche at 12:16 PM on June 6, 2021

But my work email CAN be asked for by any Random Person, right? (Not "give me every email this prof has ever sent" but "every email containing the following keywords and here's why I'm asking.")

I'm both an employee of the state I live in and an elected official; I've been advised that the first type of broad request can and does happen by both my workplace and my city lawyer, and have seen it happen to colleagues and other elected officials.

They do get to filter out personal correspondence (and other things that are covered by, e.g., FERPA) but everything else is fair game. I believe this might vary by state; mine has an unusually broad right-to-know law.
posted by damayanti at 12:33 PM on June 6, 2021 [1 favorite]

But I'm pretty sure that nearly twenty years is too long, even for special collections.

How about 200 years? I'm not sorry the London Library kept its circulation records from the 1840s so we can see what Charles Darwin was reading.
posted by verstegan at 12:41 PM on June 6, 2021 [1 favorite]

escabeche: If it's a non-confidential work record and you work for a public institution, yes, it's fair game under most circumstances. (Records officers have some discretion to refuse vexatious requests.) Even so, though, there have been politically-motivated attack-dog requests on faculty email at my institution, and the records officers were careful to strip out email with attached confidentiality rights (e.g. email between instructor and students).

verstegan: I'm happy to give up historical interest when doing so comes alongside protecting library patrons' information and entertainment consumption from fascists, racists, law enforcement, many types of haters of specific religions, black-hat hackers, nosy family members, TERFs, homophobes, stalkers, rogue library insiders, rogue library content and software vendors (which is most of them, frankly), data brokers, misguided Big Data efforts, etc. Absolutely thrilled, in fact. Data minimization 5eva!

But yes, library privacy is a relatively recent development -- 1930s in the US, can't speak for elsewhere in the world (but check IFLA, they may know). If I didn't link to the excellent Witt article about this on Tattle Tape, let me link to an open-access copy here. The story of Henry Melnek is instructive.

I have a post marinating in the back of my head about library insiders as a specific threat. Those of you who know my far from pristine work history can guess some of what it'll say, but for now: no, I don't especially trust a library system with no privacy policy not to let its staffers run riot through the records.
posted by humbug at 12:58 PM on June 6, 2021 [20 favorites]

I have a post marinating in the back of my head about library insiders as a specific threat.

I would be really interested in reading that. I suspect that not only is it a largish issue, but there are other "chilling effect" aspects, like people not using their small-town library because they don't trust the library's privacy practices and don't want the librarian or their family to know their business. I've definitely worked in libraries with pretty good privacy policies but where the policy was only as good as its weakest link, in our case the circ staffer who would always comply with the cops' "Hey we found this wallet with a library card in it can you give us the name/address of the person it belongs to?" ruse (for non-library people, the privacy-forward approach is "Oh thanks for the wallet, I'll call the patron and let them know you found it"). It's like how so many people's attempts at hiding from, for example, abusive exes are only as strong as the most easily manipulated person at the DMV (or other big government org; strong argument for paying all of these public servants better).
posted by jessamyn at 1:15 PM on June 6, 2021 [17 favorites]

humbug: I shall attempt to fulfill your request to "Use these records to make up a story about how I am the ABSOLUTE WORST."

[not being serious]

Well, other than the fact that you have read Alinsky's "Rules for Radicals" which means you're clearly someone who should never be trusted with institutional power, you also only borrowed the book "Hidden Figures" in 2017, so, AFTER the movie came out! [insert insult here based on the idea that you are behind the times]

And I assume the fact that you borrowed "Monkey Island" (the video game?) in 2016 must nefariously tie together with your interests in William Morris, Puerto Rico, and primates. Clearly, you have a scheme to fill Puerto Rico with Arts and Crafts houses for capuchins. You must be stopped!

And -- worst of all -- you have not borrowed any books or articles in which I have authorship!

[now I am serious again]

humbug, thank you for being willing to share these records with us -- if it were me I think I might feel really vulnerable in a way that would give me pause before publishing.
posted by brainwane at 2:02 PM on June 6, 2021 [3 favorites]

It'd be nice to see some similarly thought provoking posts from Infrastructure staff responsible for backup/restore (and if you're lucky archive & purge) activity. Doesn't really matter what the data-base policies are if the data goes to tape (or similar) every day, month, year or longer. I guess there is an element of privacy through entropy on the back-end - orgs rarely keep the software (application, database, backup or OS) or tin (hardware or media-reader) required to restore from DAT or DLT or whatever other media data was archived to 10-15+ years ago. If the org has a solid retention/disposal cycle that permeates through to Infra, the old tapes might just get destroyed rather than sitting at the back of a shelf (...waiting for the day they get recalled because someone wants to see a copy of the HR policy on handling hazardous chemicals published on the intranet 20 years ago due to litigation).
posted by phigmov at 2:19 PM on June 6, 2021 [2 favorites]

It's disturbing, if not entirely surprising (librarians do get very touchy about throwing things away), that the data from the old ILS was not only retained, but retained in a useful way. It would be a substantial pain to rebuild the old system from tape, and at best a security headache to keep the old system running hot and unpatched on an unsupported OS release. It's pretty big to be an Access database on somebody's desktop - circulation data will be a join on a number of tables, some of them rather large. It would be creepy if the circ-specific data was already on file in a non-anonymous way.

I'd rather hope this request prompted whoever holds that data to ask some hard questions about why it's still there and whether it's necessary or wise to hang on to it.
posted by wotsac at 2:48 PM on June 6, 2021

It's worth considering also that your searches in many library catalogs are almost always logged at least by IP address, and there's no guarantee that they're not retained just as long, in some cases deliberately, in more cases just because nobody's thinking about this.
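
For anyone who runs a catalog and wants a starting point, here is a minimal Python sketch of coarsening addresses before a search ever reaches the log. The function name and the /24 and /48 prefix lengths are my own assumptions, not anything a particular catalog vendor ships, and truncation only reduces identifiability; it doesn't replace a short retention period.

```python
import ipaddress


def coarsen_ip(addr, v4_prefix=24, v6_prefix=48):
    """Zero the host bits of an address so a log line no longer points at one machine.

    The prefix lengths are illustrative assumptions; pick them to suit local policy.
    """
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network back.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(coarsen_ip("192.0.2.77"))       # 192.0.2.0
print(coarsen_ip("2001:db8:1:2::5"))  # 2001:db8:1::
```

The idea is to call something like this at the point where the log line is built, so the full address is never written to disk (or to the backup tapes).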
posted by wotsac at 2:55 PM on June 6, 2021

Fascinating. (It took me a little while, as someone who knows nothing about this, to correctly parse who was behaving ethically here and who was not. I think I've got it straight. The ethical side clearly includes our own here.)

I have a post marinating in the back of my head about library insiders as a specific threat.

I'd enjoy reading it. As a non-library-related academic, I've always been bothered - but not bothered enough to do anything about it - by internal threats. As an undergrad, I had root access to a server used by nearly all of my classmates for all sorts of personal stuff. As a grad student, I administered the email server that my advisors presumably used to submit recommendation letters for me. A friend helped me guess the password to the computer running a direct competitor's experiment once. (There was a very loud alarm going off and it was the middle of the night in a remote place without good internet access and no local team members. I told them about it.) I never abused any of that privilege, as far as I know. (Really. I take this stuff more seriously than most class one felonies.)

But, it would have taken seconds to spy on my colleagues in significant ways and ten minutes to cover my tracks to a degree that anyone not running real-time forensic software on the system would never have been able to prove. That there have been tens of students after me with the same access and skill is. . . probably not ideal. On the other hand, doing things "right" and abiding by the central IT department rules of any campus I've been to is both an enormous burden and not actually focused on realistic threats. I don't actually know how to solve it without creating so much bureaucratic nonsense that smart people will just find a way to route around it, creating other problems. (Offering IT staff the same agency and salary as Google might be a possible answer.)

"Don't log stuff you don't need to log" seems like a very good policy, even if it takes a little more effort, sometimes makes debugging hard, and possibly involves lawyers. Bad actors causing trouble in real time is a whole lot harder to solve.
posted by eotvos at 7:31 PM on June 6, 2021

I have no idea why I checked out that Michael Moorcock book...

Not looking, but hey Elric of Melniboné, Moorcock had some good fantasy going on what with a weak albino prince bound to a black soul-sucking sword with a mind of its own. Even got a song: Black Blade.

At 50 I sorta wish that when I make it back to my hometown I could request the checkout history of my childhood, just because I remember some of the books but have no clue what their title/author/etc. was.
posted by zengargoyle at 7:47 PM on June 6, 2021 [1 favorite]

... and then there's old school library privacy issues.
posted by fairmettle at 9:15 PM on June 6, 2021 [3 favorites]

I remember discussions about this.

Remember When the Patriot Act Debate Was All About Library Records?
posted by infini at 3:10 AM on June 7, 2021 [4 favorites]

Librarians pushed back publicly, and were laughed at for it (an FBI agent told a journalist “militant radical librarians” were hindering their work (instant t-shirt), and someone (John Ashcroft?) said something like “nobody cares which James Patterson novel you read”), but it raised serious issues at a time when doing so was seen as unpatriotic.

As for those old check-out cards, I forget where I read this, but it may have been Richard Rhodes’ The Making of the Atomic Bomb: a physicist at Princeton was approached by the military to work on a top-secret project at Los Alamos. Would he help? He went to the public library to check out a guide to New Mexico and knew something big was up because the check-out slip was filled with the names of Princeton’s most eminent physicists.
posted by zenzenobia at 12:17 PM on June 7, 2021 [10 favorites]

The most important fact is information that does not exist cannot be abused.

The library should never have kept records of who checked out what once the book was returned. Sure, records of how many times each book was checked out are cool. But only if it's just tracking the book title and number of times it was checked out, omitting the patron info entirely.

You can even make a case for keeping track of the total number of books each patron has checked out on a weekly, or monthly, or annual, basis. That's info that's useful to the library and not subject to abuse.
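
To make that concrete, here is a minimal Python sketch of what such a design could look like. The class and field names are made up for illustration and don't come from any real ILS: the patron-to-item link exists only while the loan is open, and once the book is returned only anonymous counters survive.

```python
from collections import Counter
from datetime import date


class MinimalCirculation:
    """Hypothetical circulation store that forgets who borrowed what on return."""

    def __init__(self):
        self.open_loans = {}            # item_id -> patron_id, only while checked out
        self.title_counts = Counter()   # title -> number of checkouts, no patron info
        self.patron_totals = Counter()  # (patron_id, year, month) -> number of checkouts

    def check_out(self, item_id, title, patron_id, when):
        self.open_loans[item_id] = patron_id
        self.title_counts[title] += 1
        self.patron_totals[(patron_id, when.year, when.month)] += 1

    def check_in(self, item_id):
        # Dropping this entry removes the only link between patron and item.
        self.open_loans.pop(item_id, None)


circ = MinimalCirculation()
circ.check_out("item-1", "Habeas Data", "patron-42", date(2021, 6, 6))
circ.check_in("item-1")
print(circ.title_counts["Habeas Data"])            # 1
print(circ.patron_totals[("patron-42", 2021, 6)])  # 1
print(circ.open_loans)                             # {}
```

One caveat on my own sketch: in a very small library, per-patron totals and per-title counts over the same short period could still be correlated, so coarser buckets are safer.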

Just as technology creators need to consider how their software, or hardware, can be abused to cause harm, so too anyone designing a database needs to consider the potential for abuse in the information they plan on retaining. And if it's data that could clearly be abused, it shouldn't be retained unless absolutely necessary.

Like the guy from Jurassic Park said, they spent so much time on what they could do they didn't stop to think if they should.

There's a tendency among geeks to hoard data. You never know when some data might be handy, it's cheap and easy to collect, so the default is to collect all data you possibly can.

We need to rethink that, to change that default to only the data we actually, really, need. And that only if it wouldn't be harmful if leaked, stolen, or subpoenaed.

The only data that's truly safe from corporate or government overreach is data that isn't collected.
posted by sotonohito at 1:56 PM on June 7, 2021 [2 favorites]

The library should never have....

Sure, it's easy to morning-after quarterback a lot of this stuff. However, it's part and parcel with the way a lot of library enterprise software is purchased, supported, and maintained. In many (most) cases, librarians who understand the privacy rules are not the people who are creating the software that does all the various functions that a sophisticated ILS (integrated library system) does, software that has dozens of different purposes and often needs to touch many different data sources.

And, to put it bluntly, people who are really good at building software tend to get jobs in places that aren't the not-particularly-lucrative library software market. Because you basically build one tool and then each library system needs it customized in various ways, sometimes they have staff who can do this (if it's open source, which it usually isn't) or they have to open a ticket with you. This means that even if you charge for upgrades it quickly becomes a huge messy situation to manage, so you'll push back against changes, and, as you correctly point out, the people building it think that storing more data is better.

And for every privacy advocate who agrees that we shouldn't be keeping more data than we need to--I saw Cory Doctorow give a talk to a room full of librarians advocating allowing users to keep their OWN data but encrypted so it was only available to them, this was a decade ago, no one ever implemented it--there is some tenured professor writing letters to the head of IT or the library or the board of trustees LOUDLY complaining that they can't remember what they checked out and DESPISING the privacy-focused policies of the library.

I follow Waldo Jaquith on Twitter who talks a lot about how government procurement of software works and it's not super different in larger library systems. You can read this thread and nod along at how applicable it is to a situation like this one.
posted by jessamyn at 2:16 PM on June 7, 2021 [5 favorites]

@brainwane: yes, I did look through things before posting them. I wound up not redacting anything, though -- I have some experience in having my social media trawled as part of a school or work backstab, and honestly sometimes the way to get in front of that is to get in front of it.

@phigmov: Absolutely right. The Voyager records were part of a backup, I'm fairly certain. I don't know the medium, but tape is quite likely.

@fairmettle: Yes. Technology changes; the base issues remain the same.

@jessamyn: I was volunteering at the local close-roads-for-bikes event on Sunday when I ran into someone just like that professor. (I will guess that prof was white-cis-male? My guy sure was.) I've been thinking about that problem since. I think there are workable opt-in solutions that are not "library keeps everybody's records just in case." And of course you are right about software procurement. Ex Libris (the company that owns/develops Alma) is very very far from blameless here.
posted by humbug at 4:34 AM on June 8, 2021

jessamyn, I agree with everything you said and I absolutely was not trying to say the librarians were to blame for the problem.

The problem is that computer people have a tendency to want to save everything, and almost never even try to consider how what they're building could be abused much less how to prevent that abuse or avoid the potential for the abuse.

You're also 100% right that concerns about abuse conflict with users who want convenience. That's the real tradeoff when it comes to security, you can always get more security but it will always come at the cost of less convenience.

I'm guilty of that myself: I let Google and Amazon collect my data. I've occasionally grumbled that it's annoying my grocery store won't/can't send me an electronic version of my receipt so I can track what groceries I buy.

So... yeah, it's complicated. And I can see the convenience of having the library track my activity so I can look back and see what books I've checked out.

I expect that while in my ideal world we'd have more programmers being mindful and collecting less data the reality is we'll collectively decide the convenience is worth the potential for abuse.

And a good argument can be made for that position!

We definitely see our definitions of acceptable privacy changing.

I think, in part, we're looking at David Brin's division of privacy into two not especially overlapping categories: privacy in the sense of being left alone to do what you want without hassle, and privacy in the sense of your activities being secret or anonymous.

We've got a lot of left-alone privacy in the USA these days, and as such we're a lot more willing to give up secrecy privacy for convenience. In a way that's encouraging, since it means we think our civil liberties are fairly secure.

Or at least for non-marginalized people's definitions of "we" and "our". As always it's not comfortable cis het white people like me who bear the risk of less privacy in the anonymity and secrecy sense of the term. Though a lot of my friends who are marginalized are also shrugging and going along with not much complaint.

As is often the case, Zack Weinersmith sums up the issue in three panels of SMBC.

I'm just concerned that the general attitude among the programming class (which tends towards the cis, the het, the white, and the male, the most privileged and least vulnerable subset of Americans in other words; my people), namely that all data which can be collected should be, will wind up hurting less privileged people if the far right continues taking power and being awful.

If we had some assurance that going forward the only real problem would continue to be more targeted advertising, well, that's kind of bleh but not worth really getting into a froth over.

My concern is that the rising tide of far right wing politics means that in a decade or so the people abusing the data my fellow geeks keep compulsively collecting will be fascists not corporate advertisers.

As always the responsibility is with the people making the tools not the people using them.
posted by sotonohito at 7:41 AM on June 8, 2021 [2 favorites]

Sidenote: apparently my library app **DOES** track which books I've checked out at least for a time. I can see the past five books I've checked out and returned. I assume that means it keeps a record for all books.

I'd never even bothered checking to see if it kept a history until just now. Which shows how low my concern for my own personal library data is, doesn't it?
posted by sotonohito at 7:44 AM on June 8, 2021 [1 favorite]

Plus one more: "Library insiders", about insider threat.
posted by brainwane at 7:45 AM on July 2, 2021
