On subjective data, why datasets should expire, & data sabotage
March 12, 2020 11:31 AM

A Dataset is a Worldview: a slightly expanded version of a talk given by Hannah Davis at the Library of Congress in September 2019.
posted by cgc373 (4 comments total) 23 users marked this as a favorite
 
I'm still R'ing TFA, but I think stating that 'Datasets Should Expire' is a little extreme - what the author seems to be getting at is that datasets should encode the time of their creation and take that into account, which I guess is a little more verbose but a different idea altogether.
posted by Dmenet at 12:50 PM on March 12, 2020 [2 favorites]


Also needs a history of modifications.
posted by sammyo at 12:53 PM on March 12, 2020 [4 favorites]


I strongly agree with the statement that 'classification is violence'. I feel a long-standing issue with analytics is the feedback loop that happens when humans inevitably use descriptive data in a prescriptive way, and the death by a thousand cuts that happens as a result - use a demographic approach to popular art like broadcast TV and you will inevitably reinforce the stereotypes of those demographics. I could see a future where AI systems are designed to be 'blind' to race, with parallel systems designed to check their work.
posted by Dmenet at 1:04 PM on March 12, 2020 [3 favorites]


"Datasets Should Expire" seems like a convenient shorthand for a whole lot of issues around small, old datasets being used to inform analysis of larger, current ones. Take for example the Brown Corpus. For many years it was the corpus to base English-language research upon, mainly because it was the only one that was fairly easy to get in electronic form.

Brown's just over a million words, and contains language published in the USA in 1961. It's going to contain examples of language that it's not acceptable to use today. Its contributors skew very, very white because that's how published words were 59 years ago in the USA. Much of its metadata was created manually by people of its time, and they had unconscious bias even working in tremendously good faith as diligent researchers. It's a trivially small corpus by today's standards, but it casts a long shadow.
posted by scruss at 1:11 PM on March 12, 2020 [6 favorites]



