Auditing Algorithms and Algorithmic Auditing
September 6, 2016 12:24 AM Subscribe

How big data increases inequality and threatens democracy - "A former academic mathematician and ex-hedge fund quant exposes flaws in how information is used to assess everything from creditworthiness to policing tactics, with results that cause damage both financially and to the fabric of society. Programmed biases and a lack of feedback are among the concerns behind the clever and apt title of Cathy O'Neil's book: Weapons of Math Destruction."

"Cathy O'Neil has seen Big Data from the inside, and the picture isn't pretty. Weapons of Math Destruction opens the curtain on algorithms that exploit people and distort the truth while posing as neutral mathematical tools. This book is wise, fierce, and desperately necessary." —[mefi's own] Jordan Ellenberg, University of Wisconsin-Madison, author of How Not To Be Wrong

Excerpt: How algorithms rule our working lives - "Employers are turning to mathematically modelled ways of sifting through job applications. Even when wrong, their verdicts seem beyond dispute – and they tend to punish the poor."

The 'Rithm is Gonna Get You - "MathBabe Cathy O'Neil is out to stop the Big Data monster."

The decision to leave her job as a tenure-track math professor at Barnard College and join hedge fund D.E. Shaw in 2007 seemed like a no-brainer. Cathy O’Neil would apply her math skills to the financial markets and make three times the pay. What could go wrong?

Less than a year later, subprime mortgages imploded, the financial crisis set in, and so-called math wizards were targets for blame. “The housing crisis, the collapse of major financial institutions, the rise of unemployment—all that had been aided and abetted by mathematicians wielding magic formulas,” she writes...

The book chronicles O’Neil’s odyssey from math-loving nerd clutching a Rubik’s Cube to Occupy Wall Streeter pushing for banking reform; along the way, she learns how algorithms—models used by governments, schools, and companies to find patterns in data—can produce nasty, or at least unintended, consequences (the WMDs of her title).

If we develop the will, we can use big data to advance equality and justice - "Weapons of math destruction, which O'Neil refers to throughout the book as WMDs, are mathematical models or algorithms that claim to quantify important traits: teacher quality, recidivism risk, creditworthiness but have harmful outcomes and often reinforce inequality, keeping the poor poor and the rich rich. They have three things in common: opacity, scale, and damage. They are often proprietary or otherwise shielded from prying eyes, so they have the effect of being a black box. They affect large numbers of people, increasing the chances that they get it wrong for some of them. And they have a negative effect on people, perhaps by encoding racism or other biases into an algorithm or enabling predatory companies to advertise selectively to vulnerable people, or even by causing a global financial crisis."

This Mathematician Says Big Data Is Causing a 'Silent Financial Crisis' - "Like the dark financial arts employed in the run up to the 2008 financial crisis, the Big Data algorithms that sort us into piles of 'worthy' and 'unworthy' are mostly opaque and unregulated, not to mention generated (and used) by large multinational firms with huge lobbying power to keep it that way."

more from mathbabe...

Reform the CFAA - "The Computer Fraud and Abuse Act is badly in need of reform... Specifically, the CFAA keeps researchers from understanding how algorithms work."
Donald Trump is like a biased machine learning algorithm - "The reason I bring this up: first of all, it's a great way of understanding how machine learning algorithms can give us stuff we absolutely don't want, even though they fundamentally lack prior agendas."
Auditing Algorithms - "I've started a company called ORCAA, which stands for O'Neil Risk Consulting and Algorithmic Auditing and is pronounced 'orcaaaaaa'. ORCAA will audit algorithms and conduct risk assessments for algorithms, first as a consulting entity and eventually, if all goes well, as a more formal auditing firm, with open methodologies and toolkits."
When is AI appropriate? - "The short version of my answer is, AI can be made appropriate if it's thoughtfully done, but most AI shops are not set up to be at all thoughtful about how it's done."
Horrifying New Credit Scoring in China - "ZestFinance is the American company, led by ex-Googler Douglas Merrill who likes to say 'all data is credit data'... Baidu is the Google of China. So they have a shit ton of browsing history on people. Things like, 'symptoms for Hepatitis' or 'how do I get a job'. In other words, the company collects information on a person's most vulnerable hopes and fears. Now put these two together, which they already did thankyouverymuch, and you've got a toxic cocktail of personal information, on the one hand, and absolutely no hesitation in using information against people, on the other."
White House report on big data and civil rights - "Last week the White House issued a report entitled Big Risks, Big Opportunities: the Intersection of Big Data and Civil Rights. Specifically, the authors were United States C.T.O. Megan Smith, Chief Data Scientist DJ Patil, and Cecilia Munoz, who is Assistant to the President and Director of the Domestic Policy Council. It is a remarkable report, and covered a lot in 24 readable pages. I was especially excited to see the following paragraph in the summary of the report: 'Using case studies on credit lending, employment, higher education, and criminal justice, the report we are releasing today illustrates how big data techniques can be used to detect bias and prevent discrimination. It also demonstrates the risks involved, particularly how technologies can deliberately or inadvertently perpetuate, exacerbate, or mask discrimination'. The report itself is broken up into an abstract discussion of algorithms, which for example debunks the widely held assumption that algorithms are objective, discusses problems of biased training data, and discusses problems of opacity, unfairness, and disparate impact."
Three Ideas for defusing Weapons of Math Destruction - "1) Use open datasets; 2) Take manual curation seriously; 3) Demand causal models."

also btw...

The workshop on Data and Algorithmic Transparency - "From online advertising to Uber to predictive policing, algorithmic systems powered by personal data affect more and more of our lives. As our society begins to grapple with the consequences of this shift, empirical investigation of these systems has proved vital to understand the potential for discrimination, privacy breaches, and vulnerability to manipulation."
Why GDPR is the catalyst for a global digital transformation - "The European Union's General Data Protection Regulation, which all businesses must comply with by 2018, will trigger the next wave of global digital transformation... Recently, Facebook had to ensure that the data it had collected on EU citizens wasn't being misused in the US. The ruling resulted in US firms scrambling to get their data agreements in order and ensure the safe sharing of data."
This Company Has Built a Profile on Every American Adult - "For more than a decade, professional snoops have been able to search troves of public and nonpublic records—known addresses, DMV records, photographs of a person's car—and condense them into comprehensive reports costing as little as $10. Now they can combine that information with the kinds of things marketers know about you, such as which politicians you donate to, what you spend on groceries, and whether it's weird that you ate in last night, to create a portrait of your life and predict your behavior. IDI, a year-old company in the so-called data-fusion business, is the first to centralize and weaponize all that information for its customers. The Boca Raton, Fla., company's database service, idiCORE, combines public records with purchasing, demographic, and behavioral data. Chief Executive Officer Derek Dubner says the system isn't waiting for requests from clients—it's already built a profile on every American adult, including young people who wouldn't be swept up in conventional databases, which only index transactions."
What Machines Know: Surveillance Anxiety and Digitizing the World - "Who I think I am doesn't matter — what matters is what the algorithms of Google, my potential employer, my health insurer, and the Department of Homeland Security say I am."
Ban the Box? An Effort to Stop Discrimination May Actually Increase It - "Research suggests that unintended consequences may foil well-intentioned policies."
It's ML, not magic: machine learning can be prejudiced - "If we're not careful, optimizing life for some will be equivalent to handicapping life for others."
Machine Learning Needs Bias Training to Overcome Stereotypes - "It's time for the risks of social bias to be embedded deeply in data science codes of ethics and education."
The complexity of machine learning does not make it unregulable - "For the most laissez-faire commentators in the debate on algorithmic accountability, each step in the process of algorithmic ordering is immune from legal contestation or inspection: 1) the data gathered for processing are protected as trade secrets, 2) the processing itself is too complex for any human to understand, and 3) its outputs are 'free expression', exempt from ordinary legal restrictions. I have disputed 1) and 3) in other work, and in this provocation I deem 2) the 'sweet mystery of machine learning' approach to deflecting regulation... Even if algorithms at the heart of these processes 'transcend all understanding', we can inspect the inputs (data) that go into them, restrict the contexts in which they are used, and demand outputs that avoid disparate impacts."
Numbers don't need to be trusted to shape our lives: they just need our attention - "With the escalating use of metrics across all areas of our lives, we have seen the power of numbers shifting from being about faith to being about persuasion. It is no longer what numerical measures tell us, but what metrics tell us to do. When we look at the way that numbers are used, we see increasingly calculative modes of reasoning deployed in our workplaces, in our consumer behaviour, and in determining whether schools, hospitals, universities, countries are failing or succeeding."
A Famed Hacker Is Grading Thousands of Programs — and May Revolutionize Software in the Process - "Mudge and his wife, Sarah, a former NSA mathematician, have developed a first-of-its-kind method for testing and scoring the security of software — a method inspired partly by Underwriters Laboratories, that century-old entity responsible for the familiar circled UL seal that tells you your toaster and hair dryer have been tested for safety and won't burst into flames. Called the Cyber Independent Testing Lab, the Zatkos' operation won't tell you if your software is literally incendiary, but it will give you a way to comparison-shop browsers, applications, and antivirus products according to how hardened they are against attack. It may also push software makers to improve their code to avoid a low score and remain competitive."
Make Algorithms Accountable - "We need more due process protections to assure the accuracy of the algorithms that have become ubiquitous in our lives."

posted by kliuless (61 comments total) 255 users marked this as a favorite

In 8th grade, I had an English assignment to write a book review on a specifically NON-fiction book. The shortest book I could find was one that changed my life: "How to Lie With Statistics" (pdf). It was an introduction to statistics and their manipulation, with a sense of humor and a deep cynicism. After having been a "math wiz" for my educational life, I suddenly saw Numbers as no longer something simply factual, but as the tools for every kind of mischief, at all levels. Needless to say, So the first time I heard the term "Big Data", I automatically put the word "Bad" in the middle of it. I find nothing surprising in all the links kliuless provides here, but thank you anyway.
posted by oneswellfoop at 1:05 AM on September 6, 2016 [20 favorites]

Dear god. What a superb post, Kliuless - both in terms of breadth and scope, and subject area. Thank you. I'm looking forward to working through this.
posted by davemee at 1:34 AM on September 6, 2016 [15 favorites]

Great overview and timely subject – many thanks for assembling. Algorithms rising in real-time proximity to – and control of – human life is obviously a vast and nuanced subject. I often see my own thinking reflects the field itself. An understanding of fragments without any current capability to see the whole view. I offer a few of those fragments here in the context of the WMD book reviews that I have read...

1) She seems to occupy the 'warning' position between demonstrations of technology and practical applications. The sentencing programme works in the laboratory and in field trials, but begins failing when applied to uncontrolled situations, where the context is far more variable and pattern recognition fails, for it begins summarising unique cases into specific cases.

The 'warning' position appears when technology produces good initial results, but when both the buyers and sellers miss the larger implications. That the sentencing technology may work today to a robust enough level to demonstrate, but that the larger point is that technology fundamentally violates a key foundation of the American legal system – inalienable individual rights. If there is any averaging present in the system or pattern matching used for actual sentencing recommendations, that violates the foundations of law.

We are never meant to average data in determining whether or not to strip someone of their rights. With humans, there's the expectation of bias, and related control mechanisms in place. A white male judge (for example) handing down a lenient sentence to a convicted sex attacker creates protest and removed from hearing cases. He is likely guilty of an inherent bias that averaged the case – when white males from elite universities commit crimes, they are unlikely to reoffend hence they can be scared straight. Wrong call. He was removed from service. And someone else will hear cases and their bias will be evaluated.

However, that human process of bias assessment and control is not only slow, but also creates permanent impacts in testing for bias. The case that the judge heard is closed. It may have revealed the judge's bias, however that justice has been handed out and the cycle completed. The siren song of algorithms is that the programmes execute much faster and can be adjusted much faster – thus their creators will assume that they achieve better results more quickly. It's easy to see the sales pitch for sentencing algorithms. Humans cannot process all the data. Humans make mistakes. Humans are slow to correct their mistakes. Algorithms can process more data. We can correct their mistakes quickly.

Yet fundamentally, just like a human, the algorithm is only as good as the data that it is being fed. The bias comes not in the processing, but in the original data selection. Further, algorithms work through a process of data reduction – running through piles of information to find salient points and discarding irrelevant data. The fundamental problem is who determines what data is irrelevant?

If I'm training a machine learning programme on sentencing recommendations and I simply use prior cases, I will get a machine not that produces ideal sentencing recommendations, but rather a machine that shows the bias of previous sentencing recommendations. That is why I say she occupies the warning position. While its brilliant to create a machine that recommends sentences, the reality is that given the data it is currently being fed, what that machine may well be showing is the previous bias of sentencing, rather than sentencing that reflects the true will and spirit of the law with respect to individual rights.

2) This is a problem that we are going to see grow exponentially before we learn how to solve it. For, the ultimate solution is in fact artificial intelligence – a general learning artificial intelligence that can then create its own child algorithms. For, anything we design is going to have our bias built into it, because of the data sets that we feed it.

If I am building a consumer credit reporting system, in all likelihood that system is going to efficiently allocate credit according to existing credit models. What that system is going to show is the bias of the credit models. Further, if I am using that system to select economic winners and losers, what I am really doing is creating economic winners and losers based on the current paradigm. Those with good credit will get better credit. Those with bad credit will get worse credit.

Because that is what algorithms are designed to do – solve problems. Taken to a fictional extreme, if we apply the credit algorithm to the current economic structure, we likely would get one person with all the credit. We're already seeing that operation occur from the human structure of the financial industry, and it will be exacerbated by the machine structure.

The current financial structure only works individually, not systemically. If entity A loans entity B money, the algorithm is designed to maximise the return to entity A by assessing the likelihood of entity B to repay. However, in doing that, the algorithm misses the fact that both entity B and entity C is better off if entity B does not repay the money, leaving entity A worse off.

This comes back to the the mortgage and student loan crises, where the economy as a whole may well have been better off if the debt was simply written off. That would have put more money in the hands of consumers by taking less money out of the hands of consumers. They would have spent in local areas and driven local economies, rather than sending the money back to the banks.

The summary is that the algorithms are being calibrated for best return to the people that create the algorithms. That may not be the best return for society overall, yet the algorithm has no way of discriminating.

I look forward to the time when there is an algorithm built to understand true economic and social value. I would like to run that on the activities of every worker in the United States, and see if investment bankers are truly 500% more valuable in a given year than teachers or nurses. I would like to analyse if the economy is better off when people spend their working lives repaying their debts or default on them.

The current logic is that the system says that investment bankers are 500% more valuable than teachers and nurses, and that the economy is better off when people repay their debts. What is implicit in that is if we like this system and agree this system is working.

And that is the greatest hazard of the algorithm today.

I spoke with a policy administrator in the UK about quantitative easing and its effect on both asset prices and wages. As far as I understand it, quantitive easing is driving up asset prices at the expense of wages – hence, the QE policies themselves are one of the largest contributors to growing income inequality and societal distress.

His view was relatively different. Look around, he said. Do you like this city? Do you like full grocery stores? Do you like living with low crime? Fresh water? A robust power grid? All of that comes from this system operating in its most efficient way. Your life tells you that not only is the system working, but that further we must continue to optimise it. There will be costs. There are always costs. But the reality is better than the alternatives.

And below that lies the greatest conundrum of the algorithms. As an entrepreneur and innovator, I immediately think how much better the city could be. How much fuller the grocery stores could be. How much lower crime could be. Etc. I see the present system as a starting point, not an end result. My mate sees it as the ending point – one to be protected and defended.

As we build these algorithms, they are first not going to give us the answers we want, rather they are going to show our bias. The more we apply them to credit ratings, the greater the disparity between those who have credit and those who don't will become. We will see this as the algorithm working, when it will simply be showing a more detailed picture of the reality behind the credit ratings. When we apply it to sentencing, it will show us the biases of all the sentences that have come before.

And there are intense bright spots if we have the bravery to embrace them.

Genetic college admissions that focus only on teachability, specifically instructed to be blind to race and gender genes.

Tax programmes that exclude current biases – such as the differentials between capital gains and income tax – and focus strictly on best overall economic growth.

Social value programmes that measure the value of an individual to society based on their total contribution to the other people within their ecosystems and networks.

Immigration programmes that create policy based on data, not personal preferences.

All of that does involve handing over freedom and decision-making to machines. We may not like all of the answers – indeed, they will show us other horrible truths about ourselves.

However, it will all come down to 1) who designs the algorithms, 2) what data they train the algorithms on, and 3) what the goal of the algorithms is.

If its banks training algorithms on existing profitability data to create more profit, they will create solutions that extract more profit.

Finally, one important thing to remember about the Chinese model – the citizen score – is that it is not designed solely as a credit score, but rather also a social compliance score. It's a chilling brilliant piece of social engineering in which compliance is rewarded with credit. It is not something to be emulated, but rather something to be observed.

Overall, the current algorithmic wave is going to tell us more about who we are as a society, than what we design the algorithms to do. The question remains if we have the courage to listen.
posted by nickrussell at 3:28 AM on September 6, 2016 [18 favorites]

Cathy's blog is really a fun read about math topics clear without dumbing down but she also does a personal service as "Ask Aunt Pythia" on just about any topic of human interaction also from a very smart perspective.
posted by sammyo at 4:16 AM on September 6, 2016 [1 favorite]

"Algorithm" has become such a buzz-word that I think a lot of people talk about algorithms without even really knowing what the word means. I do think that a lot of people use "algorithm" basically to mean "mathy magic," and that makes it easy to believe that the mathy magic is scientific and correct and free from bias.

Anyway, I'm sort of fascinated by this stuff, and I will definitely put this book on my to-read list.
posted by ArbitraryAndCapricious at 4:31 AM on September 6, 2016 [3 favorites]

Social value programmes that measure the value of an individual to society based on their total contribution to the other people within their ecosystems and networks.

*gulp*

What could go wrong? And I came here to say how reliance on credit rating has gotten out of hand.
posted by maggiemaggie at 4:32 AM on September 6, 2016 [9 favorites]

I've lurked on this site for years but this post's dragged me out of hiding to comment: fascinated by algorithms, particularly Google & Facebook's sneaky-but-powerful ones, and I cannot wait to trawl through all this. Thank you!
posted by greenlottle at 4:49 AM on September 6, 2016 [6 favorites]

I re-read nickrussell's comment and I understand that he was talking about a social value rating as opposed to a credit rating that would, perhaps, show that teachers and nurses are more valuable to society than bankers. This could be used in opposition to credit ratings that end up giving all the credit to one person.

My fear is still that any rating of an individual's worth could just as easily be used against them wrongly, as credit ratings are.

Democratic countries still claim to value dissent, how could that be rated in any system of valuing individuals?
posted by maggiemaggie at 4:49 AM on September 6, 2016 [1 favorite]

There's a myriad of new indices being created, powered by everything from cryptocurrencies and blockchains, to machine learning on public and private records. There is no question data will be used to rate nearly every area of life and resource allocation. The question is 1) who designs them, and 2) what data will they use?
posted by nickrussell at 4:52 AM on September 6, 2016

My fear is still that any rating of an individual's worth could just as easily be used against them wrongly, as credit ratings are.

I would hesitate to say "right" or "wrong". Rather, credit ratings either 1) directly reflect an individual's true capabilities / intentions, and 2) produce the best outcomes to the system.

Credit – in the form of debt – generally functions on the basis that an asset will increase in value. If you make $30,000 and take a loan of $100,000 on a house, the implication is that both your salary will rise (at least with inflation) and also the value of the house will rise.

That was the basis of student loans, for example. That a student takes $30,000 in debt, and that college education improves likely earnings by at least 10% per annum. Hence, if the student could earn $30,000, after college they can earn $40,000. If we presume salary rises of 5% per year (slightly above inflation), after ten years, the college graduate earns $19,000 more per annum than the non-college graduate.

That is the default credit model, which assumes 1) starting salary will improve, 2) that improvement will compound over time. The model begins failing when those conditions are no longer met, and a debt trap is created. As functionally, debt is borrowing from future earnings with the expectation that the overall gains will be greater than the interest.

The problem isn't that the models are wrong, but rather what they are calibrated to, and what they are calibrated to do. If you calibrate the model to maximise economic return in the form of interest to a lending bank, you end up with a different model than if you calibrate to maximise overall education level of the population. That is obvious in European countries, where education is paid for by taxes. The states spend overall less money on education, however, they graduate (arguably) better educated people.

With student debt, one of the key issues is the inability to discharge student debt via bankruptcy, which further changes the model on the lending side. Defaulted loans are sent to collections or sold off, but not formally discharged as bad debts, hence changing the value of the portfolio and the related asset allocation model.

Point being, it's neither right nor wrong, rather these are mechanisms which shape resource allocation and human behaviour.

As mentioned, a credit model to maximise economic activity is different from a credit model to maximise adult education level.
posted by nickrussell at 5:05 AM on September 6, 2016 [5 favorites]

One of my favorite (Computer Science) professors in undergrad is doing some really interesting work on combating this by attempting to build methods to reveal possible unfair biases that complex ML systems can wind up learning.
posted by Itaxpica at 5:21 AM on September 6, 2016 [2 favorites]

O'Neil is one of the three regulars on Slate Money Podcast, where she occasionally–well, regularily—talks about similar subjects. Recommended.
posted by dst at 5:38 AM on September 6, 2016 [1 favorite]

EU citizens might get a "right to explanation" about the decisions algorithms make.
posted by thatwhichfalls at 6:13 AM on September 6, 2016 [1 favorite]

China's social credit programs sound terrifying. It's like a bunch of bankers on a theology Wikipedia article binge thought it'd be cool to make the invisible hand more visible and intelligent.
posted by mccarty.tim at 6:23 AM on September 6, 2016

We are never meant to average data in determining whether or not to strip someone of their rights.

I literally don't know what this means.
posted by PMdixon at 6:25 AM on September 6, 2016

This is an excellent post, and really relevant to me as I'm about to embark on a career in data science. Thanks, kliuless!

I've often wondered to what extent these sorts of issues are covered in formal education: there is a very big push to turn scientists into data scientists, but it seems that there are so many ways to get it subtly wrong. I did an astrophysics degree and we seem to be massively in demand as data scientists, but that's because it's an attractive mix of mostly self-taught skills and not because of any formal training. Do programmes intended to train people specifically for data science/analytics cover the kinds of issues raised in this post?
posted by dashdotdot dash at 6:34 AM on September 6, 2016

this is a great post. Based on the few links here I had read before I am probably going to disagree with nearly all of it - but in interesting ways. There is a lot of interesting stuff here to add to my instapaper queue - thanks for posting!
posted by Another Fine Product From The Nonsense Factory at 6:36 AM on September 6, 2016

I work in data, and have a *very* different moral/political/ethical outlook than those around me. I'll be reading this post for days. (But not at work.)

It's so damned difficult to get people to listen to the simple statement of "The data doesn't SAY anything. Your interpretation says a lot." It's so easy to optimize for the wrong thing. In today's world, it's shareholder value, or property value, or revenue, or clickthroughs. I almost think anything aiming at only one metric is inherently flawed. [/rant]

PS: How to Lie with Statistic should be taught to in EVERY middle/high school in the world.
posted by DigDoug at 6:52 AM on September 6, 2016 [10 favorites]

"Algorithm" has become such a buzz-word that I think a lot of people talk about algorithms without even really knowing what the word means.

Ooh, ooh, I know this one (etymologically, anyway): the word comes from the name of the medieval Persian mathematician Muhammad ibn Musa al-Khwarizmi. Better yet, his masterwork was The Compendious Book on Calculation by Completion and Balancing (in Arabic: الكتاب المختصر في حساب الجبر والمقابلة‎‎ or Al-kitāb al-mukhtaṣar fī ḥisāb al-ğabr wa’l-muqābala, which title pleasingly gives us the English words algebra and gibberish.
posted by ricochet biscuit at 7:02 AM on September 6, 2016 [22 favorites]

You shouldn't assume that there is any credit policy rationale built into student loans.

Student loans ignore a student's history of defaults and delinquencies; ignore a student's GPA, resume, and standardized test scores; ignore the earnings of a field of study; and ignore the ranking and job placement record of the student's particular school. Someone with a bankruptcy who barely got admitted to the worst law school in the country can borrow exactly the same amount as someone with a 750 FICO can borrow to go to Harvard Business School.
posted by MattD at 7:27 AM on September 6, 2016 [1 favorite]

It is like we forgot a key lesson of Robert McNamara and Vietnam. Analyzing big datasets often results in a distorted view that contradicts the reality.
posted by humanfont at 7:35 AM on September 6, 2016

PS: How to Lie with Statistic should be taught to in EVERY middle/high school in the world.

Generally, I agree, although the examples badly need to be updated, and some of the illustrations are... a bit out of date, to say the least.
posted by GenjiandProust at 7:42 AM on September 6, 2016 [3 favorites]

I agree so much with the excellent muckraking here, yet the hype-driven thesis of these articles is somewhat irritating. The idea that the flaws of "big data" are meaningfully different than the shoddy data / practices of past times is ludicrous. There are problems with this stuff for sure, but are they worse than the problems we faced in past eras? That is probably something that's impossible to know.

There's a formula for generating content which goes something like:

1. Find new buzzword for vague thing that really encompasses a lot of old ideas.
2. Find out that this new buzzword thing isn't all good! Human folly in its old and classic forms applies even to this new idea!
3. Revel in the smugness that those geniuses behind the buzzwords aren't so smart after all eh!

There have been faulty applications of algorithms to data, and bad data collection for about as long as humanity has existed. The fact that we now can harvest data on larger scales and still commit the some errors is a non-surprising result. Exploring the consequences there is interesting, but claiming that as some sort of discovery just seems gratuitous.
posted by sp160n at 7:43 AM on September 6, 2016 [10 favorites]

I think I read it here, but my favorite quote on the subject is "machine learning is money laundering for bias".
posted by benzenedream at 8:10 AM on September 6, 2016 [12 favorites]

nickrussell's policy admin mate: "His view was relatively different. Look around, he said. Do you like this city? Do you like full grocery stores? Do you like living with low crime? Fresh water? A robust power grid? All of that comes from this system operating in its most efficient way. Your life tells you that not only is the system working, but that further we must continue to optimise it. There will be costs. There are always costs. But the reality is better than the alternatives."

NASA disagrees: "Too much inequality and too few natural resources could leave the West vulnerable to a Roman Empire-style fall."

Efficient systems are great. If there's too few people benefitting from them though, then one has to wonder why things are weighted more than people. We've been there in the US, which has a very efficient, top-notch healthcare system. If you can afford it.
posted by fraula at 8:13 AM on September 6, 2016 [2 favorites]

We are never meant to average data in determining whether or not to strip someone of their rights.

I literally don't know what this means.

PMdixon: Men, on average, are more likely to be involved in an auto accident than women. Insurance companies (used to?) use this as justification to charge men higher premiums than women. This may make sense from a business/actuarial perspective, but is maybe unjust because of how it treats individuals as responsible for the groups to which they (for the sake of argument) involuntarily belong.

Now transfer to the context of differential arrest likelihood by race and the resulting stop-and-frisk bias. We can go one step further and say "he wasn't stopped because he was a young black male, he was stopped because my deep learning cloud police AI app flagged him." The fear is that careless machine learning hides the easily recognized ecological fallacy in a proprietary black box that claims to be objective.
posted by schnellp at 8:27 AM on September 6, 2016 [14 favorites]

sp160n: The idea that the flaws of "big data" are meaningfully different than the shoddy data / practices of past times is ludicrous. [...] The fact that we now can harvest data on larger scales and still commit the some errors is a non-surprising result.

I think that's not quite the point - of course there will be mistakes. The problem is that now the person making the error can point to the omniscient black box that generated the result, and no one can dispute it because the decision was handed down by the Algorithm.

Or, to repeat the pithy quote from above, "machine learning is money laundering for bias". That's the concern.
posted by RedOrGreen at 8:32 AM on September 6, 2016 [2 favorites]

Yeah, people tend to talk about algorithms as if they aren't human inventions. It's like an algorithm is the result of stumbling onto patterns that were already there, and we're just giving ourselves a way to see them.

Great post, lots to read here.
posted by teponaztli at 8:37 AM on September 6, 2016 [1 favorite]

These subtleties and nuances should not be dismissed or taken lightly; tiny bad assumptions at the row level when aggregated can yield massively distorted and misleading results. These issues of data interpretation are not marginal--they're central. Math is a language--a highly precise, specialized, and abstracted language, but that's all it is at bottom. Even the conceptual realities the language of math describes have been demonstrated to exist independently of the specific mathematical languages used to describe them (incompleteness). So people should think about and interrogate statements made using data and the language of math just as critically and skeptically as statements made in less precise languages, like plain English. Unfortunately, not everyone speaks math very well.
posted by saulgoodman at 8:37 AM on September 6, 2016 [1 favorite]

Currently my nonprofit is struggling to get a certain type of data metric included in the new UN Sustainable Development goals. I believe the metric is 'the number of people who recieved criminal legal aid and who felt the process was fair.' This data would probably do a lot to help reform justice systems to be fairer... but almost nobody currently collects it. So the UN doesn't want to include it as a metric, because the cost of including it is more prohibitive than including types of data which are already collected.

Big Data does not include all data. The types of data which are collected and analyzed are selected by humans with agendas who then feel free to deny that any such agendas exist.
posted by showbiz_liz at 8:37 AM on September 6, 2016 [16 favorites]

"Algorithm" has become such a buzz-word that I think a lot of people talk about algorithms without even really knowing what the word means.

For the totally non-tech when you hear "algorithm" think wheel or perhaps gear. A society changing tool that can be used for good or other. But don't minimize the potential for total change in the world, think pre-industrial revolution as where we are now.

Now, remember it's all math, and pretty standard math used in new ways. Also remember most actual math that's not woo theoretical is decades or hundreds of years old. Many of these clever recipes (algorithms) used on information (big data) work due to the incredible advances in computer hardware. Every smartphone is a supercomputer in 60's or 70's terms. And the rate of increase is in many ways overtaking the most optimistic estimates (Moore's Law may have slowed for a certain part of the infrastructure but others like GPU's are exploding).
posted by sammyo at 8:38 AM on September 6, 2016

There are some differences in the way big data is collected and used now that are different, though.

For one, there is less oversight. Not just public oversight, but actual human oversight. Machine learning lets them take humans out of the equation entirely, and the decision making is often done entirely by machine.

Which leads to another big difference, which is plausible deniability. Machines are not capable of forming intent, and they do not have innate biases. They're still redlining, but they're not doing it because they are actually racist. They're just reinforcing structural inequities based on learning from inequitable systems, and they can do so much more efficiently and more subtly using data that is far less hamhanded that straight up redlining, combining multiple data sources that, while technically not directly identifying people by social class, correlate very closely. So maybe X kind of people buy a lot of store brand cotton swabs, prefer Frank's Red Hot sauce to Tabasco, drive beige cars, AND live in these ZIP codes, and are better or worse credit risks, statistically.

And, of course, there's just so much more data, a lot of it fuzzy, a lot of it just plain wrong, and it's being used more extensively all the time, and it's nearly impossible to correct because we don't always even know what information is feeding those assumptions. And not just "we" as in people who don't work for the organizations using that data, but "we" as in humans.

You could tomorrow make a specific menu substitution at Olive Garden that flags you as a potential drug mule, or do some combination of things--going to a car wash in the middle of the day and then calling a toll free number for information about selling your timeshare--that flags you as having some medical condition, and you would never know.
posted by ernielundquist at 8:39 AM on September 6, 2016 [3 favorites]

Weapons of math destruction, which O'Neil refers to throughout the book as WMDs, are mathematical models or algorithms that claim to quantify important traits: teacher quality, recidivism risk, creditworthiness

Here in Australia we have NAPLAN (the National Assessment Program - Literacy And Numeracy). The tests it uses were originally designed to identify specific strengths and weaknesses in individual students so as to help tailor programs for them, which is fine as far as it goes but really doesn't tell any competent teacher anything they didn't already know. How it's actually being applied now is to rank whole schools against each other, on the basis of one test sat once per year at four different year levels. The ranking is transparently obvious bullshit, but most people seem to take it as gospel.

I work in a school, so I get to see the effects of this bullshit up close and personal and I've been complaining about it to everyone who will listen for quite some while. It hadn't really hit home to me that exactly the same kind of bullshit is now being applied across almost every field of endeavour, so thanks for this post, kliuless; will be working through it with great interest.
posted by flabdablet at 8:44 AM on September 6, 2016 [1 favorite]

Big Data does not include all data. The types of data which are collected and analyzed are selected by humans with agendas who then feel free to deny that any such agendas exist.

[The move to "smart cities"] will likely amplify the bias that already favors counting cars, but not people. Consider New York City, perhaps the most pedestrian-oriented place in the nation. New York gathers data on pedestrian activity in a twice-annual survey (which counts pedestrian traffic on two different days in May and September at 100 locations). Contrast that with its system that reports vehicle traffic speeds in real time at more than 300 locations.

posted by enn at 9:07 AM on September 6, 2016 [9 favorites]

RedOrGreen:

I think that's not quite the point - of course there will be mistakes. The problem is that now the person making the error can point to the omniscient black box that generated the result, and no one can dispute it because the decision was handed down by the Algorithm.

That is really not a new problem at all. Take, for example, any sort of standardized testing like say IQ or SAT tests. Whatever problems they have will be applied across the population as a whole. The only thing distinguishing this from "Big Data" is that it cannot be retroactively applied to existing data. If you have a problem with a test, well, the test/algorithm cannot be refuted! These tests have the same problems of encoding bias in algorithms. Nothing new under the sun.

There are so many examples of these sorts of things. What "Big Data" changes is that there are now more sources for data, and that data can more easily retroactively applied. The real question is whether the gains from these insights outweigh the losses. I don't think that question is possible to answer in any sort of practical way.
posted by sp160n at 9:25 AM on September 6, 2016

David Simon / The Wire was in on this, too:

BILL MOYERS: Is it because we can't go where the imagination can take us? We are tethered to the facts?

DAVID SIMON: Well, and facts-- one of the themes of THE WIRE really was that statistics will always lie. That I mean statistics can be made to say anything.

BILL MOYERS: Yes, one of my favorite scenes, in Season Four, we get to see the struggling public school system in Baltimore, through the eyes of a former cop who's become a schoolteacher. In this telling scene, he realizes that state testing in the schools is little more than a trick he learned on the police force. It's called "juking the stats." Take a look.

[...]

ASSISTANT PRINCIPAL: So for the time being, all teachers will devote class time to teaching language arts sample questions. Now if you turn to page eleven, please, I have some things I want to go over with you.

ROLAND "PREZ" PRYZBYLEWSKI: I don't get it, all this so we score higher on the state tests? If we're teaching the kids the test questions, what is it assessing in them?

TEACHER: Nothing, it assesses us. The test scores go up, they can say the schools are improving. The scores stay down, they can't.

PREZ: Juking the stats.

TEACHER: Excuse me?

PREZ: Making robberies into larcenies, making rapes disappear. You juke the stats, and major become colonels. I've been here before.

TEACHER: Wherever you go, there you are.

This is also covered extensively in Seeing Like a State (James Scott) - the State has to make things legible. It's how you go from an "untamed" forest to rows and rows of plantation pines. We need something measurable so that we can perform experiments on it and test our results. But that legibility also reduces complexity and makes it easier to game. It's a fundamental problem of capitalism. It's a really smart, really efficient system. But as we design these systems, we have to also know and recognize the edge cases.

One more example: voting restrictions. Used to be you just had a "literacy test" to restrict people of color from voting. Pretty blunt instrument - if somebody wants to vote and you don't want them to, you give them the unanswerable test. Nowadays, that's easy to strike down. So instead, you mine for data: who votes early? Where? When? Who needs a student ID? You craft voting restrictions around certain communities. North Carolina's problem wasn't that they restricted early voting or student IDs, it's that they explicitly said they were doing it to reduce African American turnout. And the Supreme Court still voted 4-4.
posted by one_bean at 9:31 AM on September 6, 2016 [16 favorites]

kliuless, how do I donate to you?
posted by Potomac Avenue at 9:37 AM on September 6, 2016 [2 favorites]

This is a great post. The problem is not just that algorithms misclassify people and reify prejudice; the problem is also that reducing the error easily ends up eviscerating the right to privacy.

Because in an era of networked digital systems, it is not really *necessary* to sort through terabytes of data to construct a brittle proxy for the quantity you're tracking: the machines can simply talk among themselves to exchange the information they need. People who don't want to provide that kind of access? Well, they must have something to hide, so we'll classify them as high-risk, low-value, and if they think that's unfair, they'll just have to open up...

Peppet elaborates on this perspective -- the shift from a "sorting" to a "signaling" economy -- in the article linked below.

Unraveling privacy: The personal prospectus and the threat of a full-disclosure future.

This Article makes three claims. First, [...] that rapidly changing information technologies are making it possible for consumers to share verified personal information at low cost for economic reward or, put differently, for firms to extract previously unavailable personal information from individuals by offering economic incentives. In this world, economic actors do not always need to “sort” or screen each other based on publicly available information; instead, they can incentivize each other to “signal” their characteristics. [...] Second, this change towards a signaling economy,” as opposed to the “sorting economy” in which we have lived since the late 1800s, poses a very different threat to privacy than the threats of data mining, aggregation, and sorting that have preoccupied the burgeoning field of informational privacy for the last decade. In a world of verifiable information and low-cost signaling, the game-theoretic “unraveling effect” kicks in, leading self interested actors to fully disclose their personal information for economic gain."

posted by dmh at 9:49 AM on September 6, 2016 [3 favorites]

Rather, credit ratings either 1) directly reflect an individual's true capabilities / intentions, and 2) produce the best outcomes to the system.

I'm trying to understand this statement. Are you saying that's how credit ratings have actually functioned in the U.S. to date? Because a quick look at their history would indicate that's very wrong, whether you're talking about the humble credit rating of a single individual or credit ratings assigned to complex securities by the NSROs.
posted by praemunire at 10:19 AM on September 6, 2016 [1 favorite]

I think one of the problems with the way that these kinds of arguments can be interpreted is that the lesson isnt, 'big data machine learning causes bias so its bad and we need less of it'. The lesson should be 'we need to use data in smarter ways'
posted by MisantropicPainforest at 10:29 AM on September 6, 2016 [1 favorite]

Something I think about a lot is the intersection of disability with these algorithms.

I have a documented mental health disability that affects memory and is acknowledged by the ADA as a disability that must receive reasonable accomodation. But what does that reasonable accomodation look like? I also receive a credit score that affects my ability to buy - or even rent - a house or apartment, which largely takes into account "have you been late with a payment"? There's no way for me to say, "Yes, but only for places that don't accept automatic payment". There's no way for me to say, "Your entire system is discriminatory against people with mental health issues and prevents me from fair access to housing." There's no way they will let me remove late payments from my algorithm-created FICO score - and even if they let me "opt out" of FICO scores, that would mean that I would have "no credit score" and property management offices would still not rent to me.
posted by corb at 10:29 AM on September 6, 2016 [3 favorites]

There are so many links that my dup-finding algorithm thinks this one is missing. https://www.youtube.com/watch?v=gdCJYsKlX_Y
posted by Obscure Reference at 10:41 AM on September 6, 2016 [1 favorite]

The lesson should be 'we need to use data in smarter ways'

Also there was a lot of data passing through pre-electronic mail, generally very safe and private. How could that be? Strong social conventions, strong legislative protections. There is poor comprehension by our government well and society in general about what the limits and protections need to be. Places like the Berkman center are discussing but the topic needs to be more central to more fields and discussions in every segment of academic and social discourse.
posted by sammyo at 10:41 AM on September 6, 2016 [1 favorite]

Yet fundamentally, just like a human, the algorithm is only as good as the data that it is being fed. The bias comes not in the processing, but in the original data selection.

There's some ambiguity here whether you mean bias in the statistical sense of "producing results that systematically tend to deviate from the correct/desired result," or in the social sense of "systematically favoring some individuals over others based on factors unrelated to merit or need." But in either case, I'd argue that the processing, not only the data selection, in an algorithm can very easily introduce bias. In the statistical sense, almost trivially; we can pull a whole host of statistical algorithms off the shelf with known statistical bias, and sometimes these biased algorithms are actually more useful than their unbiased equivalents. In the more interesting social sense: the choice of statistical model, including not only what variables to include but also whether and how they interact; nesting structure of the population/covariates; prior distribution parameters in the case of Bayesian models; all of these are decisions that the designer of the algorithm must make, which may subtly or overtly favor some people over others. In many cases, the "right" choices for these are not necessarily obvious, and researchers may try several versions of their algorithm before settling on one that seems to give "sensible" results. Of course, what seems "sensible" to the researchers may reflect their own biases, just as much as the choice of what seems "sensible" to measure as inputs to the algorithm.
posted by biogeo at 10:46 AM on September 6, 2016 [1 favorite]

This is the polar opposite of #happyfunseptember, but is an excellent post. The book is definitely going on my list. Thanks!
posted by Fig at 10:57 AM on September 6, 2016

What algorithmic injustice looks like in real life

tl;dr is there nothing racism can’t infect?

or: if you have a structurally racist society, then "neutral" algorithms will likely reproduce - or not mitigate - current structural inequalities.
posted by lalochezia at 11:39 AM on September 6, 2016 [5 favorites]

For the totally non-tech when you hear "algorithm" think wheel or perhaps gear. A society changing tool that can be used for good or other.

An algorithm is just a well-defined process or procedure: a recipe. So yes, a tool, maybe more similar to how industrial management systems, distribution network systems, economic systems, political systems are tools. But yeah, "scientific management", capitalism, democracy, etc. were all pretty society-changing tools, for good or other.

sp160n, the issue with big data and algorithmic decision-making is tied in with one of the other issues that Cathy O'Neil writes about sometimes, that in the US and similar countries, we teach math rather poorly, and with (as Jo Boaler describes in her book Mathematical Mindsets) a tendency to inculcate the false idea that math is something that people are either good at or not (a fixed mindset), and that most people are not good at. So we combine widespread math anxiety as well as overall inadequate math education with a particular veneration for the role or authority of mathematics.

I think that the issue with the growing role of big data and related algorithms is an extension of what's known as the mathematization of a variety of fields that began much earlier in the century, with, in some cases, similarly detrimental results(*). But there does seem to have been a significant change in the extent of impact.

(* Eg., part of the rationale for the Cold War arms race was based on a very simplistic application of game theory, modeling each decision to increase the US nuclear arsenal and related armaments as a single Prisoner's Dilemma game, ignoring the potential for signaling and communication that would be better modeled by a tournament of repeated games. The book Soldiers of Reason: The RAND Corporation and the Rise of the American Empire gives some more detail on this particular example. Of interesting note: characters such as Dick Cheney and Donald Rumsfeld play a role, both having come from the Chicago School; which also championed the sort of glorification of mathematical models in decision making while failing to check that initial conditions are satisfied that underlie orthodox capitalist economic theory, with it's assumption of Homo Economicus, or "Rational Man". Anyway, mathematization refers to the application of quantifiable metrics and mathematical models in any new field where disciplinary methods had previously been more qualitative. It can be done well, but unfortunately is often done poorly from both a human and mathematical perspective, by failing to accurately model and check hypotheses/assumptions, leading to the garbage in-garbage out problem.)
posted by eviemath at 12:05 PM on September 6, 2016 [3 favorites]

Yes, unless you are willing to question the assumptions that underlie all of your evaluations of what constitutes a "best case scenario" or an "acceptable risk", then the algorithm will always simply reify a culture's pre-existing power structures.

Thanks for an amazing post, kliuless.
posted by a fiendish thingy at 12:10 PM on September 6, 2016

I look forward to the time when there is an algorithm built to understand true economic and social value... Social value programmes that measure the value of an individual to society based on their total contribution to the other people within their ecosystems and networks.

this quote has stuck with me from joseph schumpeter (1908) on the concept of social value:

For the system of economic science the main importance of this theory lies in the fact that, if distribution can be described by means of the social marginal utilities of the factors of production, it is not necessary, for that purpose, to enter into a theory of prices. The theory of distribution follows, in this case, directly from the law of social value.

of course there isn't :P

Money IS Politics: "Even if all the passengers on an otherwise sound plane don't think it will take off, it will. But if just enough of the holders of a given currency don't think an otherwise sound monetary reform makes sense, it won't fly. Ideas about money management, then, have a distinct and profound influence in the world of money, regardless of whether or not those ideas are right or wrong."

A society changing tool that can be used for good or other. But don't minimize the potential for total change in the world, think pre-industrial revolution as where we are now.

the theory of prices can be thought of as a kind of dimensionality reduction (for allocative purposes) befitting industrial capitalism to the extent that markets worked 'ok' (to varying degrees and certain people) for the most part -- in past eras -- in improving the material well being of mankind, measured in dollars at least, if not utility. but speaking of implicit bias, as monbiot says: 'What 'the market wants' tends to mean what corporations and their bosses want'.

big data and machine learning, or whatever buzzwords or techniques you want to use, lets you play around with the process of dimensionality reduction, but left to its own devices -- say the political economy of existing/incumbent market structure or 'neoliberalism' -- magnified 'money laundering for bias' seems like what the system will tend to produce.

it doesn't have to be, depending on the data you gather and what you optimize for, you could end up with 'deep dreaming' for the (attention) economy, for better or worse... or using the parlance of harari if we're moving from capitalism to 'dataism' where: "Like capitalism, Dataism too began as a neutral scientific theory, but is now mutating into a religion that claims to determine right and wrong..." then perhaps what we're seeing are 'markets' being subsumed by 'algorithms' writ large.

Do programmes intended to train people specifically for data science/analytics cover the kinds of issues raised in this post?

not sure, but fwiw...
-Becoming a Data Scientist: Podcast Interviews
-Once a Berkeley professor, Hal Varian of Google is now the godfather of the tech industry's in-house economists
-More on what economists do in tech firms, and differences vs. data scientists

oh and (via @FrankPasquale & @mathbabedotorg ;)
-Gamifying the Workplace (Public Books) digital-era version of "scientific management"
-Machine learning is not, by default, fair or just in any meaningful way
-On the importance of training data in machine learning: avoiding the snow huskie problem
-Weapons of Math Destruction: invisible, ubiquitous algorithms are ruining millions of lives: "Models are opinions embedded in mathematics."
-How data is driving inequality

Cathy's blog is really a fun read about math topics clear without dumbing down but she also does a personal service as "Ask Aunt Pythia" on just about any topic of human interaction also from a very smart perspective.

Book Release! (and more :) "Carrie Fisher, who is a SUPREME ROLE MODEL TO ME, has just started an advice column at the Guardian. How exciting is that?! So please, anyone who still mourns the loss of Aunt Pythia, go ahead and take a look, she's just the best."
posted by kliuless at 12:12 PM on September 6, 2016 [3 favorites]

Big data is currently catching on in the developing world, and this has all too many reasons why that's not going to be a good thing for all the poor PoC.
posted by infini at 1:00 PM on September 6, 2016

Nick Harkaway's excellent _The Gone-Away World_ has a terrifying scene built around this with a lovely, clarifying rant at the end.
posted by clew at 7:29 PM on September 6, 2016

Do programmes intended to train people specifically for data science/analytics cover the kinds of issues raised in this post?

Almost never.

Academic programs in statistics generally consider the fact that these procedures are applicable to things outside of academia as an amusing novelty, and thus don't really concern themselves with social justice.

Data science bootcamps are basically the valley's version of DeVry. They have enough time to put some glitter on you and shove you out on stage in front of 30 hiring managers, and not much else.

That said, over the past few years data science has gotten more academic cred and thus you have things like Berkeley's data science program which specifically has an ethics course that is required. This is heartening of course, but only the most genius data scientists are getting an MS from Berkeley, and they are for goddamn sure not working on eg criminal profiling applications, because they can get about 3x as much cash just sitting at Google figuring out how to turn up ad clickthrough by 1%.

It's worth noting that this problem exists in other places as well: you don't hire rando architects to build your skyscrapers because they'll do crazy things like roast cars across the street. But there's no sensible accreditation process for data scientists--most people don't even agree on a definition. And it's very, very hard for a layman to assess the efficacy of an algorithm, which makes the problem of discovering bad or negligent actors even worse.

Ultimately, because the process of doing data science is rarely adversarial and discovering bad stuff is hard, I have no idea what the solution will look like; the article forecasts this problem will be around until Strong AI, and I'm not so sure I disagree.
posted by TypographicalError at 9:40 PM on September 6, 2016 [6 favorites]

I think the business education is co-opting ethics to do away with regulation of businesses with real legal consequences. In the business school that I went to, ethical component of the issue in question was often treated as a means to justify the agenda.
posted by neworder7 at 11:15 PM on September 6, 2016 [1 favorite]

My fear is still that any rating of an individual's worth could just as easily be used against them wrongly, as credit ratings are.

Sounds like a potential Science Fiction dystopia.

Message from master computer: "All human entities with a Social Value Index less than 4 in Sector G-192, please report to an assignment center for reprocessing."
posted by theorique at 3:13 AM on September 7, 2016

But there's no sensible accreditation process for data scientists--most people don't even agree on a definition. And it's very, very hard for a layman to assess the efficacy of an algorithm, which makes the problem of discovering bad or negligent actors even worse.

As you point out, the challenge is thus: being in a position of employment where you can do potentially "bad" things generally pays very well; being an auditor or regulator of such people probably does not pay very well. (Do "algorithm auditor" positions even exist at this time? It seems like a Wild West scenario.)

We see the same thing happen in the financial industry, but much more refined since it's a much older industry: the so-called "revolving door". Lawyers go to the SEC or to self-regulatory organizations for some career experience, and then go to private industry to work as a compliance officer or regulatory officer, where they leverage their government/regulatory experience.

Most of this is benign, and most companies genuinely want to follow the (incredibly complex) rules in a good faith way. But every once in a while you get a Bernie Madoff (whose brother was the head of a major regulatory body).
posted by theorique at 3:24 AM on September 7, 2016 [1 favorite]

The university at which I work offers a certificate in large data analysis, which is generally pursued by students in math, CS, and stats/ actuarial science. As far as I can tell, there's no ethics component. I think they would point out that it's already a ton of work, there are fairly intense pre-reqs that students have to take before they can even get started, and most of those students are already doing majors with hefty and difficult requirements. They would say that adding an extra requirement would make it difficult for students to fit everything in. But yeah: it's mostly math, stats and CS classes, and if they discuss ethics in the capstone courses, there's nothing about it in the course descriptions.

The CS department does not have an ethics requirement. I know that some professors in the department think that they should. Last semester, there was a class offered on ethics and CS that students could take to fulfill a major requirement, although they could also take something else to fulfill that requirement. The syllabus is online, and it had two weeks on algorithmic bias. (It also had a week on the ethics involved in the relationship between grad students and faculty, which is sure as hell not something that anyone ever formally discussed in my social sciences grad program.)
posted by ArbitraryAndCapricious at 5:12 AM on September 7, 2016 [2 favorites]

This is a great, great post. Thank you.

Like others, I've bookmarked to work through it in detail, because wow.
posted by seyirci at 8:09 AM on September 7, 2016

This is a great, great post. Thank you.

Like others, I've bookmarked to work through it in detail. I've been looking towards data science as my next thing for a while, and I'm convinced that if I don't have a good grounding in these aspects as well, I had better stick to analysis problems which involve Hubble data or cancer markers.
posted by seyirci at 8:16 AM on September 7, 2016

Oh, here's a thing I found recently while looking for Tweet-able articles for work:

Public Safety Assessment

The most important decisions made during the pretrial phase pertain to whether a defendant will be detained or released before trial. Many defendants are low-risk individuals who, if released before trial, are highly unlikely to commit other crimes and are likely to return to court. Others present moderate risks and can often be managed in the community through supervision, monitoring, or other interventions. There is, of course, a small group of defendants who should most often be detained because they pose significant risks of committing acts of violence, committing additional crimes, or skipping court. The key, then, is to make sure that we accurately distinguish among the low-, moderate-, and high-risk defendants, and identify those who are at an elevated risk for violence.

Currently, only about 10 percent of courts use evidence-based risk-assessment instruments to help them decide whether to release, supervise, or detain defendants.

Here is a problem - wildly excessive pretrial detention - which could be solved or mitigated with Big Data. And yet it isn't. Because the people who could decide to use Big Data to address it just plain haven't bothered to. For the same reason why we still don't have an official nation-wide directory of use-of-force incidents - it might reveal information which would be inconvenient and require policy changes which are unpopular.
posted by showbiz_liz at 8:37 AM on September 7, 2016 [1 favorite]

In other BD news: Big Data Exposes How Politically Connected Traders Cashed In During the Financial Crisis

Wall Street Execs Allegedly Traded on Insider Info While the World Collapsed in 2008
posted by homunculus at 12:01 PM on September 29, 2016 [2 favorites]

excerpt from peter woit's review:

I saw all kinds of parallels between finance and Big Data. Both industries gobble up the same pool of talent, much of it from elite universities like MIT, Princeton and Stanford. These new hires are ravenous for success and have been focused on external metrics – like SAT scores and college admissions – their entire lives. Whether in finance or tech, the message they’ve received is that they will be rich, they they will run the world...

In both of these industries, the real world, with all its messiness, sits apart. The inclination is to replace people with data trails turning them into more effective shoppers, voters, or workers to optimize some objective... More and more I worried about the separation between technical models and real people, and about the moral repercussions of that separation. If fact, I saw the same pattern emerging that I’d witnessed in finance: a false sense of security was leading to widespread use of imperfect models, self-serving definitions of success, and growing feedback loops. Those who objected were regarded as nostalgic Luddites.

I wondered what the analogue to the credit crisis might be in Big Data. Instead of a bust, I saw a growing dystopia, with inequality rising. The algorithms would make sure that those deemed losers would remain that way. A lucky minority would gain ever more control over the data economy, taking in outrageous fortunes and convincing themselves that they deserved it.

with data comes responsibility - "Reading O'Neil's book this weekend reminded me that there's a one-day training session at the upcoming ASSA annual meetings on ethics, scientific integrity, and responsible leadership in economics. Maybe I will see some of you there."

also btw...
-Cathy O'Neil on EconTalk
-Breaking the Black Box: What Facebook Knows About You [1,2,3]
-How algorithmic decisions lead to discrimination & exclusion [pdf]
posted by kliuless at 12:21 AM on October 6, 2016

« Older "Truth is, we’ve been here all along" | "So be it." Newer »

This thread has been archived and is closed to new comments

MetaFilter

Auditing Algorithms and Algorithmic Auditing
September 6, 2016 12:24 AM Subscribe

Tags

Share

Auditing Algorithms and Algorithmic Auditing September 6, 2016 12:24 AM Subscribe

Tags

Share

Auditing Algorithms and Algorithmic Auditing
September 6, 2016 12:24 AM Subscribe