Opening up the black box
December 20, 2017 10:04 PM Subscribe

New York City's bold, flawed attempt to make algorithms accountable.
posted by Chrysostom (38 comments total) 21 users marked this as a favorite

I don’t really get why they were asking for disclosure of source code, rather than just the algorithm. Surely what they want, in plain terms, is just a clear statement of the rules the system applies? That shouldn’t be too difficult, unless we’re dealing with the new issues that arise from machine learning, which so far as I can tell isn’t the case here? The counter-argument seems not to be ‘that’s impossible in principle’ but ‘we don’t want to tell you the rules because people will game them’, which is surely politically unacceptable.
posted by Segundus at 10:48 PM on December 20, 2017

I think the demand for source code more or less boils down to "Does this program do what it's supposed to do?"
posted by runcifex at 10:55 PM on December 20, 2017 [7 favorites]

It is a really interesting - and increasingly important - collision between technology and democracy. My guess is that at least some jurisdictions may try to mandate openness of algorithm rules in the same way that they mandate the right to access publicly held data.

In many cases the algorithm - together with a platform to test it by feeding it data - would be a great step forward towards transparency. However, there are a number of important algorithms in machine learning where the nature of what the computer is doing is unknown even to the designers. And sometimes that can be used as an excuse by designers to distance themselves from being accountable to what such a program does. Trying to make that sort of thing explicit is a much harder technical problem (I'm not sure whether it is impossible however).

As Segundus says - I don't think that making an algorithm open and making source code open are the same thing.
posted by rongorongo at 10:55 PM on December 20, 2017 [1 favorite]

I think the demand for source code more or less boils down to "Does this program do what it's supposed to do?"

...and nothing else?
posted by rokusan at 11:02 PM on December 20, 2017

If seeing the algorithm allows people to game it it's a shit algorithm.
posted by save alive nothing that breatheth at 11:21 PM on December 20, 2017 [11 favorites]

The high-level description cannot be trusted without auditable and executable source code.

But we’re basically talking about politicians here. What they need is the rules, in layman’s terms. They test that against data about the real-world decisions that actually get made.

My mind baulks at imagining what it is like when politicians have a legal argument over the correct interpretation of source code (though I dare say that realm of Hell is not as fresh as I would like to suppose).
posted by Segundus at 11:22 PM on December 20, 2017 [1 favorite]

There's good reason to think that even the code and algorithm aren't enough unless you have a sufficiently-sophisticated understanding of the algorithm: We need to talk about mathematical backdoors in encryption algorithms

And I even that's not necessarily enough, if you're potentially subject to attack: you need to understand the hardware your code is running on. It's turtles all the way down.
posted by Joe in Australia at 12:45 AM on December 21, 2017 [1 favorite]

There are also those, not infrequent, lines of code that say " just select me something randomly" - but which don't end up actually doing so. Another tier of turtles there.
posted by rongorongo at 12:52 AM on December 21, 2017

I'm a little surprised there haven't already been lawsuits about this. I mean, couldn't someone sue based on an algorithm-derived decision and then at least make an argument that discovery would mean they get to see what happened in the black box?

On Equal Protection of the Law terms, "PROVE that your software isn't biased against me." seems perfectly legitimate. Remember, there is no security in obscurity.

Public Interest concerns should override Corporate concerns. Otherwise why do The People create these Artificial Legal Entities if they don't benefit them in material ways just in their existence?
posted by mikelieman at 1:16 AM on December 21, 2017 [8 favorites]

I'm sure it's not this easy in reality, but I do like the idea of making working code public and encouraging outsiders to run test data on it to see if they agree with the results.

Someone particularly interested in women's issues, for example, could submit a lot of test data to the program and see what comes out. If the output made it look like women were being unfairly treated, the investigator could dig in and try to find the source of that imbalance, or at least take the input and output data alone as evidence that the program is unfair to women.

Have some public advocate take care of running the code for everybody. Let other groups add test data and see the output.
posted by pracowity at 2:19 AM on December 21, 2017 [2 favorites]

I'm sure it's not this easy in reality, but I do like the idea of making working code public and encouraging outsiders to run test data on it to see if they agree with the results.

I'm finishing up a gig with a company that for some reason insists the testers never see the source code, so everything is "banging on the black box to deduce the business rules and expected behaviour", which is a waste of time and resources when the developers should have documented this stuff during analysis before development.

"Ad-hoc" processes are the devil.
posted by mikelieman at 3:57 AM on December 21, 2017 [1 favorite]

Not to abuse the edit. It's better than nothing, but it's insulting that we'd have to do that, given the rules exist in the specs.
posted by mikelieman at 3:58 AM on December 21, 2017

I'm one of the people who testified at that hearing in October on Int. 1696. Some links people might find useful if they want to read up more on the bill:

The text of the bill.
Councilmember Vacca citing/shouting out to danah boyd, Kate Crawford, and Cathy O'Neil as people whose work has shaped this legislation.
Video from the October 16th hearing is a little over two hours long. The "attachments" include PDFs for Hearing Testimony (the pre-written testimony submitted by some witnesses) and the hearing transcript. One attendee live-tweeted practically the whole hearing (sometimes the threading broke a bit) and another shared rough notes as a GitHub gist. Individual pieces of testimony came from the New York Civil Liberties Union, The Brennan Center for Justice, Legal Aid Society, BetaNYC, Princeton's Center for Information Technology Policy, Brooklyn Defender Services, and various other institutions, and some people spoke just as individuals.
A summary of the hearing is in Civicist.
This bill was supposed to be one of several getting a final pre-signing hearing on Monday the 18th, but people who were there tell me that it was left out because time ran out.
ACLU blogged about the passage of the bill.

posted by brainwane at 4:38 AM on December 21, 2017 [23 favorites]

I'm actually a little bit confused. From my scan of the article, the bill doesn't seem to address training or validation data, which, assuming a correct implementation is of far more impact than the actual source code of the decision making system.

I mean I guess it sort of depends on what class of effects you're concerned with. If you care about whether or not the decision system is making decisions correctly to the desired specifications, then perhaps data is not necessary. On the other hand, if you care about whether the decision system has unforeseen discriminatory effects, then data is absolutely necessary.
posted by phack at 4:59 AM on December 21, 2017 [5 favorites]

Oh I should have also linked to the previous version of the bill. On Legistar, if you click the dropdown menu for "version" and switch from A to * and then click the Text tab below the bill metadata (switching away from the History view), you get the bill text that Councilmember Vacca introduced in August and that we discussed at the hearing in October. The amended version was introduced in early December.
The original bill said:

Section 23-502 of the administrative code of the city of New York is amended to add a new subdivision g to read as follows:

g. Each agency that uses, for the purposes of targeting services to persons, imposing penalties upon persons or policing, an algorithm or any other method of automated processing system of data shall:

1. Publish on such agency’s website, the source code of such system; and

2. Permit a user to (i) submit data into such system for self-testing and (ii) receive the results of having such data processed by such system.

As Dr. Powles (the author of the New Yorker article) pointed out in her testimony, the placing of that change would create a loophole. If you want to see the part of NYC's administrative code it would alter, see this page for instructions on clicking through this interface to get to Title 23 (Communications), Chapter 5 (Accessibility to Public Data Sets). The original bill 1696 would have amended the existing open data laws, and the definitions in 23-501 already have some "well this doesn't apply to....." provisions.

b. "Data" means final versions of statistical or factual information
(1) in alphanumeric form reflected in a list, table, graph, chart or
other non-narrative form, that can be digitally transmitted or
processed; and (2) regularly created or maintained by or on behalf of
and owned by an agency that records a measurement, transaction, or
determination related to the mission of an agency. Such term shall not
include information provided to an agency by other governmental
entities, nor shall it include image files, such as designs, drawings,
maps, photos, or scanned copies of original documents, provided that it
shall include statistical or factual information about such image files
and shall include geographic information system data.

g. "Public data set" means a comprehensive collection of interrelated
data that is available for inspection by the public in accordance with
any provision of law and is maintained on a computer system by, or on
behalf of, an agency. Such term shall not include:
(1) any portion of such data set to which an agency may deny access
pursuant to the public officers law or any other provision of a federal
or state law, rule or regulation or local law;
(2) any data set that contains a significant amount of data to which
an agency may deny access pursuant to the public officers law or any
other provision of a federal or state law, rule or regulation or local
law and where removing such data would impose undue financial or
administrative burden;
(3) data that reflects the internal deliberative process of an agency
or agencies, including but not limited to negotiating positions, future
procurements, or pending or reasonably anticipated legal or
administrative proceedings;
(4) data stored on an agency-owned personal computing device, or data
stored on a portion of a network that has been exclusively assigned to a
single agency employee or a single agency owned or controlled computing
device;
(5) materials subject to copyright, patent, trademark, confidentiality
agreements or trade secret protection;
(6) proprietary applications, computer code, software, operating
systems or similar materials; or
(7) employment records, internal employee-related directories or
lists, and facilities data, information technology, internal
service-desk and other data related to internal agency administration.

posted by brainwane at 5:15 AM on December 21, 2017

This part:

It would also be required to simulate the algorithm’s real-world performance using data submitted by New Yorkers.

It would be interesting to set up a sort of Test-driven Development model where there's a set of tests that an algorithm has to pass before being allowed to be put to use, things like a sentencing program being given a list of people of different races and income levels with similar crimes and then verifying that it's not prejudiced against the poorer + darker ones. This way, no matter what's in the black box, you could make sure the results aren't biased with an actual score, showing 0 correlation between race + income and length of sentences.

The actual data used would have to be random, so as to not make it too easy to game the system. At least no easier than it is now.
posted by signal at 6:00 AM on December 21, 2017 [1 favorite]

It's a great idea. Software is so gnarly though that I'm guessing that access to the source code is just the start. As others have pointed out, does it do what it says it does? If it doesn't, why not? Do we understand how it produces the results it does? Can we actually dig into where the data is being transformed? Seems like a very good idea in principle but tricky to implement - not that that should put off anyone from trying.
I was wondering how this would affect the predictive policing systems.
posted by carter at 6:06 AM on December 21, 2017

btw is court sentencing data also city data? I guess so?
posted by carter at 6:08 AM on December 21, 2017

But we’re basically talking about politicians here. What they need is the rules, in layman’s terms. They test that against data about the real-world decisions that actually get made.

No, we're not. This is about making these programs available to the public—which includes reporters, programmers, and researchers. They need and are equipped to analyze the actual source code, because I promise you that for any non-trivial piece of software, everyone involved, from users to product managers to programmers, has a differing mental model of the "rules in layman's terms," and every one of those models will have significant gaps or errors relative to the ground truth of what the software actually does. That's just the nature of software.
posted by enn at 6:20 AM on December 21, 2017 [8 favorites]

Both source code and a high-level description should be required. The high-level description cannot be trusted without auditable and executable source code. Bugs are endemic in software, and usually boil down to a disparity between the high-level intent and the actual behavior.

They will actually need the entire system not merely source code. Uber's software suite included programs specifically designed to fool regulators with databases of the IPs of potential auditors.
posted by srboisvert at 7:51 AM on December 21, 2017 [2 favorites]

I don’t really get why they were asking for disclosure of source code, rather than just the algorithm.

It may interest you to learn that corporations have been known to lie to regulators.
posted by praemunire at 8:27 AM on December 21, 2017 [5 favorites]

Not just Uber - we already forget Volvo?
posted by PMdixon at 8:34 AM on December 21, 2017

There are some pretty interesting examples of what we in my company call "Validated" or GxP software; it's useful in the Pharmaceutical context, where you have systems that will directly impact the health of patients, and the stakes are high- screwups can cause injury or worse.

The FDA doesn't maintain escrow or general access to the source code, but they do have the ability to show up unannounced at any time and audit the project- the deliverables, the test plan, the code base, adherence to the SDLC, the signoffs, etc. Mostly this boils down to ensuring that good controls are in place, that systems are adequately monitored, supervised, qualified, and verified prior to being put into production. As more software as a medical device (SAMD) is produced, these activities become ever more important.

The regulatory authority seems to be mostly derived from their authority to perform unannounced audits on manufacturing lines; it's been extended into the command and control software systems that manage manufacturing, but also those that monitor patient welfare, in the case of a Risk Education and Mitigation System (REMS).

I could easily see a world where this type of regulatory authority extends into other areas- while it would be great to see municipal software generally open sourced, providing a set of trained auditors may also be a way to mitigate the risks posed by closed-source code. The problem is that GxP software is roughly 3x as hard to develop, and quite a bit more costly, as you need a much more rigorous set of process cops to manage and maintain the artifacts developed. It's essential for systems that directly deal with human health, but may be harder to affordably extend into other areas.
posted by jenkinsEar at 9:29 AM on December 21, 2017 [2 favorites]

Regulartory bodies are all well and good, but giving researchers and the actual public access to the code and the ability to run it on any dataset to me is a freedom and security beyond an agency that may be blind to the biases endemic in an industry (e.g., staffed by former industry folks).
posted by dame at 9:54 AM on December 21, 2017 [4 favorites]

Regulartory bodies are all well and good, but giving researchers and the actual public access to the code and the ability to run it on any dataset to me is a freedom and security beyond an agency that may be blind to the biases endemic in an industry (e.g., staffed by former industry folks).

I certainly agree with that.
posted by praemunire at 11:24 AM on December 21, 2017 [1 favorite]

Not just Uber - we already forget Volvo?

Volkswagen?
posted by pracowity at 11:25 AM on December 21, 2017

At the risk of repeating myself, there are two questions:

What do they say it does?
What does it actually do?

Neither question requires any audit of the code or any technical analysis. The political problem, if I’ve understood correctly (far from guaranteed) is that the providers are essentially refusing to say what it does in adequate detail.

Now let the uncontrollable geek salivation resume.
posted by Segundus at 11:45 AM on December 21, 2017

"What do they say it does?" doesn't even matter compared to "What does it actually do?" Feed it good data and look at the results. Base your approval or disapproval on what it does, not on what some bastard claims/thinks/pretends/hopes it does. That should work for any sort of program, including neural nets and anything else that might be difficult to otherwise analyze.
posted by pracowity at 11:59 AM on December 21, 2017 [1 favorite]

Neither question requires any audit of the code or any technical analysis.

Unless, as with the multiple real-world examples people have already cited in this thread, the implementation details are designed to detect when the algorithm is being run in a regulatory test and change its behaviour accordingly. You're being incredibly naive here.
posted by tobascodagama at 12:05 PM on December 21, 2017 [4 favorites]

tl;dr it is sufficiently easy for most code to detect regulatory tests in practice that adversarial black box testing is arguably worse than no testing
posted by PMdixon at 1:42 PM on December 21, 2017 [4 favorites]

What makes you think I was talking about a special regulatory test? I mean ‘what does it do’ in real world operation. Naive, tu massively quoque.

Do some of you have a cognitive issue with moving outside the box where code security and efficiency are considered?

At least I know now why they’re asking for source code. They went to a software guy and asked him what they needed in order to understand what rules a system was applying. His eyes widened, he began breathing heavily, and he said:

“Make them give me the source code. All the source code. To everything! Bwaha ha!”
posted by Segundus at 1:47 PM on December 21, 2017

That baggage is not going to fit into the overhead compartment.

Security, in the context of these types of applications, is a pretty damn narrow window of securing the input or DB query. And the code for that is pretty well known. If there is a value hard-coded into the source which if known would make the code insecure, I'm thinking the code is insecure by definition.
posted by maxwelton at 4:03 PM on December 21, 2017 [2 favorites]

I mean ‘what does it do’ in real world operation.

Please tell me what 'real world operation' is.
posted by PMdixon at 4:05 PM on December 21, 2017 [1 favorite]

Even more importantly, please tell me who is going to determine what 'real world operation' is and based on what.
posted by PMdixon at 4:11 PM on December 21, 2017 [1 favorite]

What makes you think I was talking about a special regulatory test? I mean ‘what does it do’ in real world operation.

The de facto truth is that some of the runs of the program will be part of the set belonging to the regulator test, and some will be as you call them 'real world' (or what I take you to mean by that). The problem is that the former set is being used as a proxy for the latter vis-a-vis deciding if the software is 'correct' (for some definition of correct that boils down to 'passes the regulatory sniff test'), and it's possible to make the system behave differently for the regulator case than it does in the real world. This is what VW got into major hot water for doing just this past year or so, with respect to emissions testing.

This is why source code is the ultimate arbiter of 'what does this software do' and not 'what do we hope this software does'. Any description of the source code that is simpler to understand will necessarily leave out some of the true behaviors. A malicious actor will attempt to comport the elided behaviors in such as a way that it works to their advantage.
posted by axiom at 9:31 PM on December 21, 2017 [2 favorites]

Do some of you have a cognitive issue with moving outside the box where code security and efficiency are considered?

Can you explain what you're asking here?
posted by the agents of KAOS at 1:05 AM on December 22, 2017

« Older Billiards in Movies / TV | “He told me they are just ‘things’” Newer »

This thread has been archived and is closed to new comments

MetaFilter

Opening up the black box
December 20, 2017 10:04 PM Subscribe

Tags

Share

Opening up the black box December 20, 2017 10:04 PM Subscribe

Tags

Share

Opening up the black box
December 20, 2017 10:04 PM Subscribe