California v. Johnson
September 18, 2017 12:43 PM

Kern County got a $200,000+ grant and started using closed-source software to perform a new kind of DNA testing for criminal forensics. Now, the principle at stake in California v. Johnson (California's 5th district court of appeals): does due process require that the defendant be able to examine the evidence used to convict them, which includes auditing forensics software to check for bugs? The American Civil Liberties Union and the Electronic Frontier Foundation, among others, have filed amicus curiae briefs.
posted by brainwane (28 comments total) 36 users marked this as a favorite
 
To help a jury understand, the probability model and principles of DNA testing would be explained to them by attorneys on the case or in testimony. Jurors would then be shown the likelihood of whether the DNA found matches or did not.

It shouldn't be hard at all to find twelve random people who have both a rigorous and intuitive understanding of probability to serve on a jury.
posted by peeedro at 1:01 PM on September 18, 2017 [35 favorites]


Maybe we can get Theranos in to testify, too!
posted by praemunire at 1:05 PM on September 18, 2017 [4 favorites]


The algorithm is infallible, do not question the algorithm.

(The jury doesn't need to be able to understand the code. But a third party auditor should be able to independently verify that the company's code does what the company says it does.)
posted by tobascodagama at 1:28 PM on September 18, 2017 [6 favorites]


Well, it would be two expert witnesses, one that says it does work and one that says it doesn't. Then the jurors look at the defendant and flip a racist coin.
posted by Stonestock Relentless at 1:31 PM on September 18, 2017 [38 favorites]


Similar ground has already been covered in Minnesota. [ArsTechnica] In that case, it was the source code for a breathalyzer device.
posted by Pogo_Fuzzybutt at 1:37 PM on September 18, 2017 [5 favorites]


Oh interesting. Hope the good guys win this one.
posted by latkes at 1:57 PM on September 18, 2017 [1 favorite]


New York City's forensic lab is also under controversy for its in-house Forensic Statistical Tool. Bummer that the FDA doesn't seem to be continuing its efforts to standardize high risk genetic testing; not that this was necessarily going to cover forensics--their last publication on forensic studies seems to be from 2013.
posted by beaning at 2:07 PM on September 18, 2017


It shouldn't be hard at all to find twelve random people who have both a rigorous and intuitive understanding of probability to serve on a jury.

Emphasis on intuitive. I recall reading a case where the defense got slapped down for attempting to explain Bayesian logic to a jury.
posted by Joe in Australia at 2:16 PM on September 18, 2017 [3 favorites]


It shouldn't be hard at all to find twelve random people who have both a rigorous and intuitive understanding of probability to serve on a jury.

I was an alternate juror on a case involving child molestation which featured lots of discussion of DNA evidence. The defense lawyer was keen to eliminate engineers and other sciencey types during voir dire. At first I thought this was a ploy to stack the jury with innumerate clods (such as me). After talking it over with another defense attorney online, I've come around to the idea that the science and engineer types would probably not have based their analysis of the evidence on the material presented to them by expert witnesses-- they would have trusted their own training and experience, which probably has not undergone rigorous cross examination. Worse, other members of the panel might defer to the expert jurors during their own analysis rather than the expert witnesses examined in the court.

It's not a clear cut problem.
posted by notyou at 2:29 PM on September 18, 2017 [16 favorites]


Given that the context here includes examples from recent memory of state and national crime labs behaving in scandalously biased fashions, it does seem like the defense ought to be able to examine and question the methods used to arrive at any supposed determination of fact.
posted by Nerd of the North at 2:30 PM on September 18, 2017 [10 favorites]


Actually let me take some of what I said on regulations back. The President’s Council of Advisors on Science and Technology (PCAST) did issue a recommendation in September 2016 to strengthen forensic science and promote its rigorous use in the courtroom. However, the PCAST charter is due to expire, and while Trump has said he will renew it, it is uncertain who would sit on the council or what its funding would be.
posted by beaning at 2:35 PM on September 18, 2017


Minor nitpick: The case is captioned People v. Johnson.
posted by mikeand1 at 3:15 PM on September 18, 2017


mikeand1, thanks -- I saw EFF and ACLU shortened The People of the State of California, Plaintiff and Respondent, v. Billy Ray Johnson, Jr., Defendant and Appellant to California v. Johnson in their titles and tags, and the official case caption is The People v. Johnson. Is it really common to have the activist articles, headlines and similar materials for the public labelled in the less correct manner, with the state as plaintiff rather than using "The People"?

Separately: I see I missed it when it happened, but the Supreme Court denied writ of certiorari for Loomis v. Wisconsin, which would have given it "the opportunity to rule on whether it violates due process to sentence someone based on a risk-assessment instrument whose workings are protected as a trade secret".
posted by brainwane at 3:30 PM on September 18, 2017


Well, it would be two expert witnesses, one that says it does work and one that says it doesn't. Then the jurors look at the defendant and flip a racist coin.

Well, no. In most states and in federal cases, the admissibility of expert testimony is determined by the judge. If the testimony is unable to meet the standard (sufficient facts, reliable principles, accurate application of principles to facts; there are various factors that may be considered in determining whether the standard is met), then the witness should never testify. It's hard to see how the witness could establish that he was applying "reliable principles" without identifying them. Some of the (permissive) factors to be considered include peer review, error rate, and standards of control...again, hard to imagine a secret technology performing well with those.

You do see "dueling experts" before juries, but that's when the judge has decided that the evidence clears the threshold just described. (And because it's relatively uncommon for criminal, as opposed to civil, defendants to mount this type of challenge.)
posted by praemunire at 3:35 PM on September 18, 2017 [2 favorites]


Publicly funded software should be publicly reviewable.
posted by blue_beetle at 3:42 PM on September 18, 2017 [17 favorites]


The conflict between closed-source proprietary software and legal proceedings is tough. I'd be more interested in seeing a documented suite of test cases and certified results of said tests. Seems like a reasonable compromise.
posted by Ickster at 4:08 PM on September 18, 2017 [1 favorite]


Given that the context here includes examples from recent memory of state and national crime labs behaving in scandalously biased fashions, it does seem like the defense ought to be able to examine and question the methods used to arrive at any supposed determination of fact.

Part of the problem is that TrueAllele, the software in question, does not make a determination of fact; it makes a probabilistic analysis. Like others have said, it's up to expert witnesses to explain the testing methodology and statistical analysis and offer an interpretation to the court. But how the analysis software works is a trade secret, so you have to take their word that the math inside their black box is spitting out the correct answer.

Quick background: with traditional DNA analysis, the long DNA strands are broken down into shorter segments, then amplified. They have to be amplified because they're working with minuscule quantities like a few skin cells left on a surface from touching it. Amplification is like ripping all the pages out of a magazine so you can xerox lots of copies quickly. Once you have an amplified sample, you can look for the presence of specific markers to compare against the DNA of your suspect. If the sample has markers A, B, D, and F but no C, E, G, or H and the suspect's DNA has the same pattern then you have a match.

TrueAllele is used when the forensic evidence includes a mixture of multiple people's DNA. Amplification of a mixed sample is like ripping all the pages out of multiple magazines, making copies, and mixing them together in a pile. You're going to find the mixed sample has many more markers in varying concentrations but you can't recreate any specific DNA pattern to compare against DNA of your suspect. Your sample will have all markers A thru H in differing amounts, but traditional analysis can't unmix the soup. Once the pages are ripped out of the magazines and copied and all mixed up, there hasn't been a way to put those magazines back together accurately to make a traditional comparison.
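A toy sketch of why presence or absence of markers alone cannot unmix the soup. It assumes, unrealistically, that markers are just letters, that each contributor is an arbitrary subset of them, and it ignores the varying concentrations entirely; the function names are made up for illustration and say nothing about how any real software works. It simply counts how many different three-person assignments would produce the same observed set of markers.

```python
from itertools import product

def count_explanations(num_markers: int, num_people: int = 3) -> int:
    """Brute-force count of ordered marker-subset assignments to people
    whose combined markers cover every observed marker (toy model)."""
    full = (1 << num_markers) - 1  # bitmask with every observed marker set
    count = 0
    for profiles in product(range(full + 1), repeat=num_people):
        union = 0
        for profile in profiles:
            union |= profile
        if union == full:
            count += 1
    return count

def closed_form(num_markers: int, num_people: int = 3) -> int:
    """Each marker must belong to at least one person, which gives
    2**num_people - 1 choices per marker."""
    return (2 ** num_people - 1) ** num_markers

# Brute force and the closed form agree on a small case (4 markers).
print(count_explanations(4), closed_form(4))

# Eight markers (A thru H) and three contributors, closed form only.
print(closed_form(8))
```

Under these toy assumptions there are millions of candidate explanations for one mixed sample, so any method that picks one of them has to lean on additional information and modeling choices.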

TrueAllele claims they have the special software method to recreate the original patterns from messy data of multiple samples. They claim that they can take that mixed sample with markers A thru H and determine that person 1 has A, E, and F; person 2 has C, D, and G; and person 3 has A, B, D, and F. Person 3 can be shown to be a statistical match to the aforementioned suspect, despite the DNA soup that had previously been too complex to analyze. They are doing this with mixed, partial, and degraded DNA samples.

Take this example:
Near the scene of the killings, investigators found a black bandanna that a witness said was worn by the shooter. They sent it to the county crime lab to see if it bore the DNA of their top suspect, Mr. Robinson. The lab found a mixture of DNA from at least three people on the bandanna and deemed it too complex to analyze, according to the January 2014 lab report.

But TrueAllele concluded Mr. Robinson’s DNA was on the garment. The report on the software’s findings described the match as “5.7 billion times more probable than a coincidental match to an unrelated black person.”
Or this case:
A private company called Sorenson Forensics, testing vaginal swabs from the victim, concluded that the frequency in the profile occurrence in the general population was one in approximately 10,000 for African Americans. The same sample, when examined by Cybergenetics at the company's Pittsburgh lab, concluded that the DNA match between the vaginal sperm sample and Chubbs is "1.62 quintillion times more probable than a coincidental match to an unrelated Black person," according to court records.
How did their software improve the confidence by 1.62 quintillion times? How accurate are their statistical models? Nobody knows, it's a trade secret. Defense experts are unable to check their math.
posted by peeedro at 5:07 PM on September 18, 2017 [15 favorites]


Also wanted to mention that given the racial disparity of the criminal justice system it's probably worth asking if their models have any racial bias baked in. They claim their software interprets DNA using hundreds of variables that are tuned through statistical sampling and machine learning. As forensic databases mirror the racial disparities in arrest practices, African American and Latino groups are overrepresented, so their models may well perpetuate existing patterns of bias.
posted by peeedro at 5:49 PM on September 18, 2017 [8 favorites]


I believe that the right to face your accuser applies here. Not being able to see the source code is just the worst slippery slope. As Cathy O'Neil said in her awesome TED talk last year https://www.ted.com/talks/cathy_o_neil_the_era_of_blind_faith_in_big_data_must_end/transcript?language=en
An algorithm is simply an opinion encased in math. Slavish trusting of the algorithm is bad for humanity.
posted by asavage at 6:04 PM on September 18, 2017 [8 favorites]


Is it really common to have the activist articles, headlines and similar materials for the public labelled in the less correct manner, with the state as plaintiff rather than using "The People"?


It might well be deliberate. A lot of folks who are sympathetic to defendants' rights bristle at prosecutors calling themselves "The People." Defense attorneys point out that they represent people too, and once in a while a defense attorney will make a motion to prohibit prosecutors from calling themselves "The People."

It's also common for prosecutors in closing arguments to make pompous statements about how they represent "The People" and why this means they have no axe to grind, or why the jury shouldn't hold it against the prosecution that their star witness was a detestable snitch, etc.
posted by mikeand1 at 7:02 PM on September 18, 2017 [2 favorites]


A private company called Sorenson Forensics, testing vaginal swabs from the victim, concluded that the frequency in the profile occurrence in the general population was one in approximately 10,000 for African Americans. The same sample, when examined by Cybergenetics at the company's Pittsburgh lab, concluded that the DNA match between the vaginal sperm sample and Chubbs is "1.62 quintillion times more probable than a coincidental match to an unrelated Black person," according to court records.
How did their software improve the confidence by 1.62 quintillion times? How accurate are their statistical models? Nobody knows, it's a trade secret. Defense experts are unable to check their math.
Either I don't understand what they are claiming or they don't (or very possibly both.) But isn't claiming such a level of accuracy ludicrous on its face? Assuming the sample is human DNA and there are something like 7 billion humans on earth, I cannot see how the chance of a coincidental match to any specific person could be worse than on the order of one in 7 billion. How, then, do you improve upon it 1.62 quintillion times (or 7 billion approximately 230 million times over..)?
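The "230 million times over" figure can be checked with a couple of lines. This is only the arithmetic from the paragraph above; the 7 billion population is the rough estimate used there, and the variable names are illustrative.

```python
# "1.62 quintillion" from the quoted court record, written as an integer.
reported_ratio = 1_620_000_000_000_000_000

# Rough world population used in the comment above.
world_population = 7_000_000_000

# How many times over the reported ratio exceeds the world population.
times_over = reported_ratio // world_population
print(f"{times_over:,}")
```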
posted by Nerd of the North at 9:12 PM on September 18, 2017 [4 favorites]


How did their software improve the confidence by 1.62 quintillion times? How accurate are their statistical models? Nobody knows, it's a trade secret. Defense experts are unable to check their math.

You can increase confidence like that by either: having a different dataset (or using the one you have differently), comparing it to a different overall set of "black person DNA", making a lot more (pointless) runs of your data (in some models) or royally fucking up your math. I assume they are not royally and obviously fucking up their math and that all the data entered is roughly the same (? maybe? they aren't extracting their own are they?) so I think they are either comparing it to a much larger overall population or they are interpolating incomplete data. I hope they are not extrapolating. I guess they could also be doing something totally genius but if someone brought this to me, I would want to see this work. This isn't something that happens in stats, where someone just comes up with a model that is 100 million times more able to discern significance than all the others. And if someone did they'd be in all the journals.

This should definitely not be closed source. It sounds like utter bullshit and I'd be surprised if it's not some combination of fancy interpolation of incomplete data points that shouldn't be used combined with running the model a squillion times and curating the "unrelated" dataset to make the results look more impressive.

Or maybe they just keep taking the log till it's a straight line :-|
posted by fshgrl at 10:05 PM on September 18, 2017 [3 favorites]


peeedro: As forensic databases mirror the racial disparities in arrest practices, African American and Latino groups are overrepresented, so their models may well perpetuate existing patterns of bias.

This point is massively important. We can estimate from racial disparities in arrest and conviction rates that the CODIS database (the largest federal genetic repository) is around 40% black and 60% white. This can seriously inflate the prior probability of a black sample matching spuriously. There are additional public and private 'shadow' DNA databases used in forensics where sample composition is even less well understood. Which of these are contributing to this product's likelihood estimations? What is their normalization approach?

Many features of the US legal (and electoral) system were designed - imperfectly - to prioritize consent, transparency, and symmetry. Even if a more advanced probabilistic system can be proven to be more accurate, the fact that there is an inbuilt information asymmetry preventing the accused from effectively interrogating his 'accuser' and contributing to his own defense undermines the principles upon which our legal edifice is built. This system may have its uses, but closed-source software has no place in the court, even if third-party experts are brought in to examine it. Third-party experts are brought in to intimidate just as often as they are used to enlighten juries with their expertise. Any such system should be a known quantity whose attributes have been thoroughly discussed in the public domain so that there is a reasonably broad base of expertise (statisticians, data scientists, geneticists) available to interested non-specialist members of the public.
posted by Svejk at 11:42 PM on September 18, 2017 [2 favorites]


The question goes beyond the need to check the software for 'bugs' in the code: presumably the estimation of the posterior probability includes the use of a population-based likelihood - the data used to compose this likelihood should be public and open to interrogation.
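A minimal sketch of how a likelihood and a prior combine, assuming the textbook odds form of Bayes' rule and assuming that a figure such as "5.7 billion times more probable" (quoted upthread) is a likelihood ratio. The prior values below are purely hypothetical, and nothing here is known to reflect how TrueAllele itself computes or reports anything.

```python
def posterior_probability(prior_probability: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

likelihood_ratio = 5.7e9  # figure quoted upthread, treated here as a likelihood ratio

# Hypothetical priors, chosen only to show how much the answer depends on them.
for prior in (1e-3, 1e-6, 1e-9, 1e-12):
    posterior = posterior_probability(prior, likelihood_ratio)
    print(f"prior={prior:.0e}  posterior={posterior:.6f}")
```

The same likelihood ratio gives very different posteriors depending on the prior, and the ratio itself depends on the population data behind it, which is why both inputs need to be open to inspection.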

Additionally, the likelihood could theoretically change with time, as the composition of the population sample changes (e.g. if more precise local population estimates are used in some cases, or if the contributing DNA databases are altered in any way). So ideally it should be continuously transparent to the public.

At present, I am not even certain that it is possible to precisely interrogate the demographics of many of these databases, as relevant metadata is either not recorded or not publicly available.
posted by Svejk at 12:00 AM on September 19, 2017 [3 favorites]


Either I don't understand what they are claiming or they don't (or very possibly both.) But isn't claiming such a level of accuracy ludicrous on its face? Assuming the sample is human DNA and there are something like 7 billion humans on earth, I cannot see how the chance of a coincidental match to any specific person could be worse than on the order of one in 7 billion. How, then, do you improve upon it 1.62 quintillion times (or 7 billion approximately 230 million times over..)?

DNA profiling works by looking at repetitive segments on non-coding regions of DNA (the incorrectly named "junk DNA"). In these repetitive segments, harmless mutations occur every thousand generations or so in which a portion of the repetitive segment can be duplicated or deleted through molecular oopsies. Through statistical sampling we know the distribution of the number of repeats that are shared by the population; for example in location A, 5% of the population have four repeats, 12% have five repeats, 2% have six repeats and so on.

The FBI database (CODIS) uses a system that originally examined 13 and now examines 20 different locations. These locations are assumed to be independently assorted; that is, the number of repeats at location A does not affect the number of repeats at any other location. Because they're independent, the likelihood of a match of a particular combination of these DNA features at the 20 places can be calculated with the product rule for probabilities.
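A minimal sketch of the product rule, assuming one population frequency per location for the profile in question. All the frequencies below are hypothetical, the function name is made up for illustration, and real profiles are more complicated than a single number per location.

```python
from math import prod

def random_match_probability(frequencies):
    """Product rule: multiply the per-location frequencies, which is only
    valid if the locations really are independent."""
    return prod(frequencies)

# Hypothetical frequencies for one profile at five locations.
five_locations = [0.05, 0.12, 0.02, 0.08, 0.10]
rmp = random_match_probability(five_locations)
print(f"five locations: {rmp:.3e}")

# Twenty locations at a hypothetical 10% each.
twenty_locations = [0.1] * 20
print(f"twenty locations: {random_match_probability(twenty_locations):.1e}")
```

Because every extra location multiplies in another small number, the result shrinks very quickly, and any error in the independence assumption or in the underlying frequencies is multiplied along with it.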

Using my simplified example of the suspect with the A, B, D, and F DNA fingerprint, the probability of finding a match is calculated by multiplying the probability of type A times the probability of type B times the probability of type D times the probability of type F. Multiplying twenty such probabilities (or more, since, as Svejk points out above, outside DNA databases that use more locations can be included) can quickly get you to these one in a zillion odds. And even with these tiny odds, random matches occur at surprising rates.
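One plausible reason for the surprising rate of random matches, sketched under the assumption that every profile in a database is compared against every other one: the number of pairs grows roughly with the square of the database size. The database size and per-pair match probability below are hypothetical, as is the function name.

```python
from math import comb

def expected_matching_pairs(database_size: int, match_probability: float) -> float:
    """Expected number of coincidentally matching pairs when every pair of
    profiles is compared and each pair matches with the same probability."""
    return comb(database_size, 2) * match_probability

# Hypothetical: 65,000 profiles, one-in-a-billion chance that any given pair matches.
pairs = comb(65_000, 2)
expected = expected_matching_pairs(65_000, 1e-9)
print(f"{pairs:,} pairs, about {expected:.1f} expected coincidental matches")
```

With these made-up numbers a one-in-a-billion event is expected to happen about twice, simply because there are more than two billion pairs to compare.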
posted by peeedro at 2:59 AM on September 19, 2017 [3 favorites]


"Life, Liberty, and Trade Secrets: Intellectual Property in the Criminal Justice System" is an analysis by lawyer Rebecca Wexler (author of "When a Computer Program Keeps You in Jail", also interviewed on The Takeaway on June 15, 2017), who's been keeping tabs on this issue for a while. Her take (my oversimplified explanation) is that we can use existing court mechanisms to let the defense see stuff that does not get put into the public record.
posted by brainwane at 6:01 AM on September 19, 2017 [3 favorites]


In response to the New York Times piece "When a Computer Program Keeps You in Jail", Mark Perlin, the creator of TrueAllele and the chief scientist of the company behind it (based in Pittsburgh), wrote: "Defense lawyers try to keep unfavorable DNA evidence out of court (source code is one ruse)."

Oh, also, Perlin is the president of a new Pittsburgh-based nonprofit that works on forensic science in the justice system: Justice Through Science (EIN 81-5429003). The Cybergenetics company newsletter mentioned the nonprofit but didn't say "and our chief scientist is its president, and the person who registered the domain name justicethroughscience.org, and who registered the trademark 'Justice Through Science'". When Perlin responded to the federal public comment period on advancing forensic science, he did so as the President of Justice Through Science, not as the creator of TrueAllele and chief scientist of Cybergenetics.

Most or all of the educational offerings are talks by Perlin. (Justice Through Science's page of print resources, interestingly, links to two pieces that specifically mention arguments for Cybergenetics to let others audit its source code.)

On November 3rd, Justice Through Science is holding a one-day conference in Pittsburgh that counts for Continuing Legal Education credits. As far as I can tell, Perlin's the only speaker who has a programming background. (The first speaker is Cynthia Zimmer, who prosecuted Billy Ray Johnson, Jr. in Kern County and who is running for District Attorney.)

Justice Through Science, located on Shady Ave. in Pittsburgh, is so new that no form 990-N is available to inspect.
posted by brainwane at 6:56 AM on September 19, 2017 [1 favorite]


Is it really common to have the activist articles, headlines and similar materials for the public labelled in the less correct manner, with the state as plaintiff rather than using "The People"?

"The People" could be a city, state, or federal label; calling it CA vs Johnson clarifies what jurisdiction is involved.
posted by ErisLordFreedom at 12:23 PM on September 19, 2017 [1 favorite]



