" I wanted to reply that maybe her engineers should be scared."
April 1, 2018 10:21 PM

I was nodding my head irl reading this. YES. It's scary how much power we're giving these models when most of the people training and tuning them flat out do not understand them.
posted by potrzebie at 12:01 AM on April 2, 2018 [2 favorites]

I hang around my university's computer science department. I have heard through gossip that one of the burgeoning fields in machine learning is understanding what's actually going on inside machine learning models and finding ways to test their correctness; in other words, now that it works, understanding why it works.

I wish those who seek this information well.
posted by Quackles at 12:42 AM on April 2, 2018 [4 favorites]

I find the email scary because I'd even presume the manager is somewhat projecting onto the engineers: maybe the engineers aren't scared; they know not to casually mess around with the numbers, and the manager is coming in under the impression that "testing hunches" and "gaining intuition" is a substitute for doing fundamental research.
posted by polymodus at 12:47 AM on April 2, 2018 [4 favorites]

Quackles: I have a couple of friends who work at a company that's been doing AI for a long time (Cycorp), trying to model commonsense knowledge of reality as symbolic-logical assertions that machines can digest. One of their selling points is that they can tell you how a conclusion was reached. It's interesting that we're at that point, especially in light of this recent post.
posted by adamrice at 7:29 AM on April 2, 2018 [1 favorite]

A curious feature of most optimization problems is that the best general-purpose solutions rarely lie on the efficient frontier, i.e. at perfect maxima. In fact, this is what deep net (etc.) training, and most of applied statistics, is all about: trading efficiency for robustness and vice versa. It’s also why methods papers are so hard to interpret unless you know all the tricks involved, not just in building the models but also in benchmarking their performance. Intro stats courses (the good ones) routinely spend half or more of their time on model diagnostics, leverage, robustness to model assumptions, etc. A single logistic regression or hinge-loss fit can eat up an afternoon. Now convolve and connect 10,000 of them. Is it any wonder that these models can behave pathologically?
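The efficiency-versus-robustness trade-off shows up even in the simplest choice of estimator. A minimal Python sketch, using made-up measurements: the sample mean is the most efficient estimator of location for clean Gaussian data, but a single gross error can drag it arbitrarily far (its breakdown point is zero), while the median gives up some efficiency (about 2/π, roughly 64%, asymptotically under a normal model) in exchange for a 50% breakdown point. The data values and the `location_estimates` helper are hypothetical, chosen only for illustration.

```python
import statistics

def location_estimates(sample):
    """Return (mean, median) for a sample of numbers."""
    return statistics.mean(sample), statistics.median(sample)

# Hypothetical data: nine well-behaved measurements around 10.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0]
# The same data with the last value replaced by one gross error.
contaminated = clean[:-1] + [1000.0]

for label, data in [("clean", clean), ("contaminated", contaminated)]:
    mean, median = location_estimates(data)
    print(f"{label:>12}: mean={mean:.2f}  median={median:.2f}")
```

One bad point moves the mean from 10 to 120 while the median stays put; the price is that, on clean data, the median is a noisier estimate than the mean.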

Much current research ultimately deals with finding optimally suboptimal models, i.e. trading training/test performance for valid predictions from unseen corpora. This is HARD to get right, and intuition routinely fails.
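One standard way to pick an "optimally suboptimal" model is to score candidates on held-out data rather than on the training fit. A minimal NumPy sketch, assuming synthetic data (a noisy sine curve) and polynomial degree as the only knob: training error can only go down (or stay flat) as the degree grows, while k-fold cross-validation error typically bottoms out at a moderate degree. The helper names `k_fold_mse` and `train_mse` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

def k_fold_mse(x, y, degree, k=5, seed=0):
    """Average held-out mean squared error of a degree-`degree` polynomial fit."""
    idx = np.random.default_rng(seed).permutation(len(x))  # same split for every degree
    folds = np.array_split(idx, k)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(idx, held_out)
        model = Polynomial.fit(x[train], y[train], degree)
        errors.append(np.mean((model(x[held_out]) - y[held_out]) ** 2))
    return float(np.mean(errors))

def train_mse(x, y, degree):
    """Mean squared error on the same points the model was fit to."""
    model = Polynomial.fit(x, y, degree)
    return float(np.mean((model(x) - y) ** 2))

# Hypothetical data: a smooth signal plus noise.
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0.0, 3.0, 30))
y = np.sin(2.0 * x) + rng.normal(scale=0.3, size=x.size)

degrees = range(1, 11)
results = {d: (train_mse(x, y, d), k_fold_mse(x, y, d)) for d in degrees}
for d, (tr, cv) in results.items():
    print(f"degree {d:2d}: train MSE={tr:.3f}  5-fold CV MSE={cv:.3f}")

best = min(results, key=lambda d: results[d][1])
print("degree chosen by cross-validation:", best)
```

The exact numbers depend on the random seed; the point is the gap between the two columns, not any particular winning degree.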

I’m reminded of the old programming truism from when I worked at IBM: A debugger is no substitute for clear thinking, and clear thinking is no substitute for a debugger. In the case of single-layer statistical models, we have lots of experience with the types of pathologies most often encountered in practice, and good tools (leverage, breakdown points, posterior evaluation by sampling, cross validation, and so forth) to “debug” pathological models. Those of you who worked with Andrew Ng “back in the day” know that the cold start courses he taught emphasized understanding these types of tools before blindly moving on to more complex models.
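Leverage, one of the diagnostics mentioned here, is the diagonal of the hat matrix H = X (XᵀX)⁻¹ Xᵀ for a least-squares fit; it measures how far a point's predictors sit from the bulk of the data, and hence how hard that point can pull the fitted line toward itself. A minimal NumPy sketch on a hypothetical design with one far-out x value; the 2p/n cutoff is a common rule of thumb, not a hard rule.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X^T X)^{-1} X^T, via a thin QR decomposition."""
    Q, _ = np.linalg.qr(X)          # reduced QR: Q has orthonormal columns spanning col(X)
    return np.sum(Q ** 2, axis=1)   # h_ii = squared norm of row i of Q

# Hypothetical design: intercept plus one predictor, with one point far from the rest in x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 30.0])
X = np.column_stack([np.ones_like(x), x])

h = leverages(X)
n, p = X.shape
threshold = 2 * p / n   # common rule of thumb for "high leverage"
for xi, hi in zip(x, h):
    flag = "  <-- high leverage" if hi > threshold else ""
    print(f"x={xi:5.1f}  h={hi:.3f}{flag}")
```

Going through a QR factorization avoids explicitly inverting XᵀX, which is numerically safer; the leverages always sum to p, the number of columns of X.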

I don’t believe that the public will accept more complex models until a baseline understanding of “how to lie with neural nets / GANs / brute force computing” becomes as readily available as “how to lie with statistics” (cf. the fantastic spurious correlation gallery). I could be wrong, but I certainly hope not. Far too much of current AI hype is just cargo cultism.
posted by apathy at 7:59 AM on April 2, 2018 [4 favorites]

I know it's not the main focus (heh) of this, but I had *no idea* that camera lenses were so complex and multistage.
posted by tavella at 8:26 AM on April 2, 2018 [1 favorite]

I took an undergrad AI course in the mid-90s, back before neural nets were the big thing, and even then the professor emphasized the importance of having a system that could explain its decisions. Neural nets are useful but not trustworthy, and I hope that the disaster that teaches the public this will be relatively minor.
posted by suetanvil at 8:59 AM on April 2, 2018 [2 favorites]

Quoting tavella: "I know it's not the main focus (heh) of this, but I had *no idea* that camera lenses were so complex and multistage."

This article did a great job of smashing assumptions I didn't even realize I had about how camera lenses work and how neural networks are created and trained (I knew I didn't understand how they "worked"). So thanks for that.
posted by straight at 10:16 AM on April 2, 2018

The camera lens stack looks superficially similar to the neural net block diagram, but the analogy seems poor to me. It's fine to hope that applying machine learning will some day be as straightforward as applying optics, but that doesn't mean it will be.

Each element of the lens stack is doing a job that has a fairly simple, local, probably linearizable description of what it does to an incident wave of a given position and wavelength. A single differential equation or whatever.
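To make the "linearizable" point concrete: in the paraxial (small-angle) approximation, each lens element and each air gap acts on a ray's (height, angle) pair as a 2×2 matrix, and the whole stack is just the product of those matrices. A minimal NumPy sketch, assuming ideal thin lenses and made-up focal lengths and spacing; real multi-element camera lenses are designed with thick-lens and aberration models that go well beyond this.

```python
import numpy as np

def thin_lens(f):
    """Paraxial ray-transfer matrix of an ideal thin lens with focal length f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def free_space(d):
    """Ray-transfer matrix for propagation over distance d in a uniform medium."""
    return np.array([[1.0, d], [0.0, 1.0]])

def system_matrix(elements):
    """Compose elements in the order light meets them (later elements multiply on the left)."""
    M = np.eye(2)
    for element in elements:
        M = element @ M
    return M

def effective_focal_length(M):
    """For a system matrix [[A, B], [C, D]], the effective focal length is -1/C."""
    return -1.0 / M[1, 0]

# Hypothetical two-element stack: f1 = 100 mm, then 25 mm of air, then f2 = 50 mm.
M = system_matrix([thin_lens(100.0), free_space(25.0), thin_lens(50.0)])
print("system matrix:\n", M)
print("effective focal length (mm):", effective_focal_length(M))  # about 40 mm

# Because each element is a matrix, the whole stack is linear in the ray (height, angle).
r1, r2 = np.array([1.0, 0.0]), np.array([0.0, 0.01])
print(np.allclose(M @ (r1 + r2), M @ r1 + M @ r2))
```

The focal length agrees with the textbook two-lens formula 1/f = 1/f1 + 1/f2 - d/(f1·f2), and superposition holds exactly, which is the property a network of nonlinear units gives up.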

A hidden unit in a neural network is more like a small patch of a hologram, which encodes a seemingly random interference pattern, than a small patch of one lens. It's the result of a global procedure that relates the inputs and outputs of the whole network to the training feedback, rather than something that can be broken down to the effect of this one layer on this small patch of input, and optimized as such. Moreover, each element of a neural network has an explicitly nonlinear response, one that might depend on whether some tired person has applied some emotionally charged label to an image that's difficult to interpret.
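The "global procedure" point falls straight out of the chain rule that training relies on. In a minimal plain-Python sketch of a hypothetical one-hidden-unit network with squared-error loss, the gradient for the hidden weight `w1` is the product of the output error (which depends on the label), the downstream weight `w2`, the ReLU's local slope, and the input. The hidden unit's update cannot be computed from its own patch of input alone; in a real network the downstream factor becomes a sum over every path to the output.

```python
def relu(z):
    return max(z, 0.0)

def forward(w1, w2, x):
    """Tiny network: one ReLU hidden unit feeding one linear output unit."""
    hidden = relu(w1 * x)
    return w2 * hidden

def loss(w1, w2, x, y):
    """Squared error against the (possibly noisy) training label y."""
    return 0.5 * (forward(w1, w2, x) - y) ** 2

def grad_w1(w1, w2, x, y):
    """Backprop gradient of the loss with respect to the hidden weight w1."""
    y_hat = forward(w1, w2, x)
    active = 1.0 if w1 * x > 0 else 0.0    # ReLU derivative (taken as 0 at the kink)
    return (y_hat - y) * w2 * active * x   # output error * downstream weight * local slope * input

# The same hidden weight and the same input...
w1, x = 0.5, 2.0
# ...receive different updates depending on the downstream weight and on the label.
for w2, y in [(1.0, 1.0), (-1.0, 1.0), (1.0, 0.0)]:
    print(f"w2={w2:+.1f}, label={y}: dL/dw1 = {grad_w1(w1, w2, x, y):+.3f}")

# Sanity check against a central finite difference.
eps = 1e-6
numeric = (loss(w1 + eps, 1.0, x, 0.0) - loss(w1 - eps, 1.0, x, 0.0)) / (2 * eps)
print(abs(numeric - grad_w1(w1, 1.0, x, 0.0)) < 1e-6)
```

Flip the sign of `w2` or change the label and the hidden unit is pushed in a different direction, even though nothing about its own input changed.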

Also, the problem of explainability isn't specific to neural networks. Even before the revolution in neural networks a few years ago, people were routinely using what amounts to linear regression models with several thousand dimensions / features. And a decision made by such a model isn't necessarily attributable to any one or even just a few of those features. You can compute an estimate of how much each feature is contributing to the decision of a linear model, but that might be a list of a thousand things all contributing mildly to the decision. And once you get into counterfactual scenarios about what if some feature weren't used, there are combinatorially more possible "explanations."
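For a linear model the score is b + Σᵢ wᵢxᵢ, so one common per-feature attribution (relative to an all-zero baseline) is simply wᵢxᵢ, and these terms add up exactly to the score minus the bias. A minimal NumPy sketch with a hypothetical dense model of 1,000 small random weights shows how thinly such an "explanation" can be spread, and how quickly the space of counterfactual feature subsets (2ⁿ of them) grows. The `contributions` helper and all numbers are illustrative assumptions.

```python
import numpy as np

def contributions(weights, x):
    """Per-feature contribution w_i * x_i to a linear model's score (bias excluded)."""
    return weights * x

# Hypothetical model with 1,000 features, each weight small and the input dense.
rng = np.random.default_rng(0)
n_features = 1000
weights = rng.normal(scale=0.05, size=n_features)
bias = -0.2
x = rng.normal(size=n_features)

contrib = contributions(weights, x)
score = bias + contrib.sum()

# How concentrated is the "explanation"? Share of total |contribution| from the top 10 features.
order = np.argsort(-np.abs(contrib))
top10_share = np.abs(contrib[order[:10]]).sum() / np.abs(contrib).sum()
print(f"score = {score:.3f}")
print(f"top 10 of {n_features} features carry {top10_share:.1%} of the total |contribution|")

# Counterfactual "explanations" that drop some subset of features: 2**n_features candidates.
print(f"feature subsets to consider removing: 2**{n_features} (a {len(str(2 ** n_features))}-digit number)")
```

The attribution itself is exact and cheap; the difficulty is that an honest answer to "why?" may be a long, flat list rather than a short story.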

(Even with people, in situations where you ask someone to explain a decision, it's typically in a framework where only certain types of evidence and decision-making are acceptable. If you flash a picture of a dog on a screen for 1/10 of a second, I doubt many people could explain exactly what made them say "dog.")

However, none of that is to take away from the author's worthy goal of providing non-cargo-cult guidance for various decisions in designing neural networks.
posted by mubba at 7:02 PM on April 2, 2018 [3 favorites]
