Pepsi Deep Blue
November 13, 2015 2:46 PM

TensorFlow. Google has open-sourced their numerical computation library for machine learning applications. (Especially "deep" learning.)

Similar to Torch and Theano, TensorFlow makes it easy to abstractly specify complex neural networks in a high-level language, while providing efficient C++ implementations of numerical operations and conveniences like automatic differentiation.
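For a taste of what that looks like in practice, here is a minimal sketch in Python of the graph-building style, roughly as the initial release presents it; the toy one-node linear model and its values are my own invention:

```python
import tensorflow as tf  # 2015-era graph API

# Building the graph: these lines only *describe* computations.
x = tf.placeholder(tf.float32, shape=[None, 1])  # input node
W = tf.Variable(tf.zeros([1, 1]))                # learnable weight
b = tf.Variable(tf.zeros([1]))                   # learnable bias
y = tf.matmul(x, W) + b                          # a one-node linear model

# Automatic differentiation: gradients of y with respect to W and b.
grads = tf.gradients(y, [W, b])

# Nothing actually runs until a Session executes the graph.
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(sess.run(y, feed_dict={x: [[3.0]]}))
```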

TensorFlow was released to the public on Monday and seems to have immediately established itself as the new hotness. Some reactions via Wired. The takeaway seems to be that even more people are going to be doing deep learning now.

The website includes tutorials that build models ranging from the very simple (I guess this was the bleeding edge 25 years ago) to just about the bleeding edge today (LSTMs for machine translation).
posted by grobstein (28 comments total) 58 users marked this as a favorite
 
Oh, I forgot to say anything about the "flow graph" metaphor. In TensorFlow, you specify the model as an abstract graph of dataflows and computations. Since this is already how people are accustomed to thinking about neural network models, it seems like a good choice.

I don't really know how this compares to how it works in other state-of-the-art tools.
posted by grobstein at 2:50 PM on November 13, 2015


Ooh. Imagine the twitterbots!
posted by ardgedee at 2:53 PM on November 13, 2015 [2 favorites]


I'm really excited for this release. I still have a few friends inside Google, including one on this project. From what I can tell it's a serious release from Google: they are committed to making this a real product people use, Google included. I like Matt Cutts' comments on the release, particularly noting how TensorFlow is a different strategy from previous tech transfers like MapReduce.

From reading the white paper, I think the big change with TensorFlow is that the implementation scales from small mobile devices to massive distributed clusters. I don't think any of the other ML toolkits out there are that ambitious. Also, the underlying model specification language in TensorFlow is quite complex and general-purpose. That's a bit of a mixed blessing: it makes it easier to screw up in application, but it also makes it quite powerful.

I'm particularly impressed with the visualization and debugging tools they've been building. Fernanda Viégas and Martin Wattenberg contributed a graph visualizer, for instance; they are some of the industry's leading data visualization folks.

More generally, statistical machine learning is at a really exciting moment in the industry right now. Huge companies like Google and Facebook have been doing great applications of it, and the tools and systems to use it are becoming more accessible. It remains to be seen, but if the product is well designed, TensorFlow could have a big impact on a lot of companies.
posted by Nelson at 3:34 PM on November 13, 2015 [4 favorites]


I mean, I'm flattered, obviously...
posted by The Tensor at 4:31 PM on November 13, 2015 [16 favorites]


From what I can tell it's a serious release from Google: they are committed to making this a real product people use, Google included.

I'm just going to remind everyone that none of this makes it certain the project will survive, and I'm not even going to mention the glaringly obvious example this time. This is Google, and they may at any moment decide to cut ties with this for gnomic purposes of strategy.
posted by JHarris at 5:36 PM on November 13, 2015 [1 favorite]


Huge companies like Google and Facebook have been doing great applications of it [statistical machine learning]

I feel that "great" in the case of these two entities is perhaps not quite the right adjective. "Impressive" and "alarming", yes. YMMV.
posted by five fresh fish at 5:52 PM on November 13, 2015 [5 favorites]


Man even an open source release from Google has the haters out.

It's an open source release. Google can't unring the bell. They could drop support for it, I suppose. Or more likely, they could just mismanage the stewardship of the community. But this isn't a service like Google Reader, where if they shut it down it disappears; the code is out there. And I think since GoLang, Google has shown it can actually grow a software community effectively.

And sure, Google does some alarming things with machine learning. They do some great things too. Google search retrieval is way, way better than it was a few years ago, largely because of applications of machine learning. The voice recognition they do is very good compared to where it was a few years ago. So is machine translation of languages. These are all applications of TensorFlow or its predecessor, the internal-only DistBelief.
posted by Nelson at 6:03 PM on November 13, 2015 [13 favorites]


> This is Google, and they may at any moment decide to cut ties with this for gnomic purposes of strategy.

This bears little relationship to hosted services like Reader, which require Google's infrastructure to sustain, and over which Google therefore holds the power of life and death.

Like Nelson says, when Google open-sources a code project, it can be a living project for as long as there's an active community maintaining it, and that community does not have to be Google. It does not have to be hosted on Google's servers; in fact, Google is in the process of shutting down Google Code (a hosted service) and moving their code projects to GitHub (which Google does not own).
posted by ardgedee at 6:11 PM on November 13, 2015 [2 favorites]


So is machine translation of languages.

I had thought Google Translate was still a phrase-based system with handcrafted features, not an end-to-end "deep learning" approach. So it would still benefit from fast linear algebra, but it's not the kind of model this framework is made for.

Has that changed?

Afaik deep learning for machine translation only hit state-of-the-art phrase-based benchmarks in 2014 or '15, which is why I called it "bleeding edge."

That aside I obviously agree that the technology is really exciting.
posted by grobstein at 6:21 PM on November 13, 2015


Is there a good primer on ML/DL for a somewhat nerdy person who is not really nerdy, specifically, I'm talking about in this case, me? Like: what class of problems is ML/DL good at? I know about translation and machine vision; what do I need to know about how this works / what this is good for to start imagining other use cases?
posted by wemayfreeze at 7:19 PM on November 13, 2015


A simple example would be: Computer, here are 100 things with 10 data points each. These 10 are fake; these 90 are legit. Analyze the attributes of each set, and figure out what attributes a fake object tends to have. Now here are 1 million objects; sort them into groups of fake and real.
posted by sideshow at 7:36 PM on November 13, 2015 [4 favorites]


Even more contrived: imagine there is one attribute, "realness". Legit objects happen to have a "realness" score of about 90, the fake objects a score of less than 10. The ML software figures out "hey, anything below 50 is fake" because you trained it by sending it a bunch of objects, designated as "fake", with low scores.
posted by sideshow at 7:44 PM on November 13, 2015 [1 favorite]
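To make that "realness" example runnable, here is a minimal sketch in Python with scikit-learn standing in as the learner; the data, the attribute, and all the numbers are made up to match the comments above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# One attribute, "realness": legit objects score around 90, fakes under 10.
legit = rng.normal(90, 5, size=90).reshape(-1, 1)
fake = rng.normal(5, 3, size=10).reshape(-1, 1)
X = np.vstack([legit, fake])
y = np.array([0] * 90 + [1] * 10)  # 1 = fake

clf = LogisticRegression(max_iter=1000).fit(X, y)

# The learned boundary lands somewhere in the wide gap between the two
# clusters -- effectively "low realness means fake", with no hand-written rule.
print(clf.predict([[95.0], [3.0]]))  # -> [0 1]
```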


[puts down my sign saying DOOM for a moment]

I was just making my comment in passing and didn't really want to turn this into yet another lament for Reader, which is why I didn't mention it by name, but I figured the statement should still have been made, as a warning at least. Sometimes warnings are unfounded. I have no special knowledge here.

Google open sourced Wave as their way of washing their hands of it. Their public statements seem to indicate that isn't in the cards for this project, and I hope it isn't. But I think it's healthy to maintain a skeptical stance, considering. May that stance be in error!

Whether I'm a hater or not, you don't have to go so far back to find me saying equally negative things about Apple and Microsoft. If I am a hater, at least I am not a partisan one.
posted by JHarris at 9:25 PM on November 13, 2015 [2 favorites]


You know the classic idea of a computer translating inputs to outputs? Most of the time, of course, a computer does that using a hand-written program. In fact, any of the typical pattern-matching tasks you see ML used for could be attempted using a hand-written program (and people have tried!)

What ML effectively does is learn the program automatically, given the inputs and the expected outputs. That is, it learns a function which approximates the actual process that transformed the example inputs into the associated outputs. Once learned, that approximation can be used to map new inputs to outputs.

Magic? Not really, because it's a (statistical) approximation. It doesn't learn any old function; it learns how to take an existing function you give it and parameterize it such that it approximates some unknown function that produces the output from the input. The canonical example is a line separating two clusters of dots in a 2D graph -- the assumption is that it's a line.

If you don't have enough examples, or you adhere too strictly to the examples, or the parameterized function is a bad fit to whatever actual function you're approximating, you get bad results (e.g. a set of dots you can't separate using a line). You can think of the structure of the model as a "cheat", in the sense that you are assuming something about the function producing the data.

One of the reasons neural nets are so effective (now that we have the computing power to run large ones), I believe, is that it's easier to tailor the assumptions in very flexible ways without changing the way the base algorithm works. That is, by rearranging a NN into a recursive structure, or a flat structure, or by siloing some of the connections, you are tailoring the learning machine to the domain, but you don't have to go and rewrite all of the learning algorithms to match -- you just hook it all up and (effectively) push a button.

So what is ML good for at this moment? Well, any time you have a lot of examples of a simple decision (yes or no; dog, cat or narwhal; email folders; etc.) made based on data, where you have no firm idea how the decision is actually made or how to program it, but you want the computer to make the decision on inputs you haven't seen yet. But it's not magic -- the more information you have about how the decision is made, and the more you can structure the model based on that, the better the computer will do.

I'm leaving out lots of details. If you want all the technicalities, google "machine learning tutorial"; there are tons now. If you're doing something simple, like a binary decision, it can actually be kind of turnkey: there are packages like the one in the OP that can take data and create a reasonable model with a minimum of fuss. You'll still need to tune it, however, and if the task is complex (e.g. vision, translation), a simple model isn't going to work.
posted by smidgen at 9:28 PM on November 13, 2015 [4 favorites]
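As a runnable version of the canonical two-clusters-of-dots picture smidgen mentions, here is a minimal perceptron sketch in numpy; the clusters, the update loop, and the seed are my own toy choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two clusters of dots in the plane -- the "canonical example" above.
A = rng.normal([2.0, 2.0], 0.5, size=(50, 2))
B = rng.normal([-2.0, -2.0], 0.5, size=(50, 2))
X = np.vstack([A, B])
y = np.array([1] * 50 + [-1] * 50)

# Parameterized function: a line w.x + b = 0. Learning = adjusting w and b.
w, b = np.zeros(2), 0.0
for _ in range(20):                 # perceptron updates
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:  # misclassified point -> nudge the line
            w += yi * xi
            b += yi

print(w, b)  # a separating line, found without hand-coding its position
```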


Like: what class of problems is ML/DL good at?

Among other things, machine learning is good for data compression, classification, language modeling and text prediction, and game playing. In all these applications, the history of previous experiences shapes the data model and so helps with deciding what's coming next.
posted by a lungful of dragon at 11:16 PM on November 13, 2015 [2 favorites]


> Google open sourced Wave as their way of washing their hands of it.

Wave was a hosted service as well; Google released some of the Wave source code at launch. If anything, the problem with Wave as a project was that even after handing Wave off to the Apache Foundation, Google didn't release enough of the code to allow others to launch Wave servers without significant additional coding effort.

The reasons for Wave's demise are myriad and the topic's a derail. Google's public code, including TensorFlow, is literally unkillable by Google; even if Google closes their GitHub account and deletes their project repositories, anybody with a recent copy of the project can resume it simply by uploading it as a new project.
posted by ardgedee at 2:46 AM on November 14, 2015 [3 favorites]


what class of problems is ML/DL good at?

I don't think anyone knows the answer to that. From what I've learned about ML, it's particularly good at learning patterns in data. For example, we all know that you can take the mean of a bunch of data to characterize something about it, the average value. That average is a very simple form of prediction. The machine learning I've studied is basically much more complex forms of statistical characterization of datasets. If you apply the right learning model correctly, you end up with something that has predictive or generative power. The drawback of statistical machine learning is you don't get explanatory power. You might have a computer that can predict something, but you won't really understand how it predicts it other than the big ball of numbers it's using to do the prediction.

In practice this means you can train a system to keep a car driving in a straight line, or translate Arabic to English, or predict what a user actually meant in a Google search instead of what they typed. (With mixed results; Google is still suggesting [sensorflow] for some searches for [tensorflow]. That was a reasonable suggestion until last week! I assume it will learn the new word soon.)

One reason machine learning is so hot right now is it's gotten way, way easier to work with large ML systems. It turns out that the algorithms work a lot better with more data. Also, more complex models can do more. In the old days you were doing well to train a 20-node neural net on 10,000 data samples. Now you can have hundreds of nodes and billions of data samples. TensorFlow seems in part aimed at extending that reach, making it easier for ordinary programmers to build a learning system and then distribute it across many computers.

If you're a programmer type comfortable with freshman-level mathematics and want to learn more, Andrew Ng's machine learning course on Coursera is worth a look. It's 10 weeks of free MOOC covering basic machine learning algorithms and applications, starting with simple linear regression and going up through building your own neural networks and support vector machines and stuff. I went through it a few months ago and learned a lot, although I did find it a bit frustrating to be writing the fundamental algorithms in Octave/Matlab rather than just applying libraries like scikit-learn or now TensorFlow. But now I understand what backpropagation does, so a system like TensorFlow is a little less mysterious to me.
posted by Nelson at 7:02 AM on November 14, 2015 [5 favorites]
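For a flavor of the "write the fundamental algorithms yourself" exercise Nelson describes, here is a minimal gradient-descent linear regression in numpy; the toy data and learning rate are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: y is roughly 3x + 1 plus noise.
x = rng.uniform(0, 10, 100)
y = 3 * x + 1 + rng.normal(0, 0.5, 100)

w, b = 0.0, 0.0
lr = 0.01
for _ in range(2000):
    pred = w * x + b
    err = pred - y
    w -= lr * (err * x).mean()  # gradient of mean squared error w.r.t. w
    b -= lr * err.mean()        # ... and w.r.t. b

print(w, b)  # converges near (3, 1)
```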


A simple example would be: Computer, here are 100 things with 10 data points each. These 10 are fake; these 90 are legit. Analyze the attributes of each set, and figure out what attributes a fake object tends to have. Now here are 1 million objects; sort them into groups of fake and real.

Even more contrived: imagine there is one attribute, "realness". Legit objects happen to have a "realness" score of about 90, the fake objects a score of less than 10. The ML software figures out "hey, anything below 50 is fake" because you trained it by sending it a bunch of objects, designated as "fake", with low scores.


Or to put it in more concrete terms, imagine those objects are posts to MeFi. ML/DL might be used as a first-pass analysis in moderation, having been trained on years of human moderation. You could present mods a sorted list of new posts, ordered by predicted sketchiness. Some statistical methods can even tell you why, and how confident they are about the sketchiness. For example, the system might say "I'm 70 percent confident that this post is sketchy, because the poster's account is less than 24 hours old and it contained a single link (and in the training set those factors are positively correlated with deleted posts)." Or it might even find something weirder, that human intuition hadn't yet caught on to.

Technically, a rogue marketing firm could use this same technique to avoid detection, but MeFi admins have more data to feed the ML than is available publicly, and tools might be able to find a few strong signals among that private data.
posted by pwnguin at 11:45 AM on November 14, 2015 [1 favorite]
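Here is a minimal sketch of that kind of "sketchiness score plus reasons" classifier in Python with scikit-learn; the feature names, the data, and the numbers are all invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented features per post: [account_age_hours, number_of_links]
X = np.array([[2, 1], [5, 1], [2000, 4], [900, 6], [3, 1], [1500, 3]])
y = np.array([1, 1, 0, 0, 1, 0])  # 1 = deleted by mods in the training set

clf = LogisticRegression(max_iter=1000).fit(X, y)

new_post = np.array([[12, 1]])  # a 12-hour-old account posting a single link
p = clf.predict_proba(new_post)[0, 1]
print(f"{p:.0%} confident this post is sketchy")

# The "why": the per-feature weights. With this toy data, young accounts
# and few links push the score toward sketchy (direction depends on the data).
print(dict(zip(["account_age_hours", "n_links"], clf.coef_[0])))
```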


These are all good answers.

I would put the general theme as follows: many applications of intelligence take the form of similarity judgments. The mods try to decide whether a post is spam based on whether it is relevantly similar to past spam posts. A bank tries to decide whether a prospective debtor is likely to default based on whether they are relevantly similar to past defaulting debtors. A translator tries to find an English word that is relevantly similar to a particular French word.

A lot of the time, we don't have a good theory for what exactly these similarities are. If you know French and English, you can translate a given French word into English pretty well, but you can't write down a set of rules to do the job. You just kinda know it when you see it.

For the first few decades of AI research, this "know it when you see it" knowledge proved impossible to model, and researchers stuck with hand-made rules.

So if you were trying to build a speech recognition system, you would come up with some linguistically motivated rules and program them into the computer. Oh, this kind of change in wave-form means the end of a phoneme, this kind of change means the end of a word, etc., etc. Through expensive trial and error, you come up with a set of rules that kinda works, and that's the state of the art.

Similarly, in traditional statistical modeling, you come up with a bunch of features in the data that you think can explain the behavior you're interested in. So if you're trying to explain income, you put in features for age, education, sex. Maybe you think education has a different impact for different-sex workers, so you add in a sex*education interaction term. You try to figure out all the possibly relevant features, and then you run your analysis and see what you get. If you don't get a good fit, then it's back to the drawing board to try and figure out what you're missing.

Deep learning, which starts in earnest in the '90s, gives you a way to model these similarity judgments without having to know what the rules are -- it lets you make a machine that knows it when it sees it.

The mechanism is what's called "feature learning" or "representation learning." You dump the unanalyzed data into the system, and it automatically finds higher-level features of the data that explain the behavior. So you dump unanalyzed face pictures into your model, and it "learns" ways to crunch this data that produce features with higher level meaning -- probably including things like chin height, mouth width, brow height, but also lots of stuff you'd never think of by yourself.

(How does this work? If you've been exposed to logistic regression, a rough explanation of a small artificial neural network goes as follows: first, run a regression of the dependent variable on a bunch of small random values with no interpretation. Then, regress each of those values on the input data. Over a large number of iterations, the initially uninterpreted values converge on functions of the input data that have a high degree of explanatory power for the dependent variable. These are the intermediate representations that are "learned" by the model. More detail.)
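For the concretely minded, here is a minimal numpy sketch in the spirit of that parenthetical: one hidden layer of initially random, uninterpreted values, trained by backpropagation on a toy problem. All sizes, data, and the step size are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy problem: XOR labels, the classic "no single line separates these".
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# "A bunch of small random values with no interpretation":
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)  # input -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)  # hidden -> output

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)    # intermediate representations
    out = sigmoid(h @ W2 + b2)  # the final "regression"
    # Backpropagation: push each layer's weights along the error gradient.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= h.T @ d_out
    b2 -= d_out.sum(0)
    W1 -= X.T @ d_h
    b1 -= d_h.sum(0)

print(out.round(2).ravel())  # typically ends up near [0, 1, 1, 0]
```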

What deep learning does, then, is allow us to accurately model relationships for which we have very little theoretical understanding. What is it good for? Right now, it looks like the sky's the limit. We don't have a good scientific theory of what makes a selfie good, but we can train a neural network to tell good from bad selfies. We don't have a good scientific theory of how to translate French into English, but we can train a neural network to do it.

To put a finer point on it, deep learning allows you to substitute data and computing power for theoretical understanding. So when you have a lot of data and computing power, and not much theoretical understanding, you might have a deep learning application.

("Machine learning" generally is a broader class of techniques, many of which are less magical than deep learning seems to be. But representation learning and "know it when I see it" are themes of machine learning generally, too, to a lesser extent.)
posted by grobstein at 12:24 PM on November 14, 2015 [7 favorites]


grobstein: I had thought Google Translate was still a phrase-based system with handcrafted features, not an end-to-end "deep learning" approach.

Without having any inside knowledge of the product, I'd say it almost certainly uses both, especially for "smaller" languages where less human-verified data is available. In any language-related problem, data sparsity is a big problem — any time you apply a tool to real-world data, you immediately encounter phrases you couldn't have anticipated. The strength of machine learning is that when it's done right, it's able to generalise from the data it was trained on in order to deal with novel data in an "intelligent" way. Better ML is able to generalise more elaborate patterns and take more context into account.
posted by shponglespore at 2:26 PM on November 14, 2015


It's a subtle point, but I know Google Translate uses statistical machine translation, which, sure, counts as machine learning. What I'm asking about in my comment is whether it uses hand-made features, similar to Phrasal and Moses, or it instead relies on end-to-end training with feature learning (which would count as "deep learning").

My understanding is that it uses hand-made features.

AFAIK, prior to this year, no one had demonstrated good machine translation with end-to-end training and all state-of-the-art systems used hand-made features. Thang Luong has published a bunch on this with various collaborators and advisors.

That said, though, machine translation systems may start to switch to end-to-end training approaches pretty fast now that it's known to be viable.
posted by grobstein at 2:48 PM on November 14, 2015


I suspect Google's approach to "small" languages with not much data is to pay for bilingual data. (Note also that many language pairs are translated using English as an intermediate.)
posted by grobstein at 2:50 PM on November 14, 2015 [1 favorite]


Tenser, said the Tensor.

I am a marine evolutionary ecologist with some rudimentary programming knowledge. I keep thinking that there is an application for this sort of analysis within my field, but the lack of explanatory power might be a drawback. I'm generally interested in patterns of genetic connectivity among reef fishes. There are all sorts of factors that go into this: ocean currents, behavior, ecology, spawning time & behavior, amount of time the fish spends as a larva floating around, degree to which larval behavior affects how far they go, etc. etc. etc. Overall, it's the "why" that keeps us doing it, but I think this maybe could be very useful for modeling things like marine protected areas. Managers, lawmakers, and fishers are not as concerned with the "why." They usually want to know where to draw the line on the map, or when to close which seasons. If we could do the genetics, train the model, run some predictions, and then do more genetics to validate the model, that could be extremely beneficial in many ways.

Thanks for posting the Coursera link, Nelson! As if I wasn't busy enough. I may just have to give that a go.
posted by deadbilly at 5:13 PM on November 14, 2015 [1 favorite]


Last I heard, Google is still using language-specific statistical parsers where they have them... They'll probably use an ensemble approach and then switch completely to pure NN as the state of the art improves.
posted by smidgen at 9:37 AM on November 15, 2015


Er... Parsers and phrase alignment (darn phone)
posted by smidgen at 9:45 AM on November 15, 2015


Pokemon or Big Data?
posted by jeffburdges at 6:36 PM on November 25, 2015 [1 favorite]


Feebas require Swift Swim to get marvel scale!
posted by grobstein at 7:36 PM on November 25, 2015 [1 favorite]

