Datasets over algorithms
April 22, 2016 5:46 AM

 
Periodic reminder that Edge.org's founder, John Brockman, is a literary agent. A lot of these folks are just participating here to shill their pop sci books on their particular hobby horses.
posted by leotrotsky at 6:00 AM on April 22, 2016 [9 favorites]


Datasets were key to stuff like genuinely useful translation - the point is that they allowed translation by a different route than human translation, ducking the need to deal with actual meanings.

It seems to follow that better datasets might (quite a big might) similarly enable a good imitation of human-level artificial intelligence without delivering any actual consciousness.

Now that might be pretty good: a robot that performs very like a thoughtful human is a useful thing even if it's not really thinking, just as Google translations are useful in spite of being done without any shred of understanding. But we would know for sure that those convincing robots had 'no-one at home'. I don't know how well we would deal with that.

Better algorithms, or at any rate, better machine learning, is a different and less troubling prospect.
posted by Segundus at 6:13 AM on April 22, 2016 [2 favorites]


As to the content, I wonder what the proliferation of domain- and task-specific "AI" means, big picture, for our understanding of what constitutes "intelligence".

Maybe it really is nothing more than a bag of tricks, and we just happen to have a larger bag.
posted by leotrotsky at 6:18 AM on April 22, 2016 [6 favorites]


Figuring out our thinking meat is tough. When you look at the relatively simple evolved neural nets we've made, they are just really weird, and sometimes it's not even clear how they're solving the problem they've solved. Now ramp that complexity up over millions of years and put it inside of a chemical soup that affects how the net functions. Oh, and don't forget to instantiate it into a physical body. Meat brains are just really really really ridiculously complicated, and as a result neuroscience is really damn hard.
posted by leotrotsky at 6:21 AM on April 22, 2016 [3 favorites]


If there's some way to bootstrap around that, swell. But the results are definitely going to be alien to us in some pretty fundamental ways, and if we're putting them out into the real world I'd worry about behavior at the fringes of the data set.

I know talk about consciousness is an absolute rabbit hole, but when you've got a functioning thing out there in the world that's nothing more than a series of domain specific responses, it does seem kind of like a philosophical zombie.

Of course, there's no reason to think that we're not also the same kind of thing, just with some added delusions of self-determination. (I think there are some interesting psychology studies to that effect.)
posted by leotrotsky at 6:21 AM on April 22, 2016 [3 favorites]


"Siri, do you have a soul?"
posted by sammyo at 6:28 AM on April 22, 2016


In general I'm skeptical of Moore's law-like projections about technological growth: just because technology has developed in a certain way in the past is no guarantee that the pattern of development will continue like that.

As a side note, it's kind of disappointing to me that machine learning has eaten up AI from the inside out. In some ways, older AI research, by trying to characterize things like cognition in a computational format, was a much more interesting field of research than the current "let's get a whole lot of data and put it into a neural net".
posted by phack at 6:29 AM on April 22, 2016 [3 favorites]


I guess it's an idea that is appealing to externalism. You could conceive of AI not as figuring out how to make a robot Cartesian subject, but as a matter of getting better and better at connecting different environments in ways (*handwaving increases*) that are cognitively useful.
posted by thelonius at 6:31 AM on April 22, 2016


I guess that's the open question. Is "intelligence" a super complex algorithm that we can't at this point even comprehend, or is it just a whole lot of simpler bits of neuron "code" working on a whole lot of data?

The growth of seemingly "smart" interfaces onto big datasets, like Google and Siri and the Google self-driving car, seems to indicate that the clever algorithm is not the path. No one thinks the Google search engine is intelligent, but it is not uncommon to hear someone exclaim that it found something they needed but didn't quite know how to ask for.
posted by sammyo at 6:46 AM on April 22, 2016


We don't need smarter computers to have conversations over cocktails with us. We need them to be the brains of robots that will be our cars, miners, surgeons, farmers, etc. Those computers are the step-change in productivity that will make almost everything we use and consume somewhere between 80% and 99.9% cheaper.
posted by MattD at 6:53 AM on April 22, 2016


Sure, the dataset matters more than the algorithm. But the relevancy engine that determines whether something belongs in the data set is pure algorithm. And if you do *that* badly, you have a sociopath, at best.
posted by DigDoug at 6:53 AM on April 22, 2016 [1 favorite]


"Siri, do you have a soul?"

"Playing Otis Redding"

"Dammit, Siri."
posted by leotrotsky at 6:54 AM on April 22, 2016 [6 favorites]


Getting machines to read and understand is one of the most important problems. There is obviously a huge amount of data for this, but most of it isn't in a helpful format. Google found two news websites that format their articles with summaries and is currently using those for machine learning.
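To make the format point concrete, here's a toy sketch of what those sites effectively hand you for free; the pairs below are made up for illustration, not Google's actual pipeline:

```python
# Hypothetical (article, summary) pairs, as a summary-formatting news
# site provides them. Supervised pairs like these are exactly what a
# learned summarizer trains on.
examples = [
    {"article": "A long news story about a solar storm ...",
     "summary": "Solar storm narrowly misses Earth."},
    {"article": "A long news story about a chess match ...",
     "summary": "Challenger wins game six."},
]

for ex in examples:
    print(len(ex["article"].split()), "words ->", ex["summary"])
```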

Solving this will be a giant leap.
posted by bhnyc at 6:56 AM on April 22, 2016


We need them to be the brains of robots that will be our cars, miners, surgeons, farmers, etc.

So, smarter than 90% of the people sitting at the bar waiting on a conversation?
posted by DigDoug at 6:58 AM on April 22, 2016 [1 favorite]


ducking the need to deal with actual meaning

just as Google translations are useful in spite of being done without any shred of understanding

What is meaning but semantic relationships in a web of information? Isn't meaning precisely what they are dealing with?

The algorithms they use to infer probabilistic correlations between words in different languages, using webs of word relationships, are mind-boggling and certainly seem to me to qualify as "understanding" in any sense of the word you care to name.
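To give a feel for how much mileage raw statistics get you, here's a toy sketch of cross-language word association from aligned sentence pairs; the real systems (alignment models, phrase tables) are far more elaborate:

```python
from collections import Counter

# A tiny made-up parallel corpus; real training sets run to billions
# of tokens.
pairs = [("the cat sleeps", "le chat dort"),
         ("the dog sleeps", "le chien dort"),
         ("the cat eats",   "le chat mange")]

# Count how often each English word co-occurs with each French word.
counts = Counter()
for en, fr in pairs:
    for e in en.split():
        for f in fr.split():
            counts[(e, f)] += 1

# "cat" always co-occurs with "chat" and never with "chien":
print(counts[("cat", "chat")], counts[("cat", "chien")])  # 2 0
```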
posted by nzero at 7:03 AM on April 22, 2016 [2 favorites]


Not smarter. Differently intelligent. Knowing what they need to know for their specialty.
posted by MattD at 7:04 AM on April 22, 2016 [1 favorite]


Also the "dirty secret", well not dirty and not particularly secret, is that the intelligence in the current successes in Big Data is the data scientists that carefully tune choose the algorithm used and tweak the parameters of the neural net to get results seem reasonable. My example of google search seeming "smart" is to some small or large degree to a very competent team working to fight the SEO hackers but also to keep the results useful for regular folks.
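A minimal sketch of the knob-twiddling involved, with a stand-in scoring function in place of an expensive real training run:

```python
from itertools import product

def train_and_score(lr, hidden):
    # Stand-in for a real training run; this toy score peaks at
    # lr=0.01, hidden=64.
    return -(lr - 0.01) ** 2 - ((hidden - 64) / 100) ** 2

# Grid search over learning rate and hidden-layer size.
grid = product([0.001, 0.01, 0.1], [16, 64, 256])
best = max(grid, key=lambda params: train_and_score(*params))
print("best (learning rate, hidden units):", best)  # (0.01, 64)
```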
posted by sammyo at 7:12 AM on April 22, 2016


"understanding" in any sense of the word you care to name.

The machines have no sense of narrative. You can't ask them questions about what they just read.
posted by bhnyc at 7:17 AM on April 22, 2016 [2 favorites]


What is meaning but semantic relationships in a web of information?
I would say that meaning is extracted from such relationships. It is not identical to the network of semantic relationships.
posted by crazy_yeti at 7:17 AM on April 22, 2016 [3 favorites]


Those computers are the step-change in productivity that will make almost everything we use and consume somewhere between 80% and 99.9% cheaper.

...if you believe that, contrary to the experience of the last forty years or so, the productivity gains will actually be passed on to consumers and/or workers. Unless the world changes a lot, either prices won't drop or wages will be depressed right along with them.
posted by praemunire at 7:20 AM on April 22, 2016 [7 favorites]


If you recall the search update that smacked down MetaFilter, nobody from Google could explain the result, from which we can infer that nobody at Google really understands the totality of the current search algorithm. It is a big input-output box with knobs and levers and sliders that humans change; if it seems to work OK with this update then they ship it, and if it doesn't they twiddle.

And if people get laid off or the wrong person gets put on the drone hit list, well, it's nothing personal; it's just a numbers game.
posted by bukvich at 7:20 AM on April 22, 2016 [3 favorites]


We need them to be the brains of robots that will be our cars, miners, surgeons, farmers, etc.

So, smarter than 90% of the people sitting at the bar waiting on a conversation?


More like, the reason why 90% of people are sitting at a bar without a job.
posted by permiechickie at 7:30 AM on April 22, 2016 [6 favorites]


leotrotsky: Periodic reminder that Edge.org's founder, John Brockman, is a literary agent. A lot of these folks are just participating here to shill their pop sci books on their particular hobby horses.

Even so, these are fascinating hobby horses. A++, would ride again!
posted by clawsoon at 7:35 AM on April 22, 2016 [2 favorites]


My problem with artificial intelligence discussions, when they're linked to human intelligence, is the reluctance to define intelligence in terms of a set of behaviours sufficient to consider intelligence present. It's tricky to argue something meets a spec that doesn't exist.

Cynically, I see intelligence as a term used to specify that thing that people have; animals have it a bit, but it's not the same, because that would make things awkward; and computers definitely don't, because they're computers. And definitions start and adapt around these principles.
posted by SometimeNextMonth at 7:46 AM on April 22, 2016 [4 favorites]


Additionally, the nascent problem of ensuring AI friendliness might be addressed by focusing on dataset rather than algorithmic friendliness—a potentially simpler approach.

So, letting 4chan train Tay was perhaps an error?
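In code terms, dataset friendliness amounts to filtering the corpus before training rather than patching the model afterwards. A toy sketch, with a made-up blocklist and corpus (Tay's actual pipeline isn't public):

```python
BLOCKLIST = {"awful_term", "worse_term"}  # hypothetical placeholders

def is_friendly(utterance: str) -> bool:
    # Keep only utterances containing no blocklisted words.
    return not any(word in BLOCKLIST for word in utterance.lower().split())

corpus = ["hello there", "something awful_term something", "nice weather"]
training_data = [u for u in corpus if is_friendly(u)]
print(training_data)  # ['hello there', 'nice weather']
```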
posted by flabdablet at 7:55 AM on April 22, 2016 [3 favorites]


Those computers are the step-change in productivity that will make almost everything we use and consume somewhere between 80% and 99.9% cheaper.

If the entire history of technological advancement is any kind of guide (and I think it probably is), making everything we use and consume somewhere between 80% and 99.9% cheaper will be the marketing industry's cue to make the consumption of somewhere between 5x and 1000x more stuff not only normal but every consumer's patriotic duty.
posted by flabdablet at 7:59 AM on April 22, 2016 [1 favorite]


In 2005, Google software achieved breakthrough performance at Arabic- and Chinese-to-English translation based on a variant of a statistical machine translation algorithm published seventeen years earlier, but used a dataset with more than 1.8 trillion tokens from Google Web and News pages gathered the same year

Does the timing and subject here make anyone else's spidey sense tingle?
posted by Reasonably Everything Happens at 8:20 AM on April 22, 2016 [2 favorites]


Google DeepMind announced its software had achieved human parity in playing twenty-nine Atari games by learning general control from video using a variant of the Q-learning algorithm published twenty-three years earlier, but the variant was trained on the Arcade Learning Environment dataset of over fifty Atari games made available only two years earlier.

Yeah, 2600 ROMs and emulators have only been available for the last two years ... (eyeroll)
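For reference, the tabular Q-learning update that DeepMind's variant descends from; a minimal sketch of the classic rule, not the DQN itself:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # One-step Q-learning: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
        r + gamma * best_next - Q.get((s, a), 0.0))

Q = {}
q_update(Q, "start", "fire", 1.0, "next_screen", ["fire", "left", "right"])
print(Q)  # {('start', 'fire'): 0.1}
```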
posted by RobotVoodooPower at 8:58 AM on April 22, 2016


You(tube) comments don't say.
posted by symbioid at 9:02 AM on April 22, 2016


I think this is one of the huge challenges that individual programmers and startups are going to face over the next few years. I sat down to do some machine learning to find faces in my image library. I'm making some progress, but way more of my time on that particular project has been spent classifying faces and drawing boxes around them than on actually writing code.

If I had a budget for this stuff I'd Mechanical Turk it or something: hire a large army of relatively cheap humans to build me better data sets. And if I were some major centralized image-hosting or social-media site, I'd have more of a data set to work with than the ten thousand or so pictures in my personal image library.
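For a sense of what all that labeling work actually produces, a minimal sketch; the label file and its layout here are hypothetical:

```python
import json, random

# Hypothetical label file produced by all that box-drawing:
# [{"image": "img001.jpg", "boxes": [[x, y, w, h], ...]}, ...]
with open("face_labels.json") as f:
    labels = json.load(f)

# The code to split the data is trivial; producing the labels was the work.
random.seed(0)
random.shuffle(labels)
split = int(0.8 * len(labels))
train, test = labels[:split], labels[split:]
print(f"{len(train)} training images, {len(test)} held out for evaluation")
```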

If the "throw trainable computing techniques at all the things" folks are correct, yeah, there's still room for tweaking heuristics, but the big advances are going to come from curating the training data sets.

This does not bode well for decentralized systems or personal computing as we've known it for the past few decades.
posted by straw at 9:19 AM on April 22, 2016 [2 favorites]


The machines have no sense of narrative. You can't ask them questions about what they just read.

You can't... what? That's exactly what you are doing. I think maybe you don't understand what's going on with Google Translate. You are precisely asking it questions about what it read (the "read" part is the linguistic corpus it's fed during training, not your query! your query is the question!).

I would say that meaning is extracted from such relationships. It is not identical to the network of semantic relationships.

I sort of agree with you, in that the act of inferring meaning is precisely the extraction of those semantic relationships by building another web of semantic relationships to represent them. The "meaning" part is still the relationships themselves, though.
posted by nzero at 9:48 AM on April 22, 2016 [1 favorite]


Serious question: This thread and the article seem to treat datasets and algorithms as two completely distinct entities with little or no overlap. But is it really that simple? Do we know for a fact that there are distinct, non-/minimally-overlapping "dataset" and "algorithm" areas in the human brain?
posted by ZenMasterThis at 9:54 AM on April 22, 2016


That's my evening sorted: Gonna sit down with Julian Jaynes and a bottle of wine.
posted by JohnFromGR at 10:29 AM on April 22, 2016 [2 favorites]


"Scientist; Inventor; Entrepreneur; Investor." is a lot of jobs.
posted by maryr at 11:39 AM on April 22, 2016 [3 favorites]


He's a scientist on every second Tuesday.
posted by leotrotsky at 12:53 PM on April 22, 2016 [1 favorite]


Yes, cleaner data will definitely give us human-level superficial intelligence any day now. Real soon. Human level AI is right around the corner. That's never been promised before, in the 60 year history of AI. Somebody (not me) will surely sort it all out next week.

That's a good trick really. Next time a project manager tells me that my algorithms suck, I'll just tell him it's a stochastic model with "bad data".

Bonus: Drinking during the singularity
posted by mr.ersatz at 3:14 PM on April 22, 2016 [1 favorite]


Before we get too much further down the road of deep dependence on AI, we had better address the problem of the Carrington Event. Had the eponymous 1859 occurrence happened 150 years later, it would have knocked out almost the entire electronic infrastructure and destroyed many of its individual parts, including all satellites and vast numbers of personal electronic devices. And it turns out that, against previous expectations, sunlike stars have been shown to be capable of "superflares" orders of magnitude greater than the 1859 event:
Out of all the stars with superflares that Christoffer Karoff and his team analyzed, around 10% had a magnetic field with a strength similar to or weaker than the Sun's magnetic field. Therefore, even though it is not very likely, it is not impossible that the Sun could produce a superflare.

"We certainly did not expect to find superflare stars with magnetic fields as week as the magnetic fields on the Sun. This opens the possibility that the Sun could generate a superflare -- a very frightening thought" elaborates Christoffer Karoff.

If an eruption of this size was to strike Earth today, it would have devastating consequences. Not just for all electronic equipment on Earth, but also for our atmosphere and thus our planet's ability to support life.
And it looks as if our sun has actually had two small superflares within the last 1500 years:
Trees hid a secret

Evidence from geological archives has shown that the Sun might have produced a small superflare in AD 775. Here, tree rings show that anomalously large amounts of the radioactive isotope 14C were formed in Earth's atmosphere. 14C is formed when cosmic-ray particles from our galaxy, the Milky Way, or especially energetic protons from the Sun, formed in connection with large solar eruptions, enter Earth's atmosphere.

The studies from the Guo Shou Jing telescope support the notion that the event in AD 775 was indeed a small superflare, i.e. a solar eruption 10-100 times larger than the largest solar eruption observed during the space age.

"One of the strengths of our study is that we can show how astronomical observations of superflares agree with Earth-based studies of radioactive isotopes in tree rings." Explains Christoffer Karoff.

In this way, the observations from the Guo Shou Jing telescope can be used to evaluate how often a star with a magnetic field similar to the Sun's would experience a superflare. The new study shows that the Sun, statistically speaking, should experience a small superflare every millennium. This is in agreement with the idea that the event in AD 775 and a similar event in AD 993 were indeed caused by small superflares on the Sun.
A 2012 paper argues that the 775 event was 10 to 20 times greater than the Carrington Event.
posted by jamjam at 3:39 PM on April 22, 2016 [6 favorites]


Yeah, 2600 ROMs and emulators have only been available for the last two years ... (eyeroll)

The Arcade Learning Environment dataset is two years old - that's what it's referring to.
posted by ymgve at 3:03 PM on April 23, 2016


No... uhhh... this one goes in your butt
posted by flabdablet at 5:40 AM on May 2, 2016



