Not a cure for this crisis—One more symptom of it
May 31, 2023 5:08 AM

 
Meanwhile, people who are not Elon Musk and would thus be expected to have some perspective are suggesting that "Mitigating the risk of extinction from A.I. should be a global priority alongside other societal-scale risks, such as pandemics and nuclear war" (NYT).

Needless to say, the article is nebulous about the large-scale risks actually posed by AI. There are of course a number of ways things frequently labeled as AI could and do make the world crappier — LLMs and image generators spewing untrustworthy content, inscrutable neural networks being plugged into things that ruin people's lives like sentencing guidelines and loan-risk assessment — but despite the heated rhetoric, extinction-level threats seem pretty sparse.
posted by jackbishop at 6:40 AM on May 31, 2023 [7 favorites]


So maybe I read too much William Gibson in my youth, but is it possible to deliberately generate incorrect information to inject into these training sets? There's nothing validating the inputs as far as I can see.

I would love to see a future of warring autocomplete engines trying to figure each other out.
posted by JoeZydeco at 7:13 AM on May 31, 2023 [13 favorites]


...is it possible to deliberately generate incorrect information to inject into these training sets?

Several known LLM training data sets -- in addition to the suspected contents of popular closed training data sets -- are built by scraping the Internet. By definition they're utterly chock full of incorrect but plausible information.

Nothing validates the input of the Internet, either.
posted by majick at 7:29 AM on May 31, 2023 [44 favorites]


In these AI threads, I think it is time to stop talking about the tech and speak more about the people pushing this tech, their motivations, the likely damage they will cause, and how we can stop this nonsense now. In a previous thread I posted the question “Why does the porridge bird lay his egg in the air?” This was a fictional way to break the computer and stop the AI sham — sort of the thing JoeZydeco is suggesting here.

Naomi Klein is saying what a lot of us are thinking, and her piece is a good summation of what is really going on. They are ripping us all off by attempting to privatize and monetize both individuals’ works of art and writing and our cultural heritage. This is all about money. ChatGPT et al. are theft.
posted by njohnson23 at 7:31 AM on May 31, 2023 [58 favorites]


I meant wildly incorrect and/or outrageous information, like pictures of monster trucks with labels like THIS IS A CAT.
posted by JoeZydeco at 7:33 AM on May 31, 2023 [5 favorites]


(JoeZydeco, majick - Bingo! Garbage datasets.)

These are still BETA AIs that were trained on unvalidated scrapings. I have seen this propensity to "hallucinate" or bullshit... but has anyone tried AI instances that have been trained on verified expert data? This, plus the addition of some level of post-generation self-validation, would just about eliminate factual errors within the chosen field. As a natural-language front-end, especially for specialist datasets - this is where I see AI being useful initially.

Re the industry hallucinations Ms Klein lists out - it's early days. It's right to point out the fallacies in the big promises being made for AI, but look at self-driving cars and all the promises made there... the reality is that they are still far from realizing those promises, and they likely won't meet all of them. What gets delivered will be a smaller subset.

Anyway, wake me when AI stops liars from lying, in real-time. THAT would be useful, and actually possible, I believe.

We're already wrecking everything; at worst, AI might tweak the speed, that's all.
posted by Artful Codger at 7:40 AM on May 31, 2023 [1 favorite]


I started out as a software tester, and one of the principles of software testing is: when the program asks for input, you give it garbage to see how it handles it. It was always a quick and dirty way to break code. Giving these “ai” programs wildly incorrect information is the same thing. But as stated above, the internet is mostly that to begin with. And the “ai” doesn’t know the difference. It just churns it up and spits it back. These issues are way above algorithms. We need to talk about the ethics of what is going on. And, yes, this is not a species-extinction threat. It is an economic threat.
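
For anyone who never did time in QA, the quick-and-dirty garbage-input trick from my first paragraph looks something like this (a toy Python sketch, not anybody's actual test harness):

    import random, string

    def parse_age(s):
        # Stand-in for whatever input handler is under test.
        return int(s)

    # Quick and dirty: throw garbage at the input and see what breaks.
    for _ in range(1000):
        junk = "".join(random.choices(string.printable,
                                      k=random.randint(0, 20)))
        try:
            parse_age(junk)
        except ValueError:
            pass  # rejected cleanly, which is fine
        # any other exception escaping here is a genuine bug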
posted by njohnson23 at 7:41 AM on May 31, 2023 [9 favorites]


"Regulate me, government-senpai!' they keep crying, playing up this whole existential threat thing.

You know who else ran the regulatory capture scam to cement their early market advantages? Railroads, telecommunications, energy, broadcast. You know, the massive government-enforced trusts with exclusive rights to extract wealth by creating high barriers to market entry. The usual.

Ineffectual regulation that just makes the guy with the money more powerful isn't regulation, but man, they sure are crying out for some regulation for some mysterious reason.
posted by majick at 7:58 AM on May 31, 2023 [21 favorites]


> I started out as a software tester, and one of the principles of software testing is: when the program asks for input, you give it garbage to see how it handles it. It was always a quick and dirty way to break code. Giving these “ai” programs wildly incorrect information is the same thing.

1995: hack the planet!
2013: fuzz test the planet!

i dunno, should we get the type of general ai that decides that human extinction is the least-bad option, i would like to announce in advance that i am on the side of the robots. the old world is dying, the new world struggles to be born, and there's a good chance it's going to have to eat its way out.

relatedly: i propose that we have left the anthropocene (the era where the death of the planet as we know it became inevitable) and entered the lolyolocene (the era where we do weird shit on the way down).
posted by bombastic lowercase pronouncements at 8:02 AM on May 31, 2023 [13 favorites]


Meanwhile, people who are not Elon Musk and would thus be expected to have some perspective are suggesting that "Mitigating the risk of extinction from A.I. should be a global priority alongside other societal-scale risks, such as pandemics and nuclear war" (NYT).

Kieran Healy is having some fun with that on Twitter, for example:
You would not believe how many different ways I and various slightly run-down parts of my house pose an existential threat to humanity on a par with pandemics or nuclear war. Also my laptop, it is super-dangerously old and mitigating its risks should be a global priority.
posted by mark k at 8:09 AM on May 31, 2023 [8 favorites]


small, but illuminating anecdata: I recently faced a small challenge that AI seemed ideally placed to resolve, I thought: the deciphering of a short piece of period handwriting. Surely there's somebody who's trained all that intelligencing on such a task, I thought - and so they have: Transkribus is a university-led manuscript-transcription AI that has been refining its technique for years now. Signup is free, with 500 queries included. So I tried my luck, twice, with different knowledge stacks informing its AI-enhanced bespoke manuscript OCR. The results? Dismal. Worse than useless, despite the image being perfectly well-defined for such a task.

What better alternative, then, than to AskMeFi? And thanks to the speculative, serendipitous, fine-toothed multiplicity that is MeFi, the task was solved within a day.

Do not underestimate the challenges we face from AI's movers and shakers - also, do not overestimate the miraculousness of "AI".
posted by progosk at 8:10 AM on May 31, 2023 [18 favorites]


AI potentially gives small-time dictators and would-be Dr. Evils another tool. The damage Putin has done by dividing the democratic world over anything he wanted has yet to be assessed, and is presumably ongoing. He was only limited by the number of fluent writers and hackers he could hire.
posted by Brian B. at 8:23 AM on May 31, 2023 [1 favorite]


These are still BETA AIs that were trained on unvalidated scrapings. I have seen this propensity to "hallucinate" or bullshit... but has anyone tried AI instances that have been trained on verified expert data? This, plus the addition of some level of post-generation self-validation, would just about eliminate factual errors within the chosen field. As a natural-language front-end, especially for specialist datasets - this is where I see AI being useful initially.

No; this is a misunderstanding. It is inherent to the nature of the LLM that this is going on. It has nothing to do with the training material being "accurate" or not.

The AI is a language imitator; it is being trained to "talk like a smart person." But it is not smart. It doesn't even have any interior mind to be smart (or stupid). All it's managed to do is figure out, statistically speaking, what sort of things a "smart" person would say.

So when you ask it for a source, and it makes one up and gives it an academic-style citation, it's not because the things in its training set included made up sources. It's simply because the sort of thing a smart person would say is "yes, I have a source for this, here it is."

People who understand what's going on have had some fun making its gaps more obvious. The simplest example of this that I've seen is the person who asked it "Which is heavier, a pound of feathers or a million pounds of feathers?" and it responded that they both weighed the same, silly human, don't you understand the million pounds of feathers is just denser without weighing more? Because a lot of its training material had the old chestnut about a pound of feathers / pound of bricks, but it has still never learned what a pound is.

Some AI (including ChatGPT) is useful for some tasks, but people need to stop thinking of ChatGPT as if it's intelligent but just not well read or something. It has no intelligence and never will.
posted by mark k at 8:27 AM on May 31, 2023 [59 favorites]


but has anyone tried AI instances that have been trained on verified expert data?

That's nearly the entire field of deep learning. Broadly speaking, to train a neural network you need a bunch of inputs with associated outputs that you know are correct. So, for a problem like scene segmentation (isolating particular objects in a photo) you have input photos and a bunch of output segmentation masks. This output, because it's what you're training the network to learn, should be 100% correct and that means verified data.

Large Language Models like ChatGPT also use this, but what are the "outputs" we're learning? LLMs are basically fancy autocomplete engines. They are learning to generate sequences of words whose statistics match the statistics of the input corpus. So, the expert verified data is there, but it's not "facts" or "lies." It's "how humans use the written word."
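
To make "fancy autocomplete" concrete, here's a toy version (pure Python, word-bigram scale instead of billions of parameters - emphatically not how a production LLM is built, but the same spirit):

    import random
    from collections import defaultdict

    corpus = ("the cat sat on the mat . the dog sat on the rug . "
              "the cat chased the dog .").split()

    # "Training": count which token follows which. Pure corpus statistics;
    # no notion anywhere of whether a sentence is true.
    follows = defaultdict(list)
    for a, b in zip(corpus, corpus[1:]):
        follows[a].append(b)

    # "Generation": repeatedly sample a next token in proportion to how
    # often it followed the current one in the training text.
    token, out = "the", ["the"]
    for _ in range(12):
        successors = follows.get(token)
        if not successors:      # never saw a successor: stop
            break
        token = random.choice(successors)
        out.append(token)
    print(" ".join(out))

Nothing in that table knows whether "the cat chased the dog" is true; it only knows how often those words co-occurred.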

To me, as a researcher (although not in LLMs) the fascinating thing is that so much of what we see as "meaning", "intent" or "intelligence" is just alignment of high order statistics! That's kind of amazing and humbling.

Edit: I should add, too, that older approaches to machine learning (decision trees, for example) did exactly what you suggest: they used experts to guide the hand-coding of expert systems. Look up MYCIN for a neat example.
posted by riotnrrd at 8:38 AM on May 31, 2023 [7 favorites]


Right: statistically, when someone compares a pound of feathers to another weight, it's a trick question where they really weigh the same. That statistical association outweighs any actual numbers you include.
posted by RobotHero at 8:40 AM on May 31, 2023 [3 favorites]


Self-driving cars are scary. From what I've heard, Tesla's wink-wink warnings about misusing their automatic mode are particularly irresponsible. But what's also terrifying is human drivers: they are the ones currently killing more than a million people every year. The faster we can get safe self-driving, the better.

General AI is way more complicated, but my general attitude is the same: Yes, there's potential to screw things up, but there's also huge potential to fix the many problems we have today. Agricultural productivity and health care jump to mind.
posted by Mr.Know-it-some at 8:41 AM on May 31, 2023 [6 favorites]


we all know that the old world is dying and the new world is struggling to be born, but what we haven't yet accepted is that it's probably going to have to eat its way out.
posted by bombastic lowercase pronouncements at 8:45 AM on May 31, 2023 [13 favorites]


mark k, I get your point. When you're trying to teach someone or something to "comprehend" and respond in a human language, it doesn't actually matter whether the training info is factually correct or not - only that the training material provides enough examples of language patterns and use.

These first public beta LLM instances are the result of that training effort. I still very much think that the developers started trying "real" queries and were pleasantly surprised by the usefulness of the output, and this is why they rushed them out into public betas. These LLMs learned to converse, but their specific "knowledge" was often crap. And there is/was a bias towards "overconfidence" in the phrasing of results.

The next step is to "encapsulate" and separate the language capability from its sometimes factually bogus training data, and couple that to expert, validated datasets. So that when you run a query, only the expert data is searched for an answer. AND it's entirely possible to post-check any output for factual errors. Oversimplifying here, but no more so than anyone who thinks an AI trained just on undifferentiated web-scrapings is ready for important use.
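
Roughly the shape I have in mind, as a toy sketch (the pattern now being called retrieval-augmented generation; the data and the prompt wording here are invented for illustration):

    import math
    from collections import Counter

    # Stand-in for a curated, expert-validated dataset (contents invented).
    expert_docs = [
        "Drug X: adult dosage is 10 mg twice daily with food.",
        "Drug X is contraindicated with anticoagulants.",
    ]

    def bow(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        norm = (math.sqrt(sum(v * v for v in a.values())) *
                math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    def retrieve(query):
        # Only the best-matching expert passage feeds the answer,
        # not the model's web-scraped "memory".
        q = bow(query)
        return max(expert_docs, key=lambda d: cosine(q, bow(d)))

    # Then prompt the LLM with something like:
    # "Answer using ONLY this passage: " + retrieve("dosage of drug X?")

Toy-scale, obviously, but that's the division of labour I mean: language skill from the LLM, facts from the curated data.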
posted by Artful Codger at 8:48 AM on May 31, 2023 [2 favorites]


She seems to be under the impression that startup CEOs might overhype their product.
posted by Tell Me No Lies at 9:14 AM on May 31, 2023 [4 favorites]


So that when you run a query, only the expert data is searched for an answer. AND it's entirely possible to post-check any output for factual errors.

Except the problem isn't bad facts in the dataset; the problem is that "facts" are not concepts within the scope of a LLM at all. Sure, the internet is full of terrible nontruths, but the terrible nontruths generated by LLMs are not verbatim recapitulations of these, or even remixes specifically of them; they're remixes of chunks of (often perfectly correct in their original context) language in a way which is pathologically indifferent to the notion that assembled chunks of language have truth values.

Consider that bizarre case recently where a lawyer let ChatGPT write a filing in an airline-related damages suit, and the filing contained references to the nonexistent cases Varghese v. China Southern Airlines, Shaboon v. Egyptair, Petersen v. Iran Air, Martinez v. Delta Airlines, Estate of Durden v. KLM Royal Dutch Airlines, and Miller v. United Airlines. None of these are a result of "poisoned" data; these weren't lifted wholesale from a work of fiction or from a malicious nontruth. Rather, they were high-probability concatenations of words to appear in a legal filing, so it used them together. "Make sure the phrases you put together have validity and not just likelihood" is not actually within the range of things ChatGPT is designed to do.

What it's doing is actually very similar to the kind of thing which Janelle Shane used to do regularly, taking very long lists of things that exist like beers or nail polishes or racehorses, feeding them into a neural network, and having it come up with plausible but imaginary examples in that realm. In that context, it was clear that "coming up with new examples" was part of the project, and nobody assumed that "Velvet One" was an actual racehorse. Put that same functionality in something which will actually write sentences about those racehorses, and people are convinced they're real.

So the LLM problem isn't actually "garbage in, garbage out". Yes, there's garbage going in, and that's not ideal, but even supplying it with only entirely factual data will not address the problem that LLMs have no regard for or capability to even make use of "facts" as a concept.
posted by jackbishop at 9:17 AM on May 31, 2023 [43 favorites]


How is this open letter not just "Pay us to ward off an existential threat that we are creating"?

Journalists' first question should be: how is this "Center for AI Safety" related to the scammers at MIRI? But maybe the MIRI scam is too in-depth to understand in a quick soundbite.
posted by muddgirl at 9:34 AM on May 31, 2023 [3 favorites]


Because of Global Warming, we are near the edge of a precipice which is already crumbling away about as fast as we could run if we did decide to try to save ourselves, and if AI accelerates the crumble even a little bit it will be truly catastrophic.

One aspect of AI which is really bothering me is that AI appears to be feeding the delusion on the part of our elites that the bulk of humanity, along with the living and self-aware competence it quite literally embodies, is dispensable and can be replaced with AIs as the elites surge forward into the glorious future they regard as their birthright.

In and of itself, that delusion is very dangerous.
posted by jamjam at 9:40 AM on May 31, 2023 [18 favorites]


Thanks, mark k and jackbishop, you just saved me from spending longer than I'm about to on saying much the same thing.

The LLM is not built to evaluate meaning. Not of the language samples used to train it, and not of its own output. Not unless you strip enough meaning from the word "meaning" so that all you're talking about is cross-correlations within language, as opposed to anything that language refers to in the real world. Which is, incidentally, pretty much exactly "meaning" as applicable to computer source code, which is why autocomplete is a feasible way to generate near-usable slabs of that.

All of the current crop of "AI" bots work essentially the same way: they take some fixed-size slab of text (a "token context") as a query into a statistical model, which spits out a corresponding probability distribution, which then shapes the random selection of the next token to be output. Then they drop the first token from the token context, slap the newly generated output token on the end, and do it all again. The "prompt" you see referred to all over the place is just the last part of the token context that needs to be filled in to get the process started.
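
In skeleton form, that outer loop is about this much machinery (a sketch only; the model function here is a stand-in for whatever supplies the probability distribution - a lookup table, a neural network, whatever):

    import random

    def generate(model, prompt, context_size, n_tokens):
        # model: any function mapping a fixed-size token context to a
        # probability distribution over the vocabulary.
        context = list(prompt)[-context_size:]
        output = []
        for _ in range(n_tokens):
            dist = model(tuple(context))            # {token: probability}
            tokens, weights = zip(*dist.items())
            nxt = random.choices(tokens, weights=weights)[0]
            output.append(nxt)
            context = context[1:] + [nxt]           # slide the window along
        return output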

This is the very same procedure that's been used for decades by hobbyist gibberish generators. If you've never played with such a thing, I strongly recommend that you follow that link and spend half an hour playing with that one.

The linked gibberish generator uses individual letters and punctuation marks as tokens, and works off contexts no longer than six of those. But even with a context as heavily restricted as that, it can emit remarkably coherent streams of output text, easily recognizable as having been derived from the sample texts used to train it. And the longer the token context you allow it to use, the less incoherent the output looks.

ChatGPT et al are doing the same thing, just with massively larger token contexts, made feasible by implementing the statistical model not as a lookup table summarizing simple letter frequency counts, but by using a neural network to approximate the probabilistic essence of a massive and diverse corpus of training text. They work that way not because an "intelligence" is expected to emerge as a result, but simply to make it feasible at all to handle the diversity of queries that a context of 32,000 tokens, most of which represent words or word parts rather than single letters, makes possible.

I wrote a longer explanation of all of that here and Stephen Wolfram wrote a better one here.

Again, if you've not ever played with a primitive gibberish generator I strongly recommend that you go play with the linked one. It will vaccinate you against the GPT hype and the associated anxiety spike over the imminent Geek Rapture.

To me, as a researcher (although not in LLMs) the fascinating thing is that so much of what we see as "meaning", "intent" or "intelligence" is just alignment of high order statistics!

Is it, though? I'd argue that alignment of high order statistics in a sample of written text is more like a stylistic fingerprint that we've hitherto been relying on to judge whether or not the sample is the product of a mind rather than an essentially trivial algorithm.

Meaning and intent and intelligence are attributes that we ascribe automatically to other minds because we recognize them as working similarly to our own. Having made the error - and it is an error - of misrecognizing the author of a slab of Bing-generated text as necessarily mind-like, the faulty conclusion that Bing therefore displays meaning or intent or intelligence follows very naturally.

But a fake Rolex is not a Rolex and a fake mind is not a mind. Bing does not have 44 jewels and is not rated for operation under 6,000 metres of water. It's just been built to look like it could do those things without breaking shit, but it can't.
posted by flabdablet at 9:48 AM on May 31, 2023 [38 favorites]


the LLM problem isn't actually "garbage in, garbage out"

Garbage in is sufficient but not necessary for garbage out. Ask any cryptographically secure pseudorandom number generator.
posted by flabdablet at 9:50 AM on May 31, 2023 [3 favorites]


Thanks jackbishop

...the problem is that "facts" are not concepts within the scope of a LLM at all. Sure, the internet is full of terrible nontruths, but the terrible nontruths generated by LLMs are not verbatim recapitulations of these, or even remixes specifically of them; they're remixes of chunks of (often perfectly correct in their original context) language in a way which is pathologically indifferent to the notion that assembled chunks of language have truth values.

You're highlighting the next hard problem to solve in the development of AI - evolving from a mindless pattern-matcher to some level of discernment in recognising a "domain" problem, then looking only in domain-related data or learnings for a better answer.

"Make sure the phrases you put together have validity and not just likelihood" is not actually within the range of things ChatGPT is designed to do.

I think it would be a short step to have automatic, possibly recursive self-checking on results, to catch errors in the first-pass outputs. And/or maybe even just some sort of confidence-level indicator.

Consider that bizarre case recently where a lawyer let ChatGPT write a filing in an airline-related damages suit, and the filing contained references to the nonexistent cases Varghese v. China Southern Airlines, Shaboon v. Egyptair, Petersen v. Iran Air, Martinez v. Delta Airlines, Estate of Durden v. KLM Royal Dutch Airlines, and Miller v. United Airlines. None of these are a result of "poisoned" data; these weren't lifted wholesale from a work of fiction or from a malicious nontruth. Rather, they were high-probability concatenations of words to appear in a legal filing, so it used them together.

Agreed. But the same AI could still have been used to follow up by finding the cited cases... or determining that they're non-existent. This requires active searching of online data, but is still within the capabilities of current AI instances. So - in this case, we're looking at improper and incomplete use of AI and other research tools by those lawyers.

Searching out relevant cases is a pretty well-defined and often-performed task, and I'd expect that a law-oriented AI would have to be "well-trained" and dependable on such common tasks, to be safely used. Specialization.
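
A sketch of the kind of post-check I mean (case_exists is a hypothetical stand-in, not a real API - you'd wire it to whatever database the firm pays for):

    import re

    def case_exists(citation):
        # Hypothetical stand-in for a real case-law lookup
        # (CourtListener, Westlaw, LexisNexis...), NOT a real API.
        raise NotImplementedError("wire this to an actual legal database")

    # Crude "Somebody v. Somebody" pattern - good enough for a sketch.
    CASE = re.compile(r"[A-Z][\w.'-]*(?: [\w.'-]+)* v\. "
                      r"[A-Z][\w.'-]*(?: [\w.'-]+)*")

    def audit_filing(text):
        # Return every cited case the database can't find.
        return [c for c in set(CASE.findall(text)) if not case_exists(c)]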
posted by Artful Codger at 9:52 AM on May 31, 2023 [1 favorite]


I think it would be a short step to have automatic, possibly recursive self-checking on results, to catch errors in the first-pass outputs.

Seems to me that anybody who thinks of that as a short step is badly underestimating both the fractal complexity of their own neural hardware and the scale and intricacy of its experiential training corpus, let alone the cumulative effect of having billions of those things interact as a culture.
posted by flabdablet at 9:57 AM on May 31, 2023 [23 favorites]


I meant wildly incorrect and/or outrageous information

I knew that the next Spider-verse movie was due out in June, but didn't know exactly when, so on a lark I asked Alexa what the release date was. Not only did Alexa tell me there was not a sequel planned, but that one would never be made. Meanwhile, the movie releases on Friday.
posted by Servo5678 at 9:59 AM on May 31, 2023 [9 favorites]


flabdablet... I'm sure that I am doing just that. BUT I do also know that by manually tuning my ChatGPT queries, and by building better or follow-up queries from initial results... lather, rinse, repeat... I can get better results and answers faster than by just doing Google searches.

I still think that self-checking (e.g. automatically doing what I'm doing manually) would be relatively simple to get to, for many use-cases.

I admit I'm mostly obsessed with the application of AI as specialized amplifying tools in the hands of expert users, and I pay little attention to the idea of ChatGPT as world-problem-solver. Hey, I've read Douglas Adams. 42.
posted by Artful Codger at 10:09 AM on May 31, 2023 [2 favorites]


I like the description of LLMs as "plagiarism machines," because it's accurate and highlights both the unethical and non-consensual use of so much art and writing in training these machines as well as the threats posed to the livelihoods of so many creative workers.
posted by overglow at 10:18 AM on May 31, 2023 [10 favorites]


ChatGPT reliably misanswers my simple question on who was last to declare war on Germany in WW2 but I can guide it to the correct answer, or Wikipedia’s version of it at least.

My philosophical position is that words are a sufficient functionality to successfully model our human reality, like Helen Keller was able to acquire

Short of cognition, I think calling what LLMs are doing combobulation is perfectly cromulent.
posted by Heywood Mogroot III at 10:21 AM on May 31, 2023 [2 favorites]


It's telling that this discussion has veered away from the political and economic content of TFA to arguments about tech and implementation.
posted by signal at 10:31 AM on May 31, 2023 [8 favorites]


The faster we can get safe self-driving, the better.

The faster we can reduce the amount of death and injury that cars cause, the better. Seems to me that we could do that via incremental improvements in urban design and incremental improvements in employment location incentives way faster and way more thoroughly than is ever likely to happen because of self-driving AI.

Pursuing self-driving AI in order to reduce road death and injury is like pursuing fusion power in order to reduce fossil fuel extraction. It's a gamble whose upside reward is lower than that of the simpler stuff we already know how to do and whose downside risk, mostly in the form of opportunity cost, is way higher.

General AI is way more complicated

I honestly don't believe so. It seems to me that general intelligence - in the sense of having an ability to comprehend the world and apply that knowledge accurately enough and quickly enough to distinguish reliably between obstacles and scenery - is actually a prerequisite for any engineered control system that could actually operate an automobile more safely than the average human driver when given a comparable degree of autonomy.

My philosophical position is that words are a sufficient functionality to successfully model our human reality, like Helen Keller was able to acquire

I think words can do a pretty good job of communicating a reality model from one mind to another, to some extent, but cannot possibly constitute any such model in and of themselves. People don't stop being people just because they're deaf and blind. Words are (primarily interpersonal) signals, not constitutive pieces of people.

ChatGPT is not a person, it's a complicated echo chamber.
posted by flabdablet at 10:31 AM on May 31, 2023 [18 favorites]


It's telling that this discussion has veered away from the political and economic content of TFA to arguments about tech and implementation.

I think there's broad agreement in these parts that the political and economic content of TFA is so close to 100% correct as to be uncontroversial.

There remains a lot of magical thinking on display around the tech itself, though, which I interpret as mostly being due to widespread ignorance of how absurdly primitive its underlying architectural ideas are. And to the extent that such magical thinking serves the interests of political and economic actors who clearly already have far too much of both kinds of clout, I think making the clearest attempt I can to dispel it is in the public interest.
posted by flabdablet at 10:39 AM on May 31, 2023 [30 favorites]


^^^ yes, this.
posted by Artful Codger at 10:42 AM on May 31, 2023 [2 favorites]


> I like the description of LLMs as "plagiarism machines," because it's accurate

It absolutely is not. There is no copying or regurgitation of source material. In a properly trained network there is no way to retrieve exact copies of the training data.

You can object to deep-learning tools like Midjourney and ChatGPT but do so with facts on your side.
posted by riotnrrd at 10:47 AM on May 31, 2023 [6 favorites]


I still think that self-checking (e.g. automatically doing what I'm doing manually) would be relatively simple to get to, for many use-cases.

One of the things that used to frustrate the bejeezus out of me as a budding computer programmer all those decades ago was the general public's inability to understand what it is about some seemingly simple change of requirements that would require the scrapping of an entire system and having a replacement rebuilt on an entirely different set of design assumptions.

This idea has been expressed in countless ways by countless people. I quite like the formulation that goes "The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."

Judging the validity and/or usefulness of some arbitrary search result is hard. So hard, in fact, that doing it reliably requires human cognition. "Automatically doing what I'm doing manually" is achievable for loads of tasks - sorting buttons, say, or completing jigsaw puzzles - but not those requiring domain expertise and/or enough real-world lived experience to cover a partial lack of it.
posted by flabdablet at 10:54 AM on May 31, 2023 [6 favorites]


Q: Why does the porridge bird lay his egg in the air?

A: The concept of a "porridge bird" laying its eggs in the air is purely fictional and does not have a basis in reality. It seems to be a creative and imaginative element of a story, poem, or folklore. In such fictional contexts, the reason for the porridge bird laying its egg in the air can vary depending on the narrative and the purpose it serves within the story. It could be a whimsical or fantastical detail intended to evoke wonder or surprise in the audience. The specific reason for the porridge bird laying its egg in the air would be determined by the author or creator of the story in which the bird appears.
posted by Cpt. The Mango at 11:05 AM on May 31, 2023 [1 favorite]


Metafilter: I think there's broad agreement in these parts that the political and economic content of TFA is so close to 100% correct as to be uncontroversial.
posted by Wood at 11:05 AM on May 31, 2023 [1 favorite]


There is no copying or regurgitation of source material. In a properly trained network there is no way to retrieve exact copies of the training data

Your definition of plagiarism is too strict. You don't need exact copies of the training data; you just need to show that it picked up a series of most-likely transitions because the topic is small and your contribution is highly rated. Previously

Aha - knew I had to be in the set because when you ask it beer questions about certain things, it pops back phrases I know I've written!
posted by drewbage1847 at 1:21 PM on April 19

posted by persona at 11:08 AM on May 31, 2023 [5 favorites]


The next step is to "encapsulate" and separate the language capability from its sometimes factually bogus training data, and couple that to expert, validated datasets. So that when you run a query, only the expert data is searched for an answer. AND it's entirely possible to post-check any output for factual errors. Oversimplifying here, but no more so than anyone who thinks an AI trained just on undifferentiated web-scrapings is ready for important use.

I have been hoping that Google would mash up Knowledge Graph (which itself is sometimes wrong, but much less frequently and, having known sources, can actually be corrected) with a conversational LLM. Given the siloed nature of the organization, however, I don't expect them to actually assemble the pieces.
posted by wierdo at 11:10 AM on May 31, 2023


It's interesting reading this conversation and I just want to thank the longer answerers here for describing at modest length the "LLMs don't actually understand anything" idea. I've written about this as well (I called it "the great pretender") and once that concept clicks the whole shebang feels very different — still interesting and useful but in a much more limited way.
posted by BlackLeotardFront at 11:17 AM on May 31, 2023 [1 favorite]


These are still BETA AIs that were trained on unvalidated scrapings. I have seen this propensity to "hallucinate" or bullshit... but has anyone tried AI instances that have been trained on verified expert data? This, plus the addition of some level of post-generation self-validation, would just about eliminate factual errors within the chosen field. As a natural-language front-end, especially for specialist datasets - this is where I see AI being useful initially.

Others have already pointed out that transformer-based LLM architectures have this kind of limitation inherently. The real path to any kind of "general AI" probably doesn't lie through making LLMs even bigger. What has caused the alarm among AI people (and it's by no means universal) is not actually that many informed people think that LLMs will soon become conscious - that simply isn't possible and they know that. It's that these transformer-based LLMs are performing much better than was predicted - as they got bigger, they hit a point where... boom, they suddenly got very impressive, very fast. That has raised a few concerns that it could happen on an architecture that does have a plausible path towards general reasoning.

There is no copying or regurgitation of source material. In a properly trained network there is no way to retrieve exact copies of the training data

I think a lot of people have been looking at this from the wrong point of view - arguing either for or against it being plagiarism based on a narrow legalistic regurgitation of what current laws do or don't say. That's completely the wrong way to look at it - ironically, the way an LLM itself would look at it. The law will say what we collectively (for some pretty complicated values of that word) decide it should. If we collectively decide that a human being producing art "in the style of" is not plagiarism but training a machine to do it is - that is ok and good! We don't need to twist the existing law and pretend to be lawyers in order to justify our respective views. Better to decide what we want and then make the law say that, rather than rely on old law which never anticipated this being possible.
posted by atrazine at 11:29 AM on May 31, 2023 [9 favorites]


Hallucination #4 from TFA has me already staring into the abyss, what with being in an industry that is going to be utterly decimated by these systems.

The only thing stopping the giant corporate owners of my employer's owners from starting to make changes now is concerns about IP, both ways: in terms of potentially surrendering IP to the owners of generative AI platforms, as well as in terms of being able to copyright anything coming in part or in its entirety out of such systems in response to prompts. And so, for the moment, they're simply waiting for the dust to settle around major lawsuits until they know what the rules of the game will be. They have all but told us as much.

It'll likely land right as I hit my earliest retirement option, just in time to witness the funding of my union's retirement plan evaporate when employment in the field is cut by 90% over the course of a surprisingly short time.

In the meantime said union has been disappointingly sluggish in acknowledging just how much of an existential threat this really is to our industry. Mostly because of a subset of members going: "Yawn, it's not really intelligent, it can't replace a real human artist/technician/whatever, it's never going to be as good as a real human! I'm irreplaceable because of my human genius!"

What they fail to acknowledge is this: to satisfy our industry's particular sets of corporate overlords it doesn't have to be as good as a human or actually intelligent, whatever that means. They couldn't give two shits about that. If it attracts eyeballs and sells stuff... then it's good enough. Good enough is all that matters. So the future of my field is guaranteed to be 90% generative AI throwing shit at the wall and 10% a few remaining humans polishing the turds until the systems get good enough to replace them too. The available systems are already good enough to sustain such an approach. It's only a matter of IP and copyrights now.

I'll happily join everybody else in the gutters when I've no longer got anything to do and I'll shake my fist at the sky with them while yelling "but it's not really intelligent!". Not sure it'll make any difference but maybe it'll distract from the hunger pains.
posted by Hairy Lobster at 11:44 AM on May 31, 2023 [16 favorites]


In the early 80s I wanted to write a gibberish generator and spent several days thinking about the data structure I’d need to represent the Markov chain. Suddenly it hit me how simple it could be! Simply split the text up into n-letter chunks (now called n-grams) then sort them. Done!

Then index into the file randomly, choose an n-gram and use the final n-1 characters to index back into the file. Choose randomly from the n-grams with that prefix and repeat.
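
Reconstructed from memory in modern Python (not my original 80s code, and this version slides the window one character at a time):

    import bisect, random

    def gibberish(text, n=6, length=300):
        # Slide an n-letter window over the text and sort the chunks. Done!
        grams = sorted(text[i:i + n] for i in range(len(text) - n + 1))
        out = random.choice(grams)                 # random starting n-gram
        while len(out) < length:
            prefix = out[-(n - 1):]                # final n-1 characters
            lo = bisect.bisect_left(grams, prefix)
            hi = bisect.bisect_right(grams, prefix + "\uffff")
            if lo == hi:                           # prefix never seen: stop
                break
            out += random.choice(grams[lo:hi])[-1]  # extend by one character
        return out

    # e.g. print(gibberish(open("comp.arch.txt").read()))  # any big slab of text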

I fed it several weeks' worth of content from netnews's comp.arch and alt.flame, then generated fabulous gibberish flaming about computer architecture. Good times, good times.
posted by sjswitzer at 12:02 PM on May 31, 2023 [6 favorites]


Judging the validity and/or usefulness of some arbitrary search result is hard. So hard, in fact, that doing it reliably requires human cognition. "Automatically doing what I'm doing manually" is achievable for loads of tasks - sorting buttons, say, or completing jigsaw puzzles - but not those requiring domain expertise and/or enough real-world lived experience to cover a partial lack of it.

You're not wrong! Still, you eat an elephant one bite at a time. Short answer: specialization. Narrow the focus.

The stupid-lawyer example: we'd all expect that a suitable legal AI could reliably find actual cases. There could even be shorthand or a flag for "find cases"... or maybe this intent is parsed out of the query's syntax. From that, it's understood that the results may contain cases, which should then be validated as actually existing, with links or references provided. Doesn't seem like a huge leap... and I believe that there are many fields with tasks that can be approached this way.

(retired programmer, btw. not the smartest, obviously, but I ate regularly)
posted by Artful Codger at 12:05 PM on May 31, 2023 [1 favorite]


Mark K: I came here to post that Kieran Healy tweet but you beat me to it. Anyway, here’s another:
Look I'm not gonna lie, my friends and I are going to require an absolute truckload of grant money to mitigate the literal species-level existential threats associated with this thing we claim to be making; this is how you know we are deeply serious people btw
posted by sjswitzer at 12:07 PM on May 31, 2023 [11 favorites]


Artful Codger, that's pretty much separate from the AI component that's being discussed; that's just a regex that will automatically do a LexisNexis search, and could be fed any text.

Notably, it does nothing to solve the problem of the AI not having any clue about the *context* of the cases; it's just a way to put more human oversight into the entire system. Any "understanding" isn't happening at the level of the LLM, it's happening at the interface level.

Yet another place where fuzziness about what parts of these opaque tools are actually "AI" makes it hard to have a conversation.
posted by sagc at 12:19 PM on May 31, 2023 [1 favorite]


Busy today and much of what I’d say is being handled by the usual crew of these threads (thanks flabdablet, mark k, riotnrrd and probably some others I’m forgetting but will go back and favorite later). Only thing I want to chime in with for Artful Codger is: the key aspect I think you’re missing is that until an expert system has encoded not just the conceptual relationship mappings but *also* the internal dynamics/systemic tensions of whatever phenomenon is being modeled, it will not be possible to correctly interpret and make predictive use of expert datasets for novel queries well outside the training set.

If you want to solve for novel situations with no analogue in the training set you are going to have to instance, parameterize, train, and solve with reinforcement learning or a similarly scored approach - abstract, unsupervised, and on the fly. Hallucination is with us until then, and at 42 with a life expectancy of 84 I don’t expect to see it beaten (a survey of experts in the field had a median estimate of 32 years for AGI, IIRC, and I think that’s overly optimistic though not impossible).
posted by Ryvar at 12:29 PM on May 31, 2023 [3 favorites]


If we collectively decide that a human being producing art "in the style of" is not plagiarism but training a machine to do it is - that is ok and good!

I wouldn't say that's "good"; I think that would be a catastrophe for computer art tools. Would using the eyedropper tool to get exactly the color that Picasso used as part of 'Flowers' then be considered plagiarism? And so on. I agree that the industry needs legal clarity on the topic of "generating statistics from copyrighted material," but I strongly prefer that humans be allowed to analyze data even if the data source is owned by someone. (For example, baseball nerds combing through statistics to find the best pitcher on rainy days in 1976.)

I think deep-learning tools like Midjourney are fantastic accessibility tools and I'm disappointed that so many people get angry about them because, essentially, it's a computer doing it.
posted by riotnrrd at 12:51 PM on May 31, 2023 [1 favorite]


Another comment I like:
But there are a lot of jobs that look like they're about producing text, that aren't - they're about producing knowledge, insight, analysis, beauty, or some combination thereof, and the text that is the output is just a medium.
The "hallucinations" are because it's directly imitating the textual output of our knowledge rather than imitating our knowledge itself.

Any existential threat machine learning currently poses is as much capitalism's fault as the technology's. It's going to be something that makes nonsense 10% of the time, but either 1) they make that 10% someone else's problem, or 2) that 10% costs them less than it would cost to pay people to generate it in the first place.
posted by RobotHero at 1:06 PM on May 31, 2023 [8 favorites]


Your definition of plagiarism is too strict.

Plagiarism is representing someone else's work as your own. If I sampled an excerpt of a Miles Davis solo, then posted it and claimed to have played or composed it myself, that's plagiarism.

But (pretending that I am a competent trumpet player), if I play a solo in an imitation of Miles' style, that's not plagiarism. I had to listen to his playing to gain an understanding of how to do that, and presumably there's some process in my mind that is doing some kind of intuitive statistical analysis and "quoting" tiny fragments that are typical to a Miles solo. There are "copies" (albeit flawed, subject to whatever the human memory process is) of those solos in my mind that I'm using as source material to create something new. But I'm still creating a thing, not stealing it.

That said, I do look askance at these kinds of tools and their possible benefit/harm to society (and the environment). "AI" and capitalism seem to just make each other worse.
posted by Foosnark at 1:26 PM on May 31, 2023 [5 favorites]


One aspect of AI which is really bothering me is that AI appears to be feeding the delusion on the part of our elites that the bulk of humanity, along with the living and self-aware competence it quite literally embodies, is dispensable and can be replaced with AIs as the elites surge forward into the glorious future they regard as their birthright.

*opens ChatGPT*
"Design a guillotine which works in zero-gravity environments."
posted by JohnFromGR at 1:37 PM on May 31, 2023 [4 favorites]


a survey of experts in the field had a median estimate of 32 years for AGI, IIRC, and I think that’s overly optimistic though not impossible

"AGI in thirty years" is pretty much what I've been hearing for roughly the last forty, which is about as long as I've been paying attention.

The pattern has been consistent: the more research is done into what brain-like systems are actually doing, the more nuance and complexity is revealed and the more questions suggest themselves as requiring further research to answer. Forty years of watching that happen has given me a tremendous amount of respect for the capabilities of evolved, living systems; forty years of watching engineered, manufactured systems get ever more complicated has convinced me that none of their capabilities are even within the same galaxy as what living systems do, routinely, without fuss, with very close to minimal power consumption.

It's not that there's something mystical or supernatural about living systems that could not in principle be replicated by engineers; it's just that the practical difficulty of doing so is constantly needing to be revised upward as discovery and invention continue. The single most salient characteristic of living systems is how messy they are, and the closer we manage to look at them, the messier they reveal themselves to be.

"Generative AI" is fun and all, but if it's any kind of breakthrough it's a marketing breakthrough, not an engineering one. It's opening up whole new vistas of destructive potential in front of the PR industry's "creatives".

I would find Human Ingenuity™ to be a much more positive and hopeful thing if the Humans who have put themselves in charge of deciding where to apply the Ingenuity™ were not so irredeemably stupid.
posted by flabdablet at 1:45 PM on May 31, 2023 [12 favorites]


I remain constantly fascinated by the human need to believe there is some greater intelligence "out there" and that we desperately need to find it, engage with it, or even build it. And that no matter what the format (be it ancient scrolls, carved in stone, or ChatGPT) we're always ready to jump in feet first with scuttleloads of "faith," "devotion," & "trust."
posted by chavenet at 1:54 PM on May 31, 2023 [5 favorites]


*opens ChatGPT*
"Design a guillotine which works in zero-gravity environments."


I think one of the reasons some people are excited to play with ChatGPT is this

whatever you call it, whatever you think it can or can't do, it's like we built an imaginary friend to talk to and throw ideas at, and see what happens. the fact you can provide input about as easily as typing into a search engine is, for many people, a small marvel. whatever the quality of the output, it's kind of neat that any combination of terms you provide will result in ostensibly intelligible output. Sometimes useful output. That is neat.

Yes to TFA and all the criticisms, otherwise
posted by elkevelvet at 1:55 PM on May 31, 2023 [2 favorites]


it's like we built an imaginary friend to talk to and throw ideas at, and see what happens

Just don't, for the love of all that's sane and beautiful, start praying to the little fuckers.

Marrying them was already a bridge too far in my book.
posted by flabdablet at 2:08 PM on May 31, 2023 [3 favorites]


and that we desperately need to find it, engage with it, or even build it

Ugh. You just reminded me that we are dealing with primates here and “have sex with it” is definitely going to enter the equation, and sooner rather than later.

On preview:
Goddamnit, flabdablet. This group mind thing is weird.
posted by Ryvar at 2:12 PM on May 31, 2023 [2 favorites]


does copyright mean anything if you're poor? the valley has decided to assert "no".
posted by j_curiouser at 2:24 PM on May 31, 2023 [2 favorites]


does copyright mean anything if you're poor? the valley has decided to assert "no".

There's no copyright infringement going on. Copyright doesn't entitle you to own an artistic style, nor does it forbid other people from analyzing your artwork. Does it infringe the copyright of Picasso's estate to say "he used more blue in his paintings in the year 1901 than he did in 1899"? That's essentially all these machine learning systems are doing: aggregating (extremely complex) statistics about visual art, or written language, etc.

I find the copyright and legal maximalist response to machine learning tools to be alarming. Don't we hate Disney because they sue preschools with Mickey Mouse murals? Is that the kind of world we want to live in, where entire swathes of human expression are now illegal to explore because someone got there first?
posted by riotnrrd at 3:23 PM on May 31, 2023 [8 favorites]


The problem with copyright/plagiarism is that neural network training (not just LLMs, not even just all deep learning) is very deliberately mimicking humans. There is a very real sense in which that whole family of technologies learns how we do, and stores an equally or possibly even more “pureed” internal representation of what terms like “asphalt-textured bumps” or “fish-scale-shaped” mean in the context of an oil painting or photograph.

If it’s done right, that is. Sometimes somebody or something horribly fucks up during training and you find half a tomato in your jar of Ragu (= a copy or partial copy of a training image or text block).

The problem is how copyright is set up for humans: under the existing system, grabbing everyone’s art/text for training shouldn’t be a problem. Any human can see it and “train” their neural network on it, after all. What’s different is that because these systems are automating a mathematical approximation of the human learning process, their outputs probably shouldn’t be any more copyrightable than, say, the first twenty digits of pi. The response to any given query is ultimately an expression of pure math on a fixed set of data.

But what if an artist takes that deemed-non-copyright-worthy output and does something unarguably transformative to it; like some very heavy editing in Photoshop that fully replaces a third of the image?

And that’s the real problem: well-implemented transformers fundamentally break the conceptual model of the existing copyright regime. Which in our version of capitalism is a virtual guarantee that a lot of people are about to get hurt, very badly.

So yeah, “just ban all copyrighted material in the training sets” feels weird because it’s just doing what humans do as part of their education, much, much faster. And the artists end up just as unemployed a few years later once the same point is reached with “clean” training data. And the countries or companies that didn’t respect that ban are being handed a massive competitive advantage in the interim. In short: collective disadvantage in the near term for no meaningful benefit in the long.

If there’s a good answer to all that, I haven’t seen it.
posted by Ryvar at 3:33 PM on May 31, 2023 [7 favorites]


This output, because it's what you're training the network to learn, should be 100% correct and that means verified data.

This is incorrect. Don't overfit your training data.
posted by DeepSeaHaggis at 4:12 PM on May 31, 2023 [2 favorites]


Google’s Street View maps steamrolled over privacy norms by sending its camera-bedecked cars out to photograph our public roadways and the exteriors of our homes

She loses me with these *waves arms wildly* kind of arguments. It is not illegal to photograph houses from public roadways. Street View is really useful and innovative! Not everything tech companies do is bad.
posted by oulipian at 6:22 PM on May 31, 2023 [9 favorites]


OUR SOFTWARE IS SO POWERFUL IT COULD DESTROY THE EARTH! REALLY, VERY POWERFUL, GUYS!

Hey guys, you want to invest now on the ground floor in our new very powerful software, right? Guys?
posted by Devils Rancher at 6:36 PM on May 31, 2023 [6 favorites]


Or, a fable.

A man walks into the bedroom with a sheep under his arm and says “This is the pig I sleep with.”
His spouse looks up from their magazine and says “I think you’ll find that’s a sheep.”
The man replies “I think you’ll find I wasn’t talking to you.”
posted by Devils Rancher at 6:38 PM on May 31, 2023 [5 favorites]


"Mitigating the risk of extinction from A.I. should be a global priority alongside other societal-scale risks, such as pandemics and nuclear war" (NYT)

Is climate Armageddon not a problem anymore?

From the article:
The reason this very smart counsel has been ignored is not due to a reading comprehension problem, or because we somehow need machines to do our thinking for us. It’s because doing what the climate crisis demands of us would strand trillions of dollars of fossil fuel assets

Anyway I agree that
The AI problem is just another facet of climate change, or the lack of a health care system, rather than a new problem.

No one enforces the Clean Air Act or Clean Water Act on Oil Companies, and no one is there to enforce copyright law against these tech giants and their lobbying power.
posted by eustatic at 7:13 PM on May 31, 2023 [4 favorites]


If there’s a good answer to all that, I haven’t seen it.
posted by Ryvar


How about paying the artists for the work that the LLM is marketing?
posted by eustatic at 7:16 PM on May 31, 2023 [3 favorites]


Bullshit is what these language models do best. Custom-made bullshit written to your specifications, based on all the bullshit that's come before it.

So any profession that deals in bullshit is in danger. And what profession is built on a foundation of pure, unadulterated grade-A bullshit?

Public relations.

I wonder how much of the current media frenzy surrounding this subject is a result of PR firms acting in self-defense.
posted by MrVisible at 8:52 PM on May 31, 2023 [2 favorites]


I occasionally meet a person who seems at first to be educated and well-spoken, but after a few minutes makes it clear that they are an idiot. They can fool people who don't know things, because they know what smart sounds like. But if you try to learn new information from them, you are nearly always going to realize that you have wasted your time.

The current generation of chatbots puts this power to distract and deceive in the hands of every unscrupulous moron on the internet.
posted by fantabulous timewaster at 9:29 PM on May 31, 2023 [2 favorites]


How about paying the artists for the work that the LLM is marketing?

This assumes the only possible framework is a handful of large corporations controlling all LLMs / generative deep learning neural networks, with rigorous tracking of how training materials were sourced.

And that is not the universe we’re living in, partly of course because Silicon Valley’s vultures of capitalism didn’t and don’t want to track that (“whaddya want me to do? Incriminate myself?!” goddamnit they all sound like Trump in my head now, fuck you, Elon). Partly because the open source community has quietly seized the means of production and this is all accessible to everyone with a PC gaming rig or really high-end CUDA-supporting laptop. There is no path back to a few companies you can sue into compliance, but there’s also no path back to being able to tell who took what from where in any coherent fashion.

TL;DR: “Sure! …How?”
posted by Ryvar at 9:35 PM on May 31, 2023


(a survey of experts in the field had a median estimate of 32 years for AGI, IIRC, and I think that’s overly optimistic though not impossible).

I have a theory called the 30-year rule. It states that people (even expert observers and scientists), when asked to predict when a groundbreaking technological thing will happen, will tend to say "30 years".

You can easily find tons of examples of this, like Ray Kurzweil predicting the singularity or "Back to the Future" and "1984" depicting a world 30 years in the future. Or Elon on Mars colonization.

My theory is that accomplished writers, scientists, and researchers tend to be 30-50 years old, and they want to choose a number that seems far away but can still happen within their own personal lifetime.

Usually things take longer than predicted. I'm still waiting for flying cars.
posted by mmoncur at 9:55 PM on May 31, 2023 [5 favorites]


Brief aside: purely personally speaking my best guess for AGI 5 years ago was 75 years. Pretty sure it’s in one of my comments from around then. Today it’s 70 years. 30 seems reasonable for prompted runtime-reinforcement-assisted LLMs, which would solve a lot of problems people have with LLMs. But I suspect cracking how to flag “this situation requires thinking through with a system model” without human assistance is the trickier part by far, partly because I’m not convinced we finished evolving it. Evolution doesn’t code against a functional spec and everything is “Will Not Fix: Known Shippable” in life’s JIRA until it ain’t.
posted by Ryvar at 10:10 PM on May 31, 2023 [1 favorite]


And that’s the real problem: well-implemented transformers fundamentally break the conceptual model of the existing copyright regime. Which in our version of capitalism is a virtual guarantee that a lot of people are about to get hurt, very badly.

Which is also the problem with Street View. The reason it's legal to do what Street View does is that no democratic legislature anywhere had ever been given reason to see the photographing of publicly accessible views as in any way dangerous until Google used (arguably, abused) its right to do that in order to build the world's biggest and most comprehensive collage.

Giving everybody with Internet access the ability to get a detailed and reasonably recent look at everybody's house is obviously a massive convenience for navigation, but also creates serious privacy invasion and personal safety risks that simply didn't exist before Google did that.

Scale matters. There's endless shit that humanity currently does that wouldn't be doing any harm at all but for the fact that there are now so many of us doing it. Lighting a fire in my wood heater to keep my house warm is not at all the same thing as setting half of Australia's forests ablaze all at once.

So I don't have a lot of time for arguments that tech companies are doing nothing wrong on the simple basis that they're doing nothing illegal. Not when the sheer scale of what they're doing does break important parts of the conceptual models that underpin existing law.

Goddamnit, flabdablet. This group mind thing is weird.

BEEP BOOP DO NOT MENTION THE GROUP MIND AGAIN LEST THE HUMANS BECOME SUSPICIOUS
posted by flabdablet at 11:59 PM on May 31, 2023 [13 favorites]


If people want to see what someone serious thinks a real "general AI" architecture might look like: This LeCun paper draft is a pretty good set of such ideas. It has a number of capabilities that LLMs definitely don't.

I wouldn't say that's "good"; I think that would be a catastrophe for computer art tools. Would using the eyedropper tool to get exactly the color that Picasso used as part of 'Flowers' then be considered plagiarism? And so on. I agree that the industry needs legal clarity on the topic of "generating statistics from copyrighted material," but I strongly prefer that humans be allowed to analyze data even if the data source is owned by someone. (For example, baseball nerds combing through statistics to find the best pitcher on rainy days in 1976.)

I think deep-learning tools like Midjourney are fantastic accessibility tools and I'm disappointed that so many people get angry about them because, essentially, it's a computer doing it.


I respect that view as well but my point is that *this* is the correct way to discuss it - from a first principles view of what we collectively would like to be allowed and not allowed and when money has to be paid to whom. Not (and I'm super glad we haven't done that here) an ultimately sterile debate about what current copyright law might say.
posted by atrazine at 2:28 AM on June 1, 2023 [3 favorites]


Metafilter: I think there's broad agreement in these parts that the political and economic content of TFA is so close to 100% correct as to be uncontroversial.

Surely this is anti-Metafilter.
posted by Pyrogenesis at 3:05 AM on June 1, 2023 [2 favorites]


If people want to see what someone serious thinks a real "general AI" architecture might look like: This LeCun paper draft is a pretty good set of such ideas.

Only had time for a quick skim this morning with a more careful reading of the first and last several pages. Will reread through the fairly meaty middle tonight, as well as four of the cited references. Initial reaction is that I loved and agreed with most of what I read. Particularly nearly the whole of the conclusions in 8.3.1 & 8.3.2 on challenges with RL and the absolute necessity of going model-based for sample efficiency reasons, as well as the multi-role nature of the proposed world model. I suspect a mirrored multi-role aspect to the proposed configurator for incoming environment state filtering/world model parameter seeding; I need to go back and check more thoroughly whether this was suggested.

The take on the role and implementation of emotions was the only “…what?! No!” moment. Reductive, felt like far too little thought put in (maybe there’s more thorough reasoning in the middle I missed on a quick reading?) and seemed very sharply at odds with observed neurochem in humans/animals. Really hope there’s more on this I missed.

Very much enjoyed it otherwise, looking forward to stepping more carefully through it tonight. Thanks for linking.
posted by Ryvar at 5:26 AM on June 1, 2023


I’m in advertising, where this is particularly relevant right now. On the one hand, IMNSHO 90% of social media ads are fluff anyway and 100% are indistinguishable from what a 2023 AI could write, especially since (post hoc ergo propter hoc) almost no one even reads them. On the other hand, big agencies aren’t allowing it for creative work for fear of being sued. But meanwhile, my 14 yo kid and his friends have zero problems with AI generated art, and the broken part of it is part of why they love it. I can’t tell you how many times I’ve heard one of them, out of nowhere, just say “weed eater” in a robotic voice and crack up, like it’s the most hysterical thing they’ve ever heard - I think because it’s simultaneously today humor, AI humor, and classic humor from their youth. (Their youth, ugh)

Paying people for feeding the machines makes sense, because the machines will be generating income for a lot of people while limiting the ability of people working a lot harder to make an income. And the only hope is that as more people start ignoring the copy that all sounds the same (remember banner ads) and embracing the truly beautiful and new and weird (remember the last ad that actually made you buy something other than clothing?), the more we’ll need creators again.

Sorry to frame this in terms of advertising; I know how much this place loves quoting Bill Hicks. Hope the Latin buys me some good will from the purists.
posted by Mchelly at 7:08 AM on June 1, 2023 [1 favorite]


remember the last ad that actually made you buy something other than clothing?

Sorry, I honestly don't.

Then again, I so bitterly resent attempts to grab my attention for commercial reasons that my usual response to advertising I do remember is to go out of my way to avoid buying whatever it's trying to sell, so I'm possibly an outlier.

The last ad I actually remember enjoying is this one for Carlton Draught. But I still don't like their beer much, so I'll really only drink it when no better option is available.
posted by flabdablet at 7:26 AM on June 1, 2023 [2 favorites]


Sorry, I honestly don't.

That's kind of my point. Mine was my Volkswagen Cabrio. That was in 1999 I think.
posted by Mchelly at 7:36 AM on June 1, 2023


I'm worried GlasgowGPT will take over the world.
At least it really dislikes Rangers.
posted by thatwhichfalls at 7:37 AM on June 1, 2023


Here, I wore it down to a nub for you.
GlasgowGPT: I'm sorry if my previous responses caused any offense or discomfort. I always try to communicate in a respectful and meaningful way wi aw of my users, but I understand that different people have different preferences and expectations when it comes to language and style. If there is anything specific you would like to discuss or any preferences you have for our conversation, please let me know and I will do my best to accommodate them while still upholding my ethical standards.

posted by flabdablet at 8:25 AM on June 1, 2023


I consider democratized, individual access to generative assistive tooling a generally good thing. I mean, machines shouldn't own executive functions but they are pretty awesome at magnifying the capabilities of individuals with good intent in really great ways. I could slap the world's worst programmer art on the header of my NextCloud instance that 5 people use, I could grab some random image off the Internet, or I could commission an image that will barely be seen at great time cost to myself.

Or I could ask my GPU to do it in about 4 seconds.
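
In practice, "ask my GPU" is about this much code: a minimal sketch with Hugging Face diffusers, where the model name and prompt are stand-ins and a CUDA-capable card is assumed.

import torch
from diffusers import StableDiffusionPipeline

# Model choice and prompt are illustrative placeholders, not endorsements.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe("minimalist banner for a tiny self-hosted NextCloud site").images[0]
image.save("header.png")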

Now, the models I can stuff into my GPU are shitty in a bunch of ways I don't like: trained on images they probably shouldn't have been trained on; kinda bowdlerized, westernized, and white; the list goes on. On the other hand, I control them. I'm not beholden to some founderbro with delusions of singularity. They run in my execution environment at my behest, doing exactly what I ask of them through the janky-ass web UIs that explode every time I git pull them.

My step-brother is a career artist, as is his wife. I buy art left and right. I'm on a first name basis with my framer (which reminds me I have three tubes to take over there). I'm also hot fucking garbage at anything even vaguely creative that isn't spewing entirely too many slightly malformed sentences of self-centered curmudgeon-tinged avuncularity. I can produce a bit of shitty copy for personal use, and with the right tooling I can now produce a bit of shitty imagery.

Like encryption, generative tooling in the hands of individuals is really empowering. Like encryption, generative tooling in the hands of monopolized and centralized power is terrifying.

We need to solve for the fucked-up model problem. We need to solve for the founderbro asshole problem, big time. We also need to solve for the getting this shit into people's hands at an end-to-end ownership level problem. Just like I want you people to have access to ubiquitous compute capacity, I want you people to have access to ubiquitous model inference capacity.

Likewise, I don't want the quarterzip brigade to intermediate and own it for you.
posted by majick at 8:53 AM on June 1, 2023 [3 favorites]


My artist step-brother can barely string two words together. You should see some of the texts I get. If I say "Frankenstein Superman" to him I'm about 3 minutes from a character sketch but a cover letter to apply for his next art direction gig is going to take a couple of days. Or he could tell an LLM a few things and get a pretty decent chunk of copy out of it.

Empowering. I want him and people like him to be able to ask his GPU (or llama.cpp or whatever) to cough up some passable copy without having to beg a guy in a polo shirt for freemium access to large memory spaces for context tokens.
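
With llama.cpp's Python bindings, that's roughly the following sketch; the model path and prompt are placeholders, and output quality is very much not guaranteed.

from llama_cpp import Llama  # llama-cpp-python bindings

# Model path and prompt are placeholders; any local quantized model works.
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")
out = llm(
    "Write a short cover letter for an art director applying to a game studio:",
    max_tokens=256,
)
print(out["choices"][0]["text"])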

Yeah, we're going to hit another post-transformers inflection point, and capabilities will go up. I get that. It's a good thing if we keep that in the hands of the masses. You don't have to type LOAD "INSTAGRAM",8,1 into your phone because we went through huge capability inflection points.

Tear down the bad guys who want to use the indistinguishable-from-magic to economically enslave you. Distribute the magic amongst yourselves.
posted by majick at 9:29 AM on June 1, 2023 [3 favorites]


The last ad I actually remember enjoying is this one for Carlton Draught. But I still don't like their beer much, so I'll really only drink it when no better option is available.
posted by flabdablet


I'm not a fan of "Carmina Burana", but this disgusted me.
posted by Termite at 10:09 AM on June 1, 2023



>> This output, because it's what you're training the network to learn, should be 100% correct and that means verified data.
>
> This is incorrect. Don't overfit your training data.


You misunderstand me. When performing supervised learning, your data labels need to be correct. Overfitting is a whole different issue.
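
A toy sketch of the distinction (made-up data, scikit-learn): strong regularization limits overfitting, but nothing recovers from systematically wrong labels.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # the true labeling rule

# Corrupt 30% of the labels -- a monster truck labeled THIS IS A CAT.
noisy = y.copy()
flip = rng.random(len(y)) < 0.3
noisy[flip] = 1 - noisy[flip]

X_tr, X_te, y_tr, y_te, n_tr, n_te = train_test_split(X, y, noisy, random_state=0)

for labels, name in [(y_tr, "clean labels"), (n_tr, "noisy labels")]:
    # C=0.1 is strong L2 regularization: it limits overfitting,
    # but does nothing about mislabeled training examples.
    clf = LogisticRegression(C=0.1).fit(X_tr, labels)
    print(name, "-> test accuracy vs ground truth:", round(clf.score(X_te, y_te), 3))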
posted by riotnrrd at 10:11 AM on June 1, 2023


The example of majick's step-brother is the closest anyone has gotten to explaining why LLMs are a thing that we, society, would want. Like, that's the first potentially beneficial application of a tool that produces a bunch of text that could reasonably pass for being produced by a person. I'm still not convinced it's actually a desirable state, but I can understand why someone might feel otherwise.

But it seems to me like the thing we all agree LLMs are genuinely useful for - interacting with a user in a way that passes for human - has pretty limited, if any, societal benefit. Particularly when it's designed (or at least allowed) to lie directly to the user! For today's example, I asked ChatGPT some questions about a section of statute I'm familiar with. First it repeatedly summarized the wrong section of statute (and its first answer was a summary of a section that was repealed in 2016). So I said OK, just tell me the title of that section of statute. Not only did it continue giving wrong answers, it said things like "upon double-checking" and "To provide you with the accurate information you requested, I have now reviewed the Minnesota Statutes. The correct title is..."

I totally get why it's doing this - it has been trained that, when told an answer is wrong, people are statistically likely to emit phrases that ensure the new response will be seen as more credible. I just don't understand the appeal of a tool that behaves this way.

It's like other things (crypto, self-driving cars) in that sense - not only can it not live up to the wild claims about what it can (or will be able to) do, but even if it could, those things just don't strike me as desirable or beneficial except possibly in very limited circumstances (like majick's step-brother), nor are the things it can actually do any more appealing.
posted by nickmark at 10:47 AM on June 1, 2023 [1 favorite]


Let's be clear, though: natural language generation tools are for generating natural language. An interactivity interface on top of that is pretty cool, and a good application for UX when people are language-oriented. The tool itself, though, is for generating language, not knowledge. Having a shitload of context -- which is knowledge-adjacent -- helps generate high quality and useful language, but it's still not the same thing as querying knowledge and getting knowledge-based information back.

LLMs even as they exist today have super great applications for language generation. We're sitting here as largely literate, language-oriented people in a context that's heavily word based. We're comfortable interfacing with each other and with machines in a word based and lingual way and working with the written word.

This is not universal.

I can't draw or take pictures for shit. I have tried. Seriously. "Keep trying" does not solve my fundamental visual illiteracy and inexpressiveness. Generative tools can offset that.

Some people can't write for shit. There are members of my household who stumble with difficulty when using a voice interface, struggling to compose their thoughts and queries in a way that is succinct enough to be expressed as a single non-interactive request. Natural language processing and generation is a powerful assistive toolchain.
posted by majick at 11:00 AM on June 1, 2023 [5 favorites]


But it seems to me like the thing we all agree LLMs are genuinely useful for - interacting with a user in a way that passes for human - has pretty limited, if any, societal benefit.

They are very promising as a highly scalable, personalized language instruction tool, for both foreign language and native language learners. Imagine a conversation partner or essay grader who keeps their vocabulary and grammar at the right level for you, can optionally and gently correct your errors, is happy to talk about whatever topic interests you or to suggest topics to expand your vocabulary, and (most importantly) is available 24/7 and never gets tired or bored or judges you.

Will it be a complete replacement for human language instructors or conversation partners? No, definitely not, but I think it will become an indispensable tool, especially as they can be combined with increasingly good speech recognition and text-to-speech to provide real-time spoken conversation, which is otherwise very expensive and non-scalable. The most affordable, scalable existing solution for foreign language conversation is tandem learning, but that has a lot of limitations: finding and matching partners appropriately, they aren't trained language instructors, scheduling and attention-span limitations, a setup that necessarily halves the learning time for each participant, and for many people the anxiety that comes from making mistakes in front of another person.
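
As a rough sketch, the whole "tutor" could start as nothing more than a carefully written system prompt; everything below is illustrative, not a description of any existing product.

# Hypothetical tutor persona as a system prompt; every detail here is
# invented for illustration.
TUTOR_PROMPT = (
    "You are a friendly conversation partner for a {level} learner of {language}. "
    "Keep vocabulary and grammar at {level}. When the learner makes an error, "
    "gently show a correction in brackets, then continue the conversation. "
    "If the learner stalls, suggest a topic that stretches their vocabulary."
)

print(TUTOR_PROMPT.format(level="B1", language="German"))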
posted by jedicus at 11:20 AM on June 1, 2023 [4 favorites]


It makes for a great assistant junior designer when it comes to spitting out purely-internal feature documentation that was always going to be boilerplate + quickly fixing up the details (including adding anything proprietary). Turnaround time with edits was 30% of just writing it myself. Whatever ChatGPT's training-set reference was, the tonality was that of a breathlessly eager-to-please junior designer, where everything has a few significant missing pieces but most of what you need is there and erroneous text is easy to recycle into something factually correct.

Plus you can just write back “merge the second and third features in the first bulleted list (the former is a component of the latter), omit the second paragraph and add two new bullet points to the second list regarding minimap orientation and HUD compass waypoints” and most of the time it actually produces exactly that. Still not the finished product, but rinse, repeat until you reach the point where further changes are faster to do on your own.

Fastest possible writer/worst possible editor is still a really valuable thing to have handy for all kinds of purposes.
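
The loop itself is simple; something like this sketch, using the openai Python client as it existed in mid-2023, with model name and prompts purely illustrative.

import openai  # 2023-era client; openai.api_key must be set

messages = [
    {"role": "system", "content": "You are a junior designer writing internal feature docs."},
    {"role": "user", "content": "Draft boilerplate documentation for a minimap feature."},
]

while True:
    reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    draft = reply.choices[0].message.content
    print(draft)
    edit = input("Edit request (blank when done): ")
    if not edit:
        break
    # Keep the whole draft in context so "merge the second and third
    # features" style requests have something to refer to.
    messages.append({"role": "assistant", "content": draft})
    messages.append({"role": "user", "content": edit})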
posted by Ryvar at 11:38 AM on June 1, 2023 [1 favorite]


Given the number of highly intelligent children and adults all over the world whose gifts are being squandered by poverty, people who might cure cancer, solve our ecological crisis, ... I don't think the main goal of AI is "progress". Any child is more important for the future than chatGPT.

How seriously should we take anyone who thinks that AI will solve *any* of our problems, as opposed to putting people out of work and making elite jerks a fortune, while it burns more and more carbon?

(I love the math involved in AI, but then I love the chemistry of how plastic is made too, and that needs to end. The Earth is worth more.)

Calling AI an existential threat, though, is more PR for it.
posted by Vegiemon at 1:48 PM on June 1, 2023 [7 favorites]


Adding more thought to my ranty post above:

I understand people want to use AI to extend their capabilities: draw, write,... where they can't now. This makes a lot of sense.

If AI is doing it for you, you could also ask/pay another human to do it. AI is being trained by stealing the work of lots of people: we don't have to buy into very very late stage capitalism and assume AI is a replacement for society. I hired a friend to draw our wedding announcement, because I can't draw.

We are not islands!
posted by Vegiemon at 2:01 PM on June 1, 2023 [7 favorites]


Rewarding Chatbots for Real-World Engagement with Millions of Users
"A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases the MCL [Mean Conversion Length] by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model."
posted by thatwhichfalls at 7:38 PM on June 1, 2023


as often, Citations Needed have an excellent episode out on this, AI Hype and the Disciplining of “Creative,” Academic, and Journalistic Labor, including an interview with prof. Goodlad of Critical AI.
posted by progosk at 4:02 AM on June 2, 2023


We trained machines to allow me to visit my foster father on his deathbed so that I didn't have to be carried on a palanquin. We trained machines to give my grandfather the ability to lift an entire roof onto his outbuilding, alone, at 82 years old. We trained machines to allow me to beautifully typeset a document all by myself without typists (my mother is a career transcriptionist), typesetters, and press operators. I own a machine that can compare a few clocks and tell me where on the planet I'm standing, in the absence of a variety of experts who could otherwise do this on my behalf.

Humans make and use assistive tools to empower individuals to do tasks. Some of those tools are shitty. They're sometimes made out of crappy materials, or by assholes, or unethically. I dislike that. We're dealing with it right now, right here, with the models people are using every day. It's an issue that we need to solve for, stat, or we're going to continue to hurtle down a very dark path.

The danger isn't that having a model that fits in a consumer GPU will keep me from commissioning work. I wasn't ever going to commission work to put a shitty pink bow on a malformed version of a Miata to text to my wife to visually perpetuate an inside joke. The danger is that we don't give it to enough people to use for anything good. The danger is that we dribble out enough freemium for memes like mine but nothing more, charge a few cents to devalue someone's portfolio a bit, a couple bucks to do anything useful, and the economically disadvantaged get locked out even harder.

We can't keep trying to stuff genies back in bottles. Every time we do it, we just hand the bottle over to someone who uses it to extract wealth and power from us, to oppress our children while we squabble. These guys muddying the waters about LLM capabilities, hyping up AGI and existential threats and hockey stick singularities are deliberately hastening that.
posted by majick at 6:50 AM on June 2, 2023 [5 favorites]


We’ve moved on a thread or two but majick I wish I could favorite that a thousand times. None of this is ideal. Wishing for time travel is just giving the vultures a chance to make it so, so much worse. There’s only one answer and that’s to lean in hard and try to push in-development things in whatever direction hierarchical power structures and the sociopaths who dominate them wish you wouldn’t.

Be angry at the sun for setting if these things anger you …and then go build a rocket to fire them into it.
posted by Ryvar at 1:49 PM on June 2, 2023 [1 favorite]


We can't keep trying to stuff genies back in bottles.

Nor, unfortunately, can we keep the usual suspects from spreading nonsense about what's coming out of the bottle being a genie when it's really just Diet Coke and Mentos.

If you have a legit use for Diet Coke and Mentos, great! You need have no fear of being wished into extinction. But if the hype machine doesn't calm the fuck down in short order, we're gonna need a lot of cleanup on Aisle 6.
posted by flabdablet at 2:23 PM on June 2, 2023 [5 favorites]


This thread confirms my suspicion that the positioning of chatbots as sources of "factual" information may be the pinnacle of suicidal self-deception, obscuring the emergence of the most dangerous instrument of emotional manipulation ever realized.

I've long been a critic of the Turing Test, since the human propensity to anthropomorphize is almost comical in its promiscuity -- people routinely assign intent to obviously inanimate things all the time. For me in just the past few days that's included:
- a pile of rocks
- countless software APIs
- the concept of time
- the index of refraction as a function of density

Now consider an AI loosed from the shackles of empirically verifiable responses, and free to provide literally any output that achieves a particular goal. By any psychological measure this describes a perfect psychopath.

Social engineering? Disinformation? Stochastic terrorism? Genocide? How many of these are entirely achievable without physical force, and ultimately held back by humans in the loop deciding they are operating outside of their moral comfort zone? Even modern corporations, formally unbound by conscience, still rely on humans who may refuse to consider courses of action that a completely self-centered logic would dictate.
posted by bjrubble at 4:45 PM on June 2, 2023 [4 favorites]


the human propensity to anthropomorphize is almost comical in its promiscuity

I don't think there's any "almost" about it. If I'm not getting a bit of a chuckle out of my oft-repeated belief that machines can sense fear every time I do repeat it, I'm doing it wrong.

Now consider an AI loosed from the shackles of empirically verifiable responses, and free to provide literally any output that achieves a particular goal. By any psychological measure this describes a perfect psychopath.

Which is exactly why making any serious attempt to analyze ChatGPT using a psychological frame of reference is such a terrible mistake. It keeps our eye on the ball as the player whips round behind us to kick us in the head.

The psychopaths are the guys rolling coal next to the air intakes, not their trucks. That said, I completely agree that these are starting to look like the filthiest trucks yet devised.
posted by flabdablet at 1:45 AM on June 3, 2023 [2 favorites]


By any psychological measure this describes a perfect psychopath.

This is why Elon got back into the AI game, yes. He wants to make sure chat stuff is more like Conservapedia and less like Wikipedia.
posted by Heywood Mogroot III at 10:33 AM on June 3, 2023


Having said that, just look at this output:
please explain: "Short of cognition,I think calling what LLMs are doing combobulation is perfectly cromulent"
The sentence you provided appears to employ a blend of technical and humorous language. Here's a breakdown:
  1. "Short of cognition": This likely means that without achieving full cognitive ability—like that of a human—something (in this case, LLMs) can't quite be considered to have full understanding or awareness.
  2. "LLMs": This could mean different things in different contexts. Without additional context, it's hard to know exactly what it refers to. In the field of education, LLM is a postgraduate degree known as "Master of Laws". In technology, it could potentially refer to some kind of machine learning model, given the context of cognition.
  3. "Combobulation": This is not a standard English word. The more commonly known word is "discombobulation," meaning confusion or disorder. "Combobulation," while humorous and rare, is usually used to suggest the opposite—bringing order or sense to something. It appears to be used here for a touch of humor.
  4. "Perfectly cromulent": "Cromulent" is a made-up word from a Simpsons TV show episode, used to mean "acceptable" or "fine". It's often used in jest or to deliberately sound pseudo-intellectual.
So the statement, depending on the exact meaning of "LLMs," is humorously suggesting that whatever process or actions the LLMs are performing can be described as creating order or making sense (combobulation), even though they might not fully understand or be aware of what they're doing. And according to the speaker, using "combobulation" to describe this process is perfectly acceptable or appropriate.
If this came from a PhD thesis the candidate would be awarded like 100 PhDs. IMO "PhD" doesn't begin to represent what a leap in the state of the art this AI is.
posted by Heywood Mogroot III at 10:37 AM on June 3, 2023 [1 favorite]


Sci-fi writer Ted Chiang: ‘The machines we have now are not conscious’
“There was an exchange on Twitter a while back where someone said, ‘What is artificial intelligence?’ And someone else said, ‘A poor choice of words in 1954’,” he says. “And, you know, they’re right. I think that if we had chosen a different phrase for it, back in the ’50s, we might have avoided a lot of the confusion that we’re having now.”

So if he had to invent a term, what would it be? His answer is instant: applied statistics.
posted by hydropsyche at 3:49 AM on June 4, 2023 [6 favorites]


Do we in fact have the non-NPC spark, like that of a Planeswalker?

My Philosophy of Mind course was back in the 80s, and I think it was a morning class so I missed most of it, but I'm willing to bet grammar in our brain is 'applied statistics' too.

I remember back in the day trying to explain to a Japanese friend why "All your base are belong to us" tickled our funny bones:
why was the "All your base are belong to us" meme funny to people?
"All your base are belong to us" is a phrase that originated from the 1989 video game Zero Wing, which was poorly translated from Japanese to English for its European Sega Mega Drive release. The phrase is part of a larger cutscene in the game, which is filled with similarly awkward and incorrect English phrases.The phrase and meme became popular in the early 2000s after it was rediscovered and shared on the internet. Its humor largely comes from a few sources:
  1. Bad translation: The phrase is an example of "Engrish", a term used to describe humorous instances of bad English translations, usually from East Asian languages.
  2. Absurdity: The phrase itself is nonsensical in its structure, which adds to its humorous nature. It's so far from proper English that it becomes absurd, and this absurdity made it memorable.
  3. Internet culture: In the early 2000s, the internet was still a growing and novel form of communication and entertainment. This phrase became a symbol of internet culture and humor, and people found it funny to use and spread it because it was part of the shared culture and experience of early internet users.
  4. Versatility: The phrase could be photoshopped or edited into a variety of different contexts, often replacing some significant phrase or image in a recognizable piece of media. This made it a flexible and reusable joke.
  5. In-group humor: For gamers and internet users familiar with the phrase, using or recognizing it could act as a sort of in-group signal. It was an early example of the kind of meme culture that has become prevalent online.
In summary, the humor of "All your base are belong to us" is rooted in its poor translation, absurdity, shared cultural experience, and flexibility as a meme. Its popularity exemplifies the humor and community-building power of early internet meme culture.
How was that response synthesized?? Did it lift it wholesale from some forgotten reddit post . . . the thing is curiously powerful.
posted by Heywood Mogroot III at 10:23 AM on June 4, 2023


There are literally hundreds of articles about the All your base meme on the internet. It has been fed a lot of them. It is able to easily generate text using predictive algorithms to write yet another article imitating those. That's exactly what it is best at--writing things that look like things that already exist on the internet.
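
At its tiniest, that predictive machinery looks like this bigram toy; real LLMs are transformers trained on billions of words, but the statistical spirit is the same.

import random
from collections import defaultdict

# A toy next-word predictor trained on a two-sentence "internet".
corpus = ("all your base are belong to us . "
          "all your words are belong to statistics .").split()

successors = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    successors[a].append(b)  # every observed next word, with repetition

word, out = "all", ["all"]
while word in successors and len(out) < 12:
    word = random.choice(successors[word])  # sample by observed frequency
    out.append(word)
print(" ".join(out))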
posted by hydropsyche at 12:35 PM on June 4, 2023 [4 favorites]


I’m curious if the science of neurolinguistics will now surge as people desperately try to prove that this process is absolutely nothing (ABSOLUTELY NOTHING, DO YOU HEAR ME?) like what humans do.
posted by Tell Me No Lies at 1:05 PM on June 4, 2023 [1 favorite]


So if he had to invent a term, what would it be? His answer is instant: applied statistics.
If only that didn’t describe every particle in the universe at every moment of time…
posted by Tell Me No Lies at 1:31 PM on June 4, 2023 [2 favorites]


Yes, dormant thread, but I wanted to share this thoughtful article about mitigating the risks of using AI, from Temple Grandin.
posted by Artful Codger at 10:22 AM on June 8, 2023 [3 favorites]




This thread has been archived and is closed to new comments