Thoughtful paper about ChatGPT
December 21, 2023 9:03 AM

This paper describes an old and a new way to think about ChatGPT. Borges and AI is the title; the authors take a high-level view of the entire potential corpus of ChatGPT, guided by a few of Borges' stories that explore universes of infinite possibilities.

This is the best arXiv paper I've read this year. They don't cite much, but they do cite Zellig Harris, Chomsky's advisor. Here's a direct link to the full-text PDF.

ChatGPT isn't just a tool, but a new kind of realization of the infinite possibilities of language. Borges' and others' previous explorations of infinite narrative, before there was such an instantiation, are a useful and entertaining framing for this new kind of thing.
posted by lwxxyyzz (62 comments total) 41 users marked this as a favorite
 
Ctrl-F "Pierre Menard" -- zero results. Well, alright then...

... How does one write an essay about Borges and AI and not mention Pierre Menard?
posted by I-Write-Essays at 9:06 AM on December 21, 2023 [13 favorites]


I will click through presently, but the description made me think of a Douglas Hofstadter book pitch à la GEB. That said, big Borges fan here, so here goes. Thanks.
posted by the sobsister at 9:12 AM on December 21, 2023 [1 favorite]


This is like catnip to me, although I take issue with a few points. "The ability to recognize the demands of a narrative is a flavour of knowledge distinct from the truth. Although the machine must know what makes sense in the world of the developing story, what is true in the world of the story need not be true in our world." But this is not really true. Our sense of what can be true in a story is indeed always anchored in what can be true in the real world. Most of our thinking is about telling stories to ourselves and seeing how we feel about them; our thinking is embodied, is about bodies. Forking paths are clipped or consolidated based on physical plausibility; possibilities are pre-foreclosed, and it is not the case that, "when creating a story, all branches are considered at the same time."

To take one of the examples--asking where Jack can find illegal stuff--if the LLM answers you with "rabbit rabbit rabbit rabbit" then you know something is broken. The "demands of a narrative" are not being met, and the developer has to go back and do some adjusting based on the developer's understanding of narrative which is in turn based on the developer's understanding of having a body moving through a world of cause and effect. This seems about as far away from the Library of Babel as you could get, because trillions and trillions of its volumes have been tossed into the fire before you even get started.
posted by mittens at 9:38 AM on December 21, 2023 [5 favorites]


Here's the ar5iv link, which renders arXiv papers as HTML.
posted by vacapinta at 9:38 AM on December 21, 2023 [5 favorites]


Understanding weather patterns through the moods of the gods only goes so far.
That's a strong start!
posted by flabdablet at 9:39 AM on December 21, 2023 [2 favorites]


excellent pun in the title of this paper
posted by chavenet at 9:40 AM on December 21, 2023 [4 favorites]


Kind of a weak finish though.
posted by flabdablet at 9:50 AM on December 21, 2023


posted by flabdablet

Oh, hey! I was just on your user page...

Because halfway in, already I find myself thinking "man, everyone really ought to read flabdablet's series of comments about LLMs vs the ultimate Markov before reading this paper."
posted by Ryvar at 9:56 AM on December 21, 2023 [13 favorites]


The authors are both very well respected ML researchers. Not every paper like this comes from people with such deep domain expertise.
posted by constraint at 10:01 AM on December 21, 2023 [2 favorites]


Backatcha, Ryvar. Everyone really ought to read you on this as well.
posted by flabdablet at 10:10 AM on December 21, 2023 [3 favorites]


The first thing I did with ChatGPT was to convince it to simulate text adventures, so this take on LLMs seems very natural to me. The subjective feeling that interacting with these gave me was of entering a tiny snowglobe representation of a story or setting that you can poke around from the inside. Very weird, and interesting, and far more interesting than believed by people Online who insist that LLMs are just plagiarism machines for losers.
posted by BungaDunga at 10:17 AM on December 21, 2023 [3 favorites]


Great perspective:

Some fear the fiction machine as an omniscient artificial intelligence that may outlive us; however, the darker temptation is to surrender our thoughts to this modern Pythia, impervious to truth and intention, yet manipulable by others. If we persistently mistake the fiction machine for an artificial intelligence that can spare us the burden of thinking, the endless chatter of the language models will make us as insane as the struggling Librarians.
posted by airing nerdy laundry at 10:19 AM on December 21, 2023 [3 favorites]


> I find myself thinking "man, everyone really ought to read flabdablet's series of comments about LLMs vs the ultimate Markov before reading this paper."

I did go look at those, and in the first link encountered this bit:
The point is that these networks are not just "after word X comes word Y with a certain probability". That was 1970s AI Markov chains. Instead, they have what appears to be a model of the reality described implicitly by the text they have consumed in them.

The point is that these networks are exactly "after words X1 ... Xm comes word Y1 / Y2 / ... / Yn with certain probabilities". What distinguishes them from 1970s Markov chains is the specific technique employed to look up the probabilities.
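
To make that contrast concrete, here is a toy sketch (purely illustrative; the corpus, words, and probabilities are made up). A 1970s-style Markov chain literally looks the next-word distribution up in a table keyed on the preceding word(s), while an LLM computes a distribution as a learned function of the whole context - but in both cases the output is "word Y with probability p":

    import random
    from collections import defaultdict, Counter

    corpus = "the cat sat on the mat and the cat slept".split()

    # 1970s-style bigram Markov chain: P(next | previous) is a literal lookup table.
    bigram = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        bigram[prev][nxt] += 1

    def markov_next(prev):
        words, weights = zip(*bigram[prev].items())
        return random.choices(words, weights=weights)[0]

    # An LLM also emits "next word Y with probability p", but the probabilities are
    # computed by a learned function of the entire context rather than read out of a
    # table of observed n-grams. The fixed dict below is just a stand-in for that.
    def llm_next(context):
        distribution = {"mat": 0.6, "cat": 0.3, "dog": 0.1}  # stand-in for softmax(model(context))
        words, weights = zip(*distribution.items())
        return random.choices(words, weights=weights)[0]

    print(markov_next("the"))
    print(llm_next(["the", "cat", "sat", "on", "the"]))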
I think there are a lot of people confused about this, that LLMs "have what appears to be a model of the reality described implicitly by the text they have consumed in them." Or that maybe people hope that some day some such model will emerge, if the training corpus can be made big enough. I'm with @Ryvar, the people seeing that are basically recognizing faces in clouds. I would think that a little introspection on what a shitty medium text is for describing reality, and how bad it is at encoding mind-state, would reveal that just piling up more of it won't address those defects.
posted by Aardvark Cheeselog at 11:05 AM on December 21, 2023 [12 favorites]


the fiction machine

Yes
posted by j_curiouser at 11:12 AM on December 21, 2023 [1 favorite]


I think there are a lot of people confused about this, that LLMs "have what appears to be a model of the reality described implicitly by the text they have consumed in them." Or that maybe people hope that some day some such model will emerge, if the training corpus can be made big enough. I'm with @Ryvar, the people seeing that are basically recognizing faces in clouds. I would think that a little introspection on what a shitty medium text is for describing reality, and how bad it is at encoding mind-state, would reveal that just piling up more of it won't address those defects.

The last time I got into it about "AI" garbage on here somebody asked me why it was so important to me to explain to people that what is being called "AI" has nothing whatsoever in common with artificial intelligence and over and over and over again that importance is validated.
posted by Pope Guilty at 11:15 AM on December 21, 2023 [9 favorites]


A model of the reality described implicitly by the text…

“Implicitly” seems to describe an ability to read between the lines. To draw in knowledge beyond the text or to be able to analyze the text semantically and then synthesize new text, as in the reader’s mind, again beyond the given text as this new text is an extension to that text, an addition. Again, people are projecting human qualities onto computer code. And most of these people seem to be those making and using these LLMs as is, without actually questioning what is really going on. As Pope Guilty says, every time LLMs are discussed, somebody has to once more attempt to quiet down the science fiction exuberance for them, by giving the model of that reality and explicitly describing what is really going on with the text. My question is… Why do these people need to believe in machine sentience, what need is being fulfilled by this belief, is this some form of religious belief in some new messiah that will now explain the exceedingly complex reality we live within and which seems to be beyond our own intellectual grasp?
posted by njohnson23 at 11:46 AM on December 21, 2023 [2 favorites]


think there are a lot of people confused about this, that LLMs "have what appears to be a model of the reality described implicitly by the text they have consumed in them."

For what it's worth I think the quote you're disagreeing with, that LLMs build a model of reality, is closer to the consensus of computer scientists studying LLMs than not.

To say it in technical terms: the computer scientists I've talked to believe that the objective function during training, of accurately predicting the next series of tokens with less and less storage per prediction, seems to cause LLMs as universal function approximators to develop models (in the sense of simulations) (however imperfect) of the underlying processes that created that series of tokens. It's not yet clear how good they can be at simulating the processes creating their training data, but there also aren't yet known ceilings on how good they can be at it, beyond the limitations of what knowledge is contained in the training data and how much storage capacity the model has.
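
For anyone who wants that objective function spelled out: a minimal sketch of the next-token training loss, assuming PyTorch and a trivial placeholder model (real LLMs use transformers; nothing here is anyone's actual architecture):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, dim = 100, 32
    # Placeholder "language model": embed each token, map back to vocabulary logits.
    model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))

    tokens = torch.randint(0, vocab_size, (1, 16))   # a fake token sequence
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens up to t

    logits = model(inputs)                           # shape (1, 15, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()                                  # training just pushes this loss down

The "less and less storage per prediction" part is the compression pressure: the model has far fewer parameters than it would need to memorize the training text outright, so it has to find something more compact than rote recall.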

Or going back to less technical terms: my understanding of the mainstream CS view as of now is that it's reasonable to treat LLMs as automatically generating executable simulations of the processes that created their training data, to the extent those processes are revealed by the training data and simulatable in the model's time and space budget. We might learn at some point that this overstates what they're capable of, but people are looking for those limits and haven't found them yet.

(I'm not speaking for anyone in particular here, but I'm thinking of the work of people like neuroscientist Sarah Schwettmann, whose research explores the circuits created by LLMs for different purposes during training.)
posted by john hadron collider at 11:59 AM on December 21, 2023 [10 favorites]


Being able to google stuff has made me a lot smarter I guess, though wikipedia is doing the heavy lifting here now. Being able to ask a machine any question and get a correct answer would be pretty spiffy.

We're not quite there yet tho:
You
what was the last country to declare war on Germany in ww2?

ChatGPT
The last country to declare war on Germany during World War II was Brazil.
Brazil declared war on Germany on August 22, 1942
posted by torokunai at 12:00 PM on December 21, 2023


My question is… Why do these people need to believe in machine sentience, what need is being fulfilled by this belief,

Maybe they perceive machine sentience as a simpler explanation - in that it is easier to hold in one's head, and maybe a little easier to swallow - than "a probabilistic machine has falsely given you the impression of its sentience".

To say it in technical terms: the computer scientists I've talked to believe that the objective function during training, of accurately predicting the next series of tokens with less and less storage per prediction, seems to cause LLMs as universal function approximators to develop models (in the sense of simulations) (however imperfect) of the underlying processes that created that series of tokens.

It's like Box said - all models are wrong, but some models are useful. It is potentially useful to see an LLM's internal representation as a kind of world-model (a world-model which is itself wrong but useful... for predicting the next token).

To be maybe a little pedantic (a risk when I'm not in the relevant specialty, but Muphry's law be damned!), I don't think the mere fact of being a universal function approximator would drive LLMs towards simulating the processes that created their training data. If I'm remembering correctly, random forests are also universal function approximators, but no one (?) would think of them as performing any kind of simulation- they just memorize. The paper on grokking effectively argues that models move away from memorization when
  1. they have deep paths available that have the potential to encode relationships more concisely than wide, shallow memorizing paths
  2. their objective function includes a regularization penalty that can drive the model towards explaining more with fewer parameters, so the model actually takes advantage of that potential
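
A rough sketch of what point 2 looks like in code, assuming PyTorch (an L2 penalty added to the loss; illustrative only, not the grokking paper's exact setup):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def regularized_loss(model, logits, targets, lambda_reg=1e-2):
        data_loss = F.cross_entropy(logits, targets)                # fit the data
        penalty = sum((p ** 2).sum() for p in model.parameters())   # prefer small weights
        return data_loss + lambda_reg * penalty

    # Tiny usage example with a throwaway classifier and fake data.
    model = nn.Linear(10, 3)
    x, y = torch.randn(8, 10), torch.randint(0, 3, (8,))
    loss = regularized_loss(model, model(x), y)
    loss.backward()

In practice the same pressure usually comes from the optimizer's weight_decay setting (e.g. AdamW) rather than a hand-rolled penalty, but the trade-off is the one described above: fitting the data versus explaining it with fewer, smaller weights.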
posted by a faded photo of their beloved at 12:27 PM on December 21, 2023 [6 favorites]


automatically generating executable simulations of the processes that created their training data

So if an LLM scanned the text of Moby Dick, it would be able to simulate the thought processes of Melville as he wrote the book? Including all the decisions he made about plot development, characterizations, diction, etc.? All this from just the text? And what exactly is meant by “simulatable in the model’s time and space budget?” What is it simulating? Melville’s thoughts, his mind? Do those people looking for these limits know what they are looking for? As described, it would appear that LLMs are necromancers plumbing the depths of long gone people’s minds.
posted by njohnson23 at 12:33 PM on December 21, 2023 [8 favorites]


Or going back to less technical terms: my understanding of the mainstream CS view as of now is that it's reasonable to treat LLMs as automatically generating executable simulations of the processes that created their training data, to the extent those processes are revealed by the training data and simulatable in the model's time and space budget. We might learn at some point that this overstates what they're capable of, but people are looking for those limits and haven't found them yet.
I think people who seriously believe that LLMs are building internal representations of reality that are usable in situations where people might get hurt by their inaccuracies and incompleteness... I'm having a hard time completing this sentence. I guess I think, they grossly overestimate how much work is being done by the bit I bolded in my quote. If that's what mainstream "serious CS-department AI" people think, it's pretty sad. You would think they'd have learned more from the examples of their predecessors, who thought that chess-playing or theorem-proving would be the tasks for intelligent machines.
posted by Aardvark Cheeselog at 12:34 PM on December 21, 2023 [7 favorites]


Anyway, now that I have actually read TFA all the way through.

Fiction machines. Exactly. It's a more neutrally descriptive term than "automated bullshit generators."

People talk about LLM behavior as though the model is running along reasonably for a while and then "starts hallucinating." But it is not so. There is not something qualitatively different that the model is doing in the two cases. It's just that the fiction tends to be more plausible when the training corpus is denser. MeFi's own @cstross is too uninterested in driving traffic to his blog to point from here to there, so you might have missed his recent helpful illustration of what happens with an LLM that has just enough data in its training corpus to not "hallucinate" from the get-go.
As fiction machines, however, their stories can enrich our lives, help us revisit the past, understand the present, or even catch a glimpse of the future. We may have to design more mundane verification machines to check these stories against the cold reality of the train timetables and other unavoidable contingencies of our world. Whether there is a middle ground between these two kinds of machines, or whether alignment techniques can transmute one into the other, remains to be seen.
Exactly. I would not bet a lot of money that the alignment techniques will pan out.
posted by Aardvark Cheeselog at 12:49 PM on December 21, 2023 [4 favorites]


My question is… Why do these people need to believe in machine sentience, what need is being fulfilled by this belief, is this some form of religious belief in some new messiah that will now explain the exceedingly complex reality we live within and which seems to be beyond our own intellectual grasp?

Without going too far off-topic, I kind of think science fiction itself already answered this question. Deus Ex called it, back in 2000: “God was a dream of good government. You will soon have your god, and you will make it with your own hands.”

We are manifestly incapable of governing ourselves without horrific corruption, abuse and eventually genocide or near-as emerging, in any socioeconomic or political system. Our narcissistic sociopaths seek the levers of power, and when any of them succeeds, the first use is to begin shaping the environment of governance into one that allows them to maintain their status. This inevitably biases all systems of governance to, over the long term, streamline the rise of future sociopaths into positions of increasingly centralized power. Douglas Adams wrote it better and infinitely more amusingly, but it amounts to the same thing: people are a problem.

I think most intelligent people recognize at some level that human history shares a lot with Sisyphus. We’ve been desperately searching for an exit for a long time, caught between hope and despair: the Internet wasn’t it, but maybe the next big thing will be.

it's reasonable to treat LLMs as automatically generating executable simulations of the processes that created their training data, to the extent those processes are revealed by the training data and simulatable in the model's time and space budget

This is a perfectly understandable misread, IMO. The biggest factor missing is that humans generate linguistic structures reactively within a dynamic environment. LLMs train on a crystallized snapshot of the consensus language products arising from that process. Some representations of logical or topical structures are embedded directly in the words and even phonemes we use to communicate, some structures are embedded in how the words are used. Some implicit understandings are so obvious in paired sentence contrast - think of it as the linguistic equivalent to edge detecting an image - that they will inevitably be represented in the trained network’s topology.

All of this gives rise to an incredibly intricate model of How Humans Use Language To Filter their environment and negotiate survival within it and with each other (survival here includes things like writing novels for food/personal accomplishment, or posting to Metafilter to maintain our sanity). LLMs don’t have that runtime component, nor do they adapt based on their interactions with their environment, except as transient context windows that grow with each subsequent query in a session - it doesn’t fundamentally alter the model.

The reason that multi-modality is now the big new thing in LLMs is that cross-training those statistical representations of language with corresponding visual and auditory ones results in vastly superior static (crystallized) models: inferring not just from the structure of our language but our art and other representations of our perceptual environment.

The next step, the bordering on impossible one, was - I thought - going to be using all of that “How Humans Perceive And Filter Everything” to seed runtime models for general case problem solving and simulation. The little map we build in our heads showing all the best routes through New York City at 5PM on a Friday, with little red lines where the traffic jams are guaranteed to appear, and what that means for us in terms of how we as primates piloting multi-ton metal boxes should move our hands and feet to reach the hospital in time.

OpenAI appears to be trying to solve this by spooling up a billion or however many semi-randomized reinforcement networks and progressively culling the worst performers. Something vaguely analogous to just beamsearching each problem with runtime instanced models, insane as that may sound (Step by Step Verification and the other Q* papers). And I would’ve said “well that’s just stupid in addition to being incredibly wasteful” right up until nVidia demonstrated LLMs authoring reinforcement scoring code significantly better than human experts (Eureka demo). Now: I am far, far less certain it won’t work. Won’t be AGI, but it might give rise to something close enough to human-like chain of reasoning that we won’t care to distinguish the two.
posted by Ryvar at 1:06 PM on December 21, 2023 [5 favorites]


I think people who seriously believe that LLMs are building internal representations of reality that are usable in situations where people might get hurt by their inaccuracies and incompleteness

It seems like a fairer statement would be that the idea that they don’t really model anything, to the extent that it has currency, is probably incorrect, based on examination of toy examples? Which is rather short of supporting that a model trained on a sufficiently large set of random text from the internet can model everything!

I wouldn’t bet any money on the position that even the current LLM paradigm is fundamentally unusable for important things - beating human reliability for specific use cases is not such an impossible bar - but again that’s very different from asserting that it’s usable for everything.
posted by atoxyl at 1:32 PM on December 21, 2023 [1 favorite]


Speaking of human reason, there is a novel, the product of one person’s reasoning and other mental processes, Finnegans Wake by James Joyce, that I really now wonder what an LLM would do with this as training data. It is purported to be English, but an English unleashed from the normal English encountered in books. It is not gibberish, instead it is a very complex construction, art in the extreme as artifice. Most readers balk at it. There are numerous guides written to help readers fathom its depths. Joyce saw it as a book about everything it seems. And it is thought to be a transcript, so to speak, of a dream or dreams. If anyone is curious, here is the text. I put this here because most of the text implied in these discussions is what I might call straightforward text. But there is much more than that out there.

Question: Does an encyclopedia volume know that Paris is the capital of France?
posted by njohnson23 at 1:39 PM on December 21, 2023 [1 favorite]


Does an encyclopedia volume know that Paris is the capital of France?

Yes!
posted by mittens at 2:01 PM on December 21, 2023


I would like to understand your definition of “know” in that case. Does a box of breakfast cereal know how much sugar is in a serving? Does it know what a serving is? Does it know when the box is empty?
posted by njohnson23 at 2:05 PM on December 21, 2023


I think people who seriously believe that LLMs are building internal representations of reality that are usable in situations where people might get hurt by their inaccuracies and incompleteness... I'm having a hard time completing this sentence. I guess I think, they grossly overestimate how much work is being done by the bit I bolded in my quote. If that's what mainstream "serious CS-department AI" people think, it's pretty sad.

Instead of thinking of research scientists as foolish or sad, think of them as comfortable engaging with imperfect simulations — and, typically, being very concerned about how people in general will be affected by engaging with imperfect simulations, and wanting to understand both the simulations and the social processes better.

There is nothing inherently wrong with having access to imperfect simulations! It's useful to have a merely OK weather report, or a GPS that often but not always knows the fastest way to drive home, or a medical test that guesses correctly often enough to make people's lives better. Using a simulation wrong can backfire, and maybe you'll regret ever seeing it, but often it's overall beneficial to have it.

And likewise: lots of people see value in a latest-gen AI weather forecast that is more accurate than previous ones, even though AI weather forecasts have many of the faults we're talking about here.

And lots of people see value in systems that are better at translating text than we've had before (like the new Firefox translate button that can translate a web page locally instead of sharing your data with Google, thanks to LLMs), or better at captioning images, or better at redacting confidential information from public data sets -- even if inaccurate, even if harmful when they fail, they can be much better than an alternative which is also inaccurate and harmful. There are lots of not very flashy, but very useful, things you can do by simulating what a person would do in a given situation, even if the simulation is no more like a person than a weather report is like a hurricane.

My experience is the closer researchers are to testing these simulations, the less confident they are either in how good they can eventually be or how useful it will be to have them. I think they're right not to be confident.

What is it simulating? Melville’s thoughts, his mind?

This is a good question, and why I think "simulation" is such a useful frame. Does a weather report simulate raindrops and the feel of wind on your face? What does it simulate about a hurricane, in fact? Nothing of the hurricane's essence, but a lot that's useful.

An LLM simulates ... whatever is the most compact way it can come up with during training to simulate a process that generates the second half of a Moby Dick chapter from the first half of a Moby Dick chapter, more or less. In all cases that will be more like a weather report than a hurricane; the full text of the internet doesn't have a copy of Herman Melville's brain or anything like it! But some LLMs will end up just memorizing and regurgitating Moby Dick like a giant Markov chain, and they'll be useless; and some will end up creating pretty useful "weather reports" that can do things like answer questions in modern english about Melville-contemporary documents retrieved from a database, and those ones will be handy to have around.

When the simulations that work, work, it will be useful to have a mental model of why they work -- and that's where I recommend thinking in terms of LLMs having a usefully imperfect world model they derive by trying to predict everything in their training data, because I think it'll be most explanatory of what you see come out.
posted by john hadron collider at 2:05 PM on December 21, 2023 [5 favorites]


A simulation is, definitionally, a contextually embedded imitation of a process. LLMs do not imitate process - they encode statistical representations of word order and frequency that, with a large enough training set, contains a conceptual-relationship map shared by humans who speak a given language. This is not process or emulation of process but storage and description of the outcomes of process.

The encyclopedia volume does not know that Paris is in France because the act of knowing requires an active process. The encyclopedia volume contains a representation of the relationship between Paris and France: a fact which an active process - a human reader - can embed in contexts outside that representation.

There is a different type of neural network training style for imitating processes and that is reinforcement learning. Simplest example would be a neural network trained to solve mazes in the shortest possible path. LLMs employ reinforcement learning as part of their training process, but it is not part of the inference process (prompt -> output).
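
For the maze example, the textbook version is tabular Q-learning; a compact sketch (the maze layout, rewards, and hyperparameters are made up, and a lookup table stands in for the network):

    import random

    # Toy tabular Q-learning on a tiny made-up grid maze: a concrete example of
    # the "train something to solve mazes by reinforcement" idea.
    maze = ["S..",
            ".#.",
            "..G"]                     # S = start, G = goal, # = wall
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
    Q = {}                             # (state, action_index) -> estimated value

    def step(state, a):
        r, c = state[0] + actions[a][0], state[1] + actions[a][1]
        if not (0 <= r < 3 and 0 <= c < 3) or maze[r][c] == "#":
            return state, -1.0, False  # bumped the edge or a wall: stay put
        if maze[r][c] == "G":
            return (r, c), 10.0, True  # reached the goal
        return (r, c), -0.1, False     # ordinary move costs a little

    alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration
    for episode in range(500):
        state, done = (0, 0), False
        while not done:
            a = random.randrange(4) if random.random() < eps else \
                max(range(4), key=lambda x: Q.get((state, x), 0.0))
            nxt, reward, done = step(state, a)
            best_next = max(Q.get((nxt, x), 0.0) for x in range(4))
            Q[(state, a)] = Q.get((state, a), 0.0) + alpha * (
                reward + gamma * best_next - Q.get((state, a), 0.0))
            state = nxt

    print(max(range(4), key=lambda x: Q.get(((0, 0), x), 0.0)))  # learned first move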

(Caveat: I am an enthusiast, not a professional or academic in the field, all of this is as I understand it and potentially mistaken.)
posted by Ryvar at 2:27 PM on December 21, 2023 [8 favorites]


excellent FFFP lwxxyyzz.
posted by clavdivs at 2:42 PM on December 21, 2023 [2 favorites]


This is not process or emulation of process but storage and description of the outcomes of process.

I am not in the field, either, but haven’t there been a few papers now that look at small-scale examples and conclude that GPT-style models can develop circuits that execute some definable process - validating Othello game states, or computing a mathematical function - in a way that we can interpret? I thought that’s what people were getting at in saying that there’s a growing belief that they do have “world models” although again as far as I know the evidence comes from fairly trivial examples.
posted by atoxyl at 2:42 PM on December 21, 2023 [1 favorite]


Why do these people need to believe in machine sentience

Some of them think that if you use enough reinforcement learning to teach a computer to mimic a human, eventually you reach a point where the mimicry is so good that the only algorithm capable of producing it would be one that does most of the same things a human brain does.

If you could somehow write out an algorithm that describes everything a human brain does, and then randomly generate algorithms of the same size, eventually you'd get a match. But you'd need a huge amount of time. Or a really fast way to generate algorithms. Or some kind of selection process to narrow down how many algorithms you need to try.

Believers in machine sentience think we'll eventually have enough processor power and a sufficiently good selection process to stumble upon an algorithm that's doing the same thing a human brain does—or more likely some equivalent that is close enough.

The reason the goalposts keep moving (a computer will never...be a Chess master, master Go, generate text that passes Turing tests, drive a car) is that we keep realizing there are ways for computers to do these things without evolving an algorithm remotely like a human brain, with algorithms that are obviously much, much simpler than a human brain. What task could a computer definitely not do without an algorithm that is equivalent to a human brain?
posted by straight at 2:52 PM on December 21, 2023 [1 favorite]


Starting with the assumption that "to know" is really the same verb as "to imagine" or "to sense" or "to feel" or "to remember"--that is, the brain is doing the same activity in all cases--then there's a way to think about this along two axes, a sort of location-based one and a sort of biology-based one. Does an eye see? Well, not really--seeing is something done in the brain--but colloquially we understand what it means to say an eye can see. It's a necessary part of the process. If we had an artificial eye, a brain-connected camera, could we still say, again, colloquially, that the artificial eye could see? I think that would be okay to say, because we understand the organ and its function in relationship to the body.

Socially-mediated knowledge like Paris' relationship to France is just a way of putting sentences together; I don't know Paris is the capital of France in any real way--there's no Paris-focused neuron in my brain--I'm only able to bring a sentence to mind about it, and that ability is knowing. I once knew the capital of Kenya, but I can't perform a sentence about it as I write this. Off to my left is an atlas, so I could look it up. The atlas is like that artificial eye, I can use it as though it were a memory; I can use it to generate the sentence about Kenya.

Does the encyclopedia know the capital of France, in the sense that it is a brain that can generate a sentence about Paris? No. Is it an artificial organ that I can plug into my brain to generate a sentence about Paris, and so can colloquially be thought of as serving the function of that organ? I think so? Otherwise, what is communication supposed to be transmitting?
posted by mittens at 2:55 PM on December 21, 2023 [3 favorites]


I tried checking whether ChatGPT understands Averroes' search. "cultural differences" are what it says, even after being asked whether there were theaters in Harun Al Rashid's Baghdad.
posted by lwxxyyzz at 3:03 PM on December 21, 2023 [1 favorite]


Really enjoyed this.

Oh my Bingo Card but unplayed so far:
* Engineers Not Knowing Philosophy
* Engineers Not Knowing Art
* ...and so mandated college courses in other ways to think
* How-you-think-about-a-problem shapes what-you-imagine-can-solve-the-problem
* The model is not the thing
* The map is a way to find the thing, but it's only a reduced set of references to your route along the real journey you must take
* Inverting the Turing Test so it's not about "does this pass as human?" but "what does it need to do so you treat it as human?" because it's a meditation worth having: what do people around me need to do so that I treat them as human?

I think, though, this is an excellent FPP.
posted by k3ninho at 3:05 PM on December 21, 2023 [4 favorites]


I thought that’s what people were getting at in saying that there’s a growing belief that they do have “world models”

Right. They contain a compressed, fixed representation of so much Othello or English that nearly any reasonable input is going to yield plausible Othello/English output. The structural underpinnings of those subjects are embedded in the topology of the network. However, those structures are always fixed, gleaned from examples within a now-absent context. Hit that network with input from a novel context or one that was simply absent from the training set, and things likely go off the rails almost immediately. Part of what makes ChatGPT as seemingly robust as it is, is that the sheer volume of sentences covers the entire spectrum of contexts most humans are likely to be in - both physically and intellectually. But there is still no runtime component - no adaptation and at best very little synthesis with previously-unrelated knowledge (on a large enough network there’s always some cross-domain connections, backprop is to minimum *local* loss).

But you'd need a huge amount of time. Or a really fast way to generate algorithms. Or some kind of selection process to narrow down how many algorithms you need to try.

The first two sentences are basically what OpenAI appears to be attempting with Q* in an ugly brute-force fashion. The last sentence is what nVidia proved LLMs can already consistently do much better than the best humans when it comes to scoring bespoke reinforcement-based systems, and is why that was such a “holy shit” moment for a lot of us enthusiasts (and judging from the reaction also the professionals and academics). Calling your demo Eureka is a hell of a claim, the kind where you really better deliver the goods. And they fucking did.

posted by Ryvar at 3:18 PM on December 21, 2023 [2 favorites]


Here's the blog post in which an LLM-like model trained only on Othello move lists develops an internal representation which can be queried for board state: Do Large Language Models learn world models or just surface statistics? In short, the position and current color of pieces on the board can be queried, even though the model is only ever fed move lists and asked to fill in the blank. (Expanding on preview, seeing Ryvar's comment: The model for creating and updating that state is developed during the LLM training, and applied during play. The dynamic application during inference is also fairly remarkable.)

Knowing the board state is obviously incredibly helpful for predicting the next move in the game - you have to know where things are, but also what constitutes a legal move, which means knowing the current color of the pieces. Developing a model for this info is clearly a massive boost to the problem of predicting Othello moves, and the Othello LLM apparently does so just fine.
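
The "queried for board state" part is done with probes: small classifiers trained on the network's hidden activations to read out, say, the state of one square. A minimal sketch of the idea, assuming scikit-learn and using random stand-in activations (the real experiments use the Othello model's actual hidden states and held-out games):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    n_positions, hidden_dim = 2000, 512
    hidden_states = np.random.randn(n_positions, hidden_dim)  # stand-in for the model's activations
    square_state = np.random.randint(0, 3, size=n_positions)  # 0=empty, 1=black, 2=white for one square

    probe = LogisticRegression(max_iter=1000).fit(hidden_states, square_state)
    print("probe accuracy:", probe.score(hidden_states, square_state))
    # If a simple probe can read a square's state out of the activations far above
    # chance on held-out positions, that state is encoded somewhere in the network.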

Likewise, I don't think that it's unreasonable to believe that entirely text-based LLMs can develop certain kinds of primitive models of the world, such as we can construct out of text alone. Consider the sentence "Batman punched the thug." Then we can infer that the thug has been hurt, and hurt people have a range of behaviors associated with them which might help predict what the hurt person will do next: they are more likely to cry or punch back than, say, write a libretto. It's pretty clear that LLMs are able to model dialogues (indeed, for chatbots that's their whole job) and thus keep some separate representation of each of the entities in the dialog. Something as simple as 'tags' for the state of different characters in the dialog does represent a kind of world-model, as those tags are helpful predictors of future actions.
posted by kaibutsu at 3:22 PM on December 21, 2023 [4 favorites]


I use ChatGPT at work and run the team that licenses Copilot and I definitely find these models helpful even though they aren’t trustworthy. One problem I see is that the usefulness is directly related to the intelligence or linguistic creativity of the user. That is, I’m pretty good at thinking of interesting things to ask ChatGPT, good follow-up, ways to refine the output, etc. but the people I know who are bad writers themselves struggle to get really good usable info out.

Also I had a frustrating interaction the other day. Nerds have started using “O11y” for “observability” so I asked ChatGPT “can you give me 20 words that are 13 letters long, start with O, and end with Y?” Three words it listed didn’t end in Y, so I told it and it apologized and tried again. 18/20. Then 16/20. Then 19/20, and of course the correct matching words were not the same every time. It was totally incapable of refining the existing list. I think that’s a good illustration of some of the points above.

Also Pope Guilty I may have been one of the people griping in other threads about the constant discussion over whether it’s real AGI. I think it depends on the context: in some threads the convo was more about social and economic impacts, and in those cases I’d say “Whether or not these LLMs are actually building process models or approaching sentience (I say no) will have zero impact on the massive degree to which they will be funded, marketed, relied on by regular people, and used to make critical life-or-death decisions.” Sad but true.
posted by caviar2d2 at 3:37 PM on December 21, 2023 [3 favorites]


Yes. Maybe someday we'll have to deal with computer programs that approximate human brains. But we have to deal right now with people using computers to do a lot of things computers couldn't do before. They can't actually drive like people or talk like people or draw like people, but they can definitely disrupt people trying to do those things.
posted by straight at 3:50 PM on December 21, 2023 [3 favorites]


Right. They contain a compressed, fixed representation of so much Othello or English that nearly any reasonable input is going to yield plausible Othello/English output.

Well I think the claim is that they don’t just encode a mapping of reasonable outputs for nearly every reasonable input, but some kind of state machine that implements the underlying rules to transform inputs into outputs? At some point the next step in more efficiently compressing data is actually to model it.

I asked ChatGPT “can you give me 20 words that are 13 letters long, start with O, and end with Y?” Three words it listed didn’t end in Y

GPT uses a tokenization scheme in which multiple characters or even whole words can be represented as a single unit (and also generates its output in a single pass) which may partly explain why it’s so bad at this particular kind of thing.
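
You can see that directly with a tokenizer library; a quick sketch, assuming the tiktoken package (exact splits vary by tokenizer, so treat the output as illustrative):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    token_ids = enc.encode("observability")
    pieces = [enc.decode([t]) for t in token_ids]
    print(token_ids)  # a short list of integer ids
    print(pieces)     # a few multi-character chunks, not thirteen separate letters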
posted by atoxyl at 3:56 PM on December 21, 2023


Does an encyclopedia volume know that Paris is the capital of France?

Yes, in exactly the same way that Wikipedia knows that the capital of France is xoBongRipz420xo.
posted by Mayor West at 4:52 PM on December 21, 2023 [2 favorites]


atoxyl is right- LLMs in general do almost all of their "thinking" in a semantic space divorced from syntax. ChatGPT in particular does have a fallback method for modeling the input character-by-character to help it not get too confused by misspelled words, but it likely discards that syntactic-space crap as soon as it can possibly get it into the same semantic space that the rest of the model uses. Most of its "knowledge" of spelling and pronunciation is probably, counterintuitively, from what people have said about spelling and pronunciation, so it's not going to be super great at playing with syntax.
posted by a faded photo of their beloved at 4:55 PM on December 21, 2023 [1 favorite]


Does the brain use algorithms? How do you define that term? Is the brain Turing equivalent? Can we invoke the Halting Problem? Are there undefined variables in my head? Are there infinite loops lurking somewhere?

First they used brain analogies to describe and label components of a computer. Then they began to use computer analogies to describe and label brain components and activities. A vicious circle of metaphor.

There has been a lot of use here of the term “model.” All models are imperfect, because, as has been said many times in this context, the map is not the territory. All models are incomplete representations of something else. If they were complete they would be that something else.
posted by njohnson23 at 5:00 PM on December 21, 2023 [2 favorites]


The model for creating and updating that state is developed during the LLM training, and applied during play. The dynamic application during inference is also fairly remarkable

By the way this was a lovely read, kaibutsu. The application of Fischer random chess to Othello brought a huge smile to my face, and pairing that with probing made for an especially convincing argument.

I’ve said on (rather too) many occasions that trained neural networks contain the systemic tensions or logical underpinnings of the system they are modeling, but what they’ve done here really scrapes at the edges of how complete I thought that representation could get within LLMs and related techniques. It’s still purely internal to the system in question; there’s nothing here about how Othello in abstract situates within the broader context of the world, but within those bounds it’s unnervingly complete.
posted by Ryvar at 9:36 PM on December 21, 2023


I have not followed the links to the Othello-playing system, but I am not so very surprised by this. There is in fact a language whose utterances are the legal moves in Othello: it's a formal language and it's not a big surprise that it should be discoverable from enough examples. The system that discovers this language becomes an expert in the Othello game world.
Natural human languages are not formal systems. This kind of discovery of the world they describe is not going to be possible.
posted by Aardvark Cheeselog at 10:54 AM on December 22, 2023 [1 favorite]


Eh, go read it, challenge your assumptions instead of stewing in them...

And humans aren't necessarily /that/ complicated to model, especially at the level of a short dialogue. Consider the complexity of, say, the NPC system in a well constructed interactive fiction piece. There's a formal system controlling the characters, and adding just a bit of generative capacity could go a long way in making them more life-like.
posted by kaibutsu at 1:14 PM on December 22, 2023 [1 favorite]


> kaibutsu: "Knowing the board state is obviously incredibly helpful for predicting the next move in the game - you have to know where things are, but also what constitutes a legal move, which means knowing the current color of the pieces. Developing a model for this info is clearly a massive boost to the problem of predicting Othello moves, and the Othello LLM apparently does so just fine.

Likewise, I don't think that it's unreasonable to believe that entirely text-based LLMs can develop certain kinds of primitive models of the world, such as we can construct out of text alone. Consider the sentence "Batman punched the thug." Then we can infer that the thug has been hurt, and hurt people have a range of behaviors associated with them which might help predict what the hurt person will do next: they are more likely to cry or punch back than, say, write a libretto."


Imho, the fundamental question here is whether semantics can be inferred from syntax, and if so, to what degree and in what sense. In the case of board games, like Go or chess, the board/game state is the syntax and the rules of the game provide the semantics (e.g.: has someone already won the game?). In the case of programming languages, the high-level code is the syntax and the compiler provides the semantics (or, in a looser but perhaps more meaningful sense, semantics can be inferred by executing the program and examining the results). In natural language, though, we don't have such clean-cut rules to go from syntax to semantics.

Like, how do we "know" that "the thug has been hurt" after Batman punched him? Well, if we're thinking of an actual human person, one way that person could make such a deduction is that they would know what punching means in real life and what being hurt means in real life and be able to connect the dots from there. But an LLM who could make the same deduction -- or, if we want to be sticklers, produce text that communicates the meaning of the same deduction -- is not making this deduction in the same way. Since it only receives text as input (and whatever manual adjustments its human overseers implement), it can only make inferences based on text; it can't know the subjective experience of being hurt nor of being punched, but it has apparently been fed sufficient texts that it can produce sentences like "Yes, if Batman, a fictional character known for his strength and combat skills, punches a thug in a story or comic book, it's likely that the thug would be hurt." in response to the prompt "Batman punched a thug. Has the thug been hurt?".

Now, if a person were to say that sentence, we would naturally assume that they understood who Batman was, what punching was, what being hurt was, etc... in a conventional sense. But this is an LLM, not a person. It can definitely produce the sentence but what should we infer about what's behind the production of the sentence? Do we credit the LLM with understanding Batman, punching, and hurt? Or is this just an elaborate parlor trick? And does it even matter?
posted by mhum at 4:25 PM on December 22, 2023


Syntax describes the structure of a language: how the pieces fit together and what sort of semantics they convey. For example: noun phrase = [article] [adjective] noun, where the [] means optional. Articles are specifiers as in THE ball versus A ball. Adjectives are words that convey properties of a thing. A noun is a person, place, or thing. The semantics embodied in the syntax is very general: it only defines what kind of word it is, not what the word means. Syntax can be expressed via word order and/or prefixes or suffixes added to the words. Do LLMs actually parse the text and assign syntactic functions to the words? Do they then look up each word to determine its individual semantics? And do they then form a semantic analysis of the combined words and the syntax and thus assign meaning to the sentence? Or do they just create a statistical map of the text based on what word follows or precedes a given word, a la our friends, the Markov chains? For example, given the sentence “Colorless green ideas sleep furiously,” what does an LLM do?
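
One crude way to poke at that last question empirically is to measure how surprising a small open model finds the sentence, token by token, compared with the same words scrambled; a sketch, assuming the Hugging Face transformers package and the public GPT-2 weights (this doesn't settle the syntax/semantics question, it just shows what the statistical map actually yields):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def avg_surprisal(text):
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
        return loss.item()

    print(avg_surprisal("Colorless green ideas sleep furiously."))
    print(avg_surprisal("Furiously sleep ideas green colorless."))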
posted by njohnson23 at 5:00 PM on December 22, 2023


Oh, and then there are “Time flies like an arrow” and “Fruit flies like an apple.”
posted by njohnson23 at 5:25 PM on December 22, 2023


Are we allowed to post ChatGPT responses in threads? I can't remember if that got forbidden at some point. But njohnson23's final question there made me wonder, well...what would it do? Would it recognize at all that the words didn't go together? I tried a variation of that sentence, and it balked! "I have not heard of such a thing, and I doubt it exists. It sounds like a nonsensical phrase, perhaps generated by a random word generator or a creative writing exercise. There is no logical or factual basis for a democracy to be fully transparent, red, or guilty. These are adjectives that do not apply to a form of government, but rather to physical objects or human emotions." I challenged it--hadn't it ever heard of figurative language? "I still do not understand how the sentence you mentioned is an example of figurative language. It does not seem to use any of the common figures of speech, nor does it convey any clear or meaningful message. Could you please explain what you meant by it?"
posted by mittens at 5:26 PM on December 22, 2023 [1 favorite]


(I also asked it, do peaches scream? And it said, 'Peaches do not scream, as far as I know. They are fruits that grow on trees and do not have any vocal organs or nervous system to produce or perceive sound. They also do not have any reason to scream, since they are not alive in the same sense as animals or humans. However, I did find some references to peaches and screaming in different contexts. For example, there is a book titled Peaches and Scream, which is a cozy mystery novel about a murder in an apple orchard. There is also a phrase “peaches and cream”, which means something that is very pleasant or easy, and sometimes used sarcastically or ironically to imply the opposite. For instance, one might say “Life is not always peaches and cream” to express dissatisfaction or hardship. Perhaps someone modified this phrase to “peaches and scream” to create a humorous contrast or a pun.') (then it wrote me a bad poem about peaches.)
posted by mittens at 5:27 PM on December 22, 2023 [1 favorite]


Mittens,

The words do go together; it is a syntactically correct sentence, but the semantics are either nonsense or possibly poetic - we humans get to decide that. It’s a famous example from good ol’ Noam Chomsky, as are the other two I just posted.
posted by njohnson23 at 5:30 PM on December 22, 2023


My strategy when doing a Turing test is to ask ludicrous questions, as you did, and see what the reaction would be… Machines do not have a sense of humor, surrealism, or plain intentional idiocy. ChatGPT seems to take everything literally.
posted by njohnson23 at 5:33 PM on December 22, 2023


I felt, for a second, that something had snapped into focus--usually the technical details of these discussions go wayyyy over my head, but the program's answers made sense. Not that it 'knows' about peaches or whatever, but that it can classify words by how likely they are to be associated with other words (or, y'know, not "words" but however it's breaking down units of meaning). And this made me wonder, is there any concept we humans carry around that is so limited-to-words? I started thinking of anglerfish--I've never actually seen one, everything I know about them comes from pictures and articles. I don't even think I've seen one in a video. But I have my little web of associations and comparisons that make them make sense. I can make sense of pretty much anything like that, by matching it to something I've seen, felt, heard. A constant sort of analogizing. Do we have any concepts that we work with the way an LLM does--purely made of language, with no reference to the world or analogies from the world?
posted by mittens at 5:49 PM on December 22, 2023 [1 favorite]


There is a British magazine called VIZ, which has a character who takes everything literally and applies logic to that and always ends up in trouble. Reading that output you posted reminded me of that character. I found it interesting that it noted a novel called Peaches and Screams, and then said it takes place in an APPLE orchard. Then it gives an ironic use of the phrase in a sentence that is not ironic. There is a whole genre of nonsense poetry. Given that, I would assume that it would again take it all literally. Up above I gave a link to the novel Finnegans Wake by James Joyce. It is written in a very complex and intentional way with multilingual portmanteau words, puns, and other forms of weird linguistic tricks. It’s also extremely funny, insightful, and an amazing read. I asked how that would be dealt with. No takers. We humans can be very flexible in how we hear and read texts. We can understand bad grammar pretty well. We also have a good sense of when someone is speaking ironically, or telling lies, or exaggerating, or other stretches of the language. My own thought is that we listen/read and our brain is constantly processing by asking what does it mean, how does it mean that, over and over, creating an interpretation that may or may not capture the speaker/writer’s intent to some varying degree. I don’t think we function like an LLM in any way. The world, analogies from the world, our experiences, our current bodily state, stuff we made up along the way, etc. all take part in our granting meaning around and in us.
posted by njohnson23 at 6:20 PM on December 22, 2023 [1 favorite]


I don’t think we function like an LLM in any way.

njohnson23, I am really not sure. I make the choice (what I've told myself is a choice) to think the embodied way we exist, the embodied nature of whatever we describe as my/your sentience and intelligence, is meaningful, and meaningfully different from whatever processes result in LLM output.

perhaps on my deathbed I will come to the realization as to how an active thought inhabits regions of neurons even as my index finger twitches, one element of physicality integral to this emergent thought. Epiphany. And then my sphincter gathers up a troublesome fart and releases it with joy, another epiphany, then death.
posted by elkevelvet at 9:27 AM on December 23, 2023 [1 favorite]


Elkevelvet - is there any difference between taking your last breath and turning off a computer?
posted by njohnson23 at 9:52 AM on December 23, 2023


Is there any difference between James Joyce and Viz?
posted by flabdablet at 3:12 AM on December 24, 2023


Other than the pictures, not really.
posted by njohnson23 at 8:53 AM on December 24, 2023 [1 favorite]


Top tip!
posted by flabdablet at 9:20 AM on December 24, 2023 [1 favorite]


LLMs are subject to hallucination. cstross used a quick method to get the LLM to start hallucinating, but what's really troublesome is that they can hallucinate right off the bat, without working through true-ish things first and then being asked to continue.

The first time I was fiddling around with OpenAI's free version of GPT-3 it confidently informed me that Mark Hamill had reprised his role as Luke Skywalker in Phantom Menace, Attack of the Clones, and Revenge of the Sith, and that was the first Star Wars question I'd asked it.

I'm not sure what the future is, but I do know we can't ban AI. Back in the 1980's when it would have taken a billion dollar supercomputer months to do what DALL-E does in a few seconds, sure. The barrier to entry was too high. But today anyone with a semi-decent graphics card can spin up an instance.

It looks like the biggest limiting factor for the current family of generative models is non-reproducibility. Since it's all done iteratively and with a bit of randomness that means it will never generate exactly the same output from the same input. You can ask DALL-E to draw you a sketch of a rabbit with a machine gun, but if you like what it produced and you want to see that same rabbit drinking coffee it can't do it. It can only create an entirely new thing based on the same input, the rabbit drinking coffee will look like a different rabbit from the one with the machine gun. And that does limit the commercial aspects a bit. Probably not enough though.
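
The "bit of randomness" is literal sampling: at each step the model draws from a probability distribution, and hosted tools generally don't let you pin the random seed, so the same prompt gives you a different rabbit each time. A toy sketch of the difference a seed makes (the words and probabilities are made up):

    import random

    def sample_token(distribution, rng):
        words, weights = zip(*distribution.items())
        return rng.choices(words, weights=weights)[0]

    dist = {"rabbit": 0.5, "hare": 0.3, "bunny": 0.2}
    unseeded = random.Random()    # seeded from system entropy: differs run to run
    seeded = random.Random(42)    # fixed seed: identical sequence every run
    print([sample_token(dist, unseeded) for _ in range(5)])
    print([sample_token(dist, seeded) for _ in range(5)])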

Pope Guilty I agree with you, but that ship has sailed. Corporate America has a really unfortunate ability to change the definition of any technological term based on how cool they want their new product to sound.

Look at how we have to refer to REAL 4g as "4g LTE" because the phone companies kept calling the expanded 3g they were using "4g". And the same has already happened with 5g. The real 5g isn't out yet, but we've got every phone company out there calling their current thing 5g and it stuck.

We lost on AI years ago when people started slapping that label on simple evolutionary algorithms, and people started talking about AGI because they had to make up a new term for real AI as opposed to corporate America bullshit "AI". Except now OpenAI and the others have already started a buzz about how their latest GPT might be AGI and of course it isn't and we'll have to make up yet another term for the real thing because fucking PR departments have virtually infinite power.

So yeah, you're right. But we're not going to win this one.
posted by sotonohito at 1:32 PM on December 24, 2023


Any statement about AI that is only based on what is currently available will be out of date in a matter of months. Optimizations that reduce processing power 100-fold, or entirely new architectures that allow new functionality, are appearing all the time.
It’s amazing that current LLMs have been able to so convincingly seem to model the world despite only ever “experiencing” it through out-of-context text. It does seem very similar to the thought experiment of a person raised in a black and white environment who learns all about color but has never experienced it directly. The interesting question becomes, when true multi-modal AI systems (which can take in and output data in multiple formats, such as images, audio and video in addition to text) are created, will they bridge the remaining gap and show clearly superhuman performance? I think so.
posted by bakerybob at 10:52 AM on December 26, 2023



