"My goal is to be helpful, harmless, and honest."
February 4, 2023 11:56 PM   Subscribe

Change - "Think about how you think. You can fluidly look at something, identify and modify it in your head, move it around, describe it ... What we call understanding isn't the sum of knowledge, it's the sum of the relationship to all of that information. Knowledge is the layer above information, which is transformative." (previously)

The Future of OpenAI - "This second clip is focused exclusively on artificial intelligence, including how much of what OpenAI is developing Altman thinks should be regulated, whether he's worried about the commodification of AI, his thoughts about Alphabet's reluctance to release its own powerful AI, and worst- and best-case scenarios as we move toward a future where AI is ever-more central to our lives."
  • Google invested $300 million in AI firm founded by former OpenAI researchers - "The organization was created in 2021 as a public benefit corporation by Dario Amodei, a former vice president of research at OpenAI... Anthropic emphasizes its work building 'reliable, interpretable, and steerable AI systems' on its website. But will Google's investment effect a shift in these priorities?"
  • Meet Claude: Anthropic's Rival to ChatGPT - "That Claude seems to have a detailed understanding of what it is, who its creators are, and what ethical principles guided its design is one of its more impressive features. Later, we'll see how this knowledge helps it answer complex questions about itself and understand the limits of its talents."
  • It sounds like Google will unveil its ChatGPT clone February 8 - "Google has a history of overreacting to other popular things on the Internet, and these 'clone a competitor' projects litter the Google Graveyard... It's not clear how a ChatGPT competitor would change the core problem of monetization other than kicking that can down the road a few years. Monetization works when you have a list of 10 blue links to sort through but is less easy when you help people immediately find an answer. Pushing more people into that style of interface might hurt Google's bottom line."
ChatGPT: five priorities for research - "Conversational AI is a game-changer for science. Here's how to respond."
Invest in truly open LLMs

To counter this opacity, the development and implementation of open-source AI technology should be prioritized. Non-commercial organizations such as universities typically lack the computational and financial resources needed to keep up with the rapid pace of LLM development. We therefore advocate that scientific-funding organizations, universities, non-governmental organizations (NGOs), government research facilities and organizations such as the United Nations — as well as tech giants — make considerable investments in independent non-profit projects. This will help to develop advanced open-source, transparent and democratically controlled AI technologies.
Whispers of A.I.'s Modular Future [ungated] - "ChatGPT is in the spotlight, but it's Whisper—OpenAI's open-source speech-transcription program—that shows us where machine learning is going."

Inside ChatGPT's Breakout Moment And The Race To Put AI To Work - "As Stability proliferated, OpenAI had already decided to shelve ChatGPT to concentrate on domain-focused alternatives, saving the interface for a bigger later release. But by November, it had reversed course."[1]

Exclusive Interview: OpenAI's Sam Altman Talks ChatGPT And How Artificial General Intelligence Can 'Break Capitalism' - "So, one of the things is that the base model for ChatGPT had been in the API for a long time, you know, like 10 months, or whatever. And I think one of the surprising things is, if you do a little bit of fine tuning to get [the model] to be helpful in a particular way, and figure out the right interaction paradigm, then you can get this. It's not actually fundamentally new technology that made this have a moment."

Exclusive Q&A: John Carmack's 'Different Path' to Artificial General Intelligence - "And it feels like we are a half-dozen more insights away from having the equivalent of our biological agents... The thing we don’t yet have is sort of the consciousness, the associative memory, the things that have a life and goals and planning. And there are these brittle, fragile AI systems that can implement any one of those things, but it’s still not the way the human brain or even the animal brain works. I mean, forget human brains; we don’t even have things that can act like a mouse or a cat. But it feels like we are within striking distance of all those things."[2]
  • ChatGPT, LLMs, and AI - "The practical story is that we have LLMs which can map from human natural language into this abstract concept space and back and that means we have AIs now that understand human language and can express themselves very clearly."[3]
  • Toward a Geometry of Thought - "There are various lines of reasoning that lead to the conclusion that this space has only ~1000 dimensions, and has some qualities similar to an actual vector space. Indeed, one can speak of some primitives being closer or further from others, leading to a notion of distance, and one can also rescale a vector to increase or decrease the intensity of meaning... we now have an automated method to extract an abstract representation of human thought from samples of ordinary language. This abstract representation will allow machines to improve dramatically in their ability to process language, dealing appropriately with semantics (i.e., meaning), which is represented geometrically."[4,5]
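A minimal sketch (not from the linked piece) of the geometric picture being described: concepts as vectors, where cosine distance stands in for relatedness and rescaling a vector dials the intensity of meaning. The four-dimensional vectors below are invented stand-ins for a real ~1000-dimensional embedding.

```
# Toy sketch of "geometry of thought": concepts as vectors, where distance
# tracks relatedness and magnitude tracks intensity of meaning.
# The 4-d vectors below are invented stand-ins, not real model embeddings.
import numpy as np

embeddings = {
    "warm": np.array([0.9, 0.1, 0.3, 0.0]),
    "hot":  np.array([1.0, 0.2, 0.3, 0.1]),
    "cold": np.array([-0.8, 0.1, 0.2, 0.0]),
}

def cosine_similarity(a, b):
    # 1.0 means pointing the same way in concept space, -1.0 means opposite.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["warm"], embeddings["hot"]))   # near 1: close concepts
print(cosine_similarity(embeddings["warm"], embeddings["cold"]))  # negative: far apart

# "Rescale a vector to increase the intensity of meaning":
scorching = 2.0 * embeddings["hot"]  # same direction, larger magnitude
```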
What kind of intelligence is artificial intelligence? - "The initial goal of AI was to create machines that think like humans. But that is not what happened at all."
Machine learning in all its forms represents a stunning achievement for computer science. We are just beginning to understand its reach. But the important thing to note is that its basis rests on a statistical model. By feeding the algorithms enormous amounts of data, the AI we have built is based on curve fitting in some hyperdimensional space — each dimension comprises a parameter defining the data. By exploring these vast data spaces, machines can, for example, find all the ways a specific word might follow a sentence that begins with, “It was a dark and stormy…”

In this way our AI wonder-machines are really prediction machines whose prowess comes out of the statistics gleaned from the training sets. (While I am oversimplifying the wide range of machine learning algorithms, the gist here is correct.) This view does not diminish in any way the achievements of the AI community, but it underscores how little this kind of intelligence (if it should be called such) resembles our intelligence.[6]
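A toy sketch of the "prediction machine" idea the passage describes: the model assigns probabilities to candidate next words, distilled from training-set statistics, and samples one. The probability table here is made up for illustration, not real model output.

```
# Toy "prediction machine": given a prompt, choose the next word by sampling
# from a probability distribution distilled from training-set statistics.
# These probabilities are made up for illustration.
import random

next_word_probs = {
    "night": 0.62,
    "sea": 0.21,
    "afternoon": 0.12,
    "spreadsheet": 0.05,
}

prompt = "It was a dark and stormy"
words, weights = zip(*next_word_probs.items())
next_word = random.choices(words, weights=weights, k=1)[0]
print(prompt, next_word)
```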
Unpacking the "black box" to build better AI models - "Stefanie Jegelka seeks to understand how machine-learning models behave, to help researchers build more robust models for applications in biology, computer vision, optimization, and more."
Jegelka is particularly interested in optimizing machine-learning models when input data are in the form of graphs. Graph data pose specific challenges: For instance, information in the data consists of both information about individual nodes and edges, as well as the structure — what is connected to what. In addition, graphs have mathematical symmetries that need to be respected by the machine-learning model so that, for instance, the same graph always leads to the same prediction. Building such symmetries into a machine-learning model is usually not easy.

Take molecules, for instance. Molecules can be represented as graphs, with vertices that correspond to atoms and edges that correspond to chemical bonds between them. Drug companies may want to use deep learning to rapidly predict the properties of many molecules, narrowing down the number they must physically test in the lab.

Jegelka studies methods to build mathematical machine-learning models that can effectively take graph data as an input and output something else, in this case a prediction of a molecule’s chemical properties. This is particularly challenging since a molecule’s properties are determined not only by the atoms within it, but also by the connections between them.
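As a concrete (if drastically simplified) sketch of the kind of model being described, here is one message-passing layer plus a sum readout over a toy molecular graph; because neighbors are aggregated with order-independent sums, relabeling the atoms leaves the prediction unchanged, which is exactly the permutation symmetry mentioned above. The random weights and three-atom "molecule" are invented for illustration.

```
# Minimal sketch of a graph model (one message-passing layer + sum readout)
# that respects node-permutation symmetry: the same toy "molecule" with its
# atoms renumbered yields the same prediction. Weights are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
W_self, W_neigh = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
w_out = rng.normal(size=4)

def predict_property(node_features, adjacency):
    # Each node mixes its own features with the order-independent sum of its
    # neighbors' features (adjacency @ node_features).
    h = np.tanh(node_features @ W_self + adjacency @ node_features @ W_neigh)
    # Summing over nodes is also invariant to how the atoms are numbered.
    return float(h.sum(axis=0) @ w_out)

# A 3-atom toy graph, then the same graph with the atoms relabeled.
x = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]])
adj = np.array([[0.0, 1, 0], [1, 0, 1], [0, 1, 0]])
perm = [2, 0, 1]
x_p, adj_p = x[perm], adj[np.ix_(perm, perm)]

print(predict_property(x, adj), predict_property(x_p, adj_p))  # identical values
```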

Other examples of machine learning on graphs include traffic routing, chip design, and recommender systems.

Designing these models is made even more difficult by the fact that data used to train them are often different from data the models see in practice. Perhaps the model was trained using small molecular graphs or traffic networks, but the graphs it sees once deployed are larger or more complex.

In this case, what can researchers expect this model to learn, and will it still work in practice if the real-world data are different?

“Your model is not going to be able to learn everything because of some hardness problems in computer science, but what you can learn and what you can’t learn depends on how you set the model up,” Jegelka says.
Machine learning classifies catalytic-reaction mechanisms - "The study of how chemical reactions work is key to the design of new reactions, but relies on hard work and expert knowledge. A machine-learning tool has been developed that could change the way this challenge is approached."[7,8,9,10]
posted by kliuless (31 comments total) 27 users marked this as a favorite
 
Friends shared the John Carmack piece ... and I'm still at the place where I know I don't know enough to judge, and he doesn't say what's ticking his interest right now (which will change as Carmack works). Deciding which data to use in training, or breaking a task into a problem you know how to solve plus the steps to get there and back, or picking an optimal approach to discover the tools needed to solve a problem -- we've not heard much about those problems in a while, with billions of transistors at gigahertz clocks.

On seeing into the Black Box of the ML model, I like the idea that you might instrument the input space and find small changes in input that yield large changes in output -- but a discovery approach will take a very long time. Maybe the tooling that assembles the descent gradients and weights (big gradient is big change, big weight is an amplifier through the matrix maths) will have to publish those significant lumps and re-assemble them into classes of input+state+output.

Friends also talked about Noah Smith's Noahpinion piece "Why Does ChatGPT Constantly Lie?", where the author complains that LLM output takes more time to fact-check than it would take to write your own. This triggered three thoughts from me: the model was trained on a wide corpus of work that doesn't always agree with your worldview, stance or the point you're trying to make; you've failed to separate facts used to argue a point from fluffy language used in oratory or rhetorical flourish; and why should you believe what the LLM says to be truthful anyway, given it's trained to mimic language, not to deem factual what you deem factual -- and worse, "you shouldn't believe everything you read online" has trained the LLM, so also "you shouldn't believe everything it says."
posted by k3ninho at 1:31 AM on February 5, 2023


Andres Guadamuz: Working on my AI copyright infringement class, I've had to completely rebuild it from scratch. Last year: "it's all theoretical". This year: "these are all the people getting sued".

As a society, we primarily do things by throwing more energy at them, including our AI, so GPT-3 needs 3 GWh during training, aka the AI part.
posted by jeffburdges at 2:00 AM on February 5, 2023 [4 favorites]




"Understanding is standing underneath something.' Said by someone much more wise than I.
posted by DJZouke at 4:49 AM on February 5, 2023 [1 favorite]


the model was trained on a wide corpus of work that doesn't always agree with your worldview

I once asked GPT-3 what month it was, and it gave the wrong answer. I asked it again what month it was, and it gave a different wrong answer.

The only correct answer for it to give would have been, "I don't know." Worldviews aside, I don't need three gigawatts to write a chatbot that's wrong and doesn't know it.
posted by AlSweigart at 5:07 AM on February 5, 2023 [7 favorites]


I once had ChatGPT insist that a particular composer wrote music for a game, even going so far as to list particular songs from the soundtrack that they were supposedly responsible for, and it helpfully provided hallucinatory citations when I asked for them (a broken link on the game publisher's website and a link to Wikipedia, which did not in fact support its assertion either now or at any point in the article's history). Nor could I find anywhere else on the internet where someone even mistakenly believed that that composer had worked on the game. ChatGPT lies not because it's regurgitating falsehoods that it found on the internet - it lies because it invents new falsehoods on its own. It's not just trained on stuff on the internet that's wrong; it's trained to be confidently wrong in general. It doesn't know what facts are, it just knows how to produce things that are shaped like facts and shove them in fact-shaped holes.

I personally wasted 30 minutes of my life fact-checking/"not believing everything it says", when it confidently told me something surprising. My horizons were not broadened by exposing me to "different worldviews". This was unequivocally a negative experience for me.
posted by NMcCoy at 6:02 AM on February 5, 2023 [24 favorites]


(Great post! A little misleading above the fold, I ignored it for a while because it looked like breathless AI videos rather than a thorough analysis. I was glad I was wrong!)

In a very small way I'm hoping that ChatGPT being so successful at generating convincing looking prose with convincing detail but zero adherence to reality can jumpstart a more healthy distrust of content in general and generated content in specific.

I know it's a boring analysis, but ChatGPT as a software package is exploring exactly one thing: can they generate convincing natural language. Not can they create accurate answers with a basis in reality and facts, but simply can they predict a believable looking conversation. That's it. I know they are making incremental progress toward trying to reduce obvious and jarring errors like basic arithmetic or things like the current month above, but all of those are not in service of better factuality, but rather more convincingly natural prose (that doesn't make obvious mistakes that reduce its immediate credibility at the time of impression).

I have several creative writing friends who have been trying to use it for some of the harder parts of breaking down ideas and generating plots, analyzing trade-offs, writing jokes, that sort of thing. They always seem a bit surprised that it is so gratingly unfunny, uncreative, and isn't behaving like an interactive intelligence. I think this is good, because it will help reset some of the magical thinking about how far we have gotten with AI.

Do I think those things are impossible? Nope. I think each one of those areas will probably get dedicated research, new businesses around prompt engineering and, more importantly, training data: whole subspecies of large language models dedicated to other goals such as good jokes or factual accuracy. (Yes, yes, multitask models sometimes show positive improvements across all tasks even when those tasks are fairly disparate; however, basic ML ops is going to require freezing and serving models that are good enough at a few things consistently, which makes it more likely we will see something like what I assume the modular-future article writes about, at least in the short term.)

Carmack is one of the few people in this industry who I am pretty sure is actually as smart as he seems, not just clever, manipulative and very very lucky like many tech bros who have come after. I appreciate that he represents himself as a strong systems engineer, but I think there's a lot of room for better systems engineering in the world of AI in general and large language models in particular. However, he at least talks about taking a big swing at generalized artificial intelligence, and maybe he will be successful in whatever comes after great big vectors.
posted by abulafa at 6:20 AM on February 5, 2023 [3 favorites]


we don’t even have things that can act like a mouse or a cat. But it feels like we are within striking distance of all those things.

An AI that thinks and acts like a cat (one the most effective hunter-killer machines on the planet, generally speaking) is a very scary thought. It strikes me as a DARPA project gone murderously awry.
posted by Thorzdad at 8:02 AM on February 5, 2023 [5 favorites]


It strikes me as a DARPA project gone murderously awry.

but what if smol n fuzzy?
posted by supermedusa at 8:39 AM on February 5, 2023 [3 favorites]


ChatGPT lies not because it's regurgitating falsehoods that it found on the internet - it lies because it invents new falsehoods on its own. It's not just trained on stuff on the internet that's wrong; it's trained to be confidently wrong in general. It doesn't know what facts are, it just knows how to produce things that are shaped like facts and shove them in fact-shaped holes.

I keep trying to come up with a quippy, snarky line about internet comments but honestly it's just too real and depressing.
posted by curious nu at 9:23 AM on February 5, 2023 [3 favorites]


CatGPT
posted by jeffburdges at 10:55 AM on February 5, 2023


“We come to bury ChatGPT, not to praise it,” Dan McQuillan, 05 February 2023
posted by ob1quixote at 12:34 PM on February 5, 2023 [4 favorites]


Dan McQuillan, 05 February 2023

QFT:
ChatGPT isn't really new but simply an iteration of the class war that's been waged since the start of the industrial revolution. That allegedly well-informed commentators can infer that ChatGPT will be used for "cutting staff workloads" rather than for further staff cuts illustrates a general failure to understand AI as a political project.
I'd go a little further -- "staff cuts" are an acceptable outcome for the boss class, but the real win here is the prospect of completing the Reagan project: not just not sharing the benefits of increased productivity with workers as they have been doing these last few decades, but actually using increased productivity as a rationale for paying workers less, despite still needing as many (or even more) of them.

"We are urgently recruiting skilled translators to post-edit our texts! We offer [tiny fraction of a living wage] for post-editing" is a refrain that will already be very familiar to any translator.
posted by Not A Thing at 1:02 PM on February 5, 2023 [9 favorites]


...it lies because it invents new falsehoods on its own. It's not just trained on stuff on the internet that's wrong; it's trained to be confidently wrong in general. It doesn't know what facts are, it just knows how to produce things that are shaped like facts and shove them in fact-shaped holes.

Sooooo, George Santos is actually a ChatGPT drone? That explains soooo much!
posted by Thorzdad at 1:28 PM on February 5, 2023


Whatever the laughable particulars are that a specific ChatGPT prompt might spit out, from my perspective as a university instructor, it's frightening how close it comes to creating passable writing output that's reasonably moored in reality. I'm not saying it produces A material, but for lower-level undergraduate writing, it often produces passable work. I've gotten very good over the past decade at detecting copy-and-pasted existing work. I guess I'll have to develop my AI-writing detection skills now, and develop assignments that can foil it.
posted by mollweide at 8:33 PM on February 5, 2023


The thing I'm still trying to wrap my head around is how these LLMs encode approximate knowledge about so many things, and how that manifests in their outputs...

I was poking at GPT3 (in the playground, not through ChatGPT), and asked it to produce a list of theropod dinosaurs (output was reasonable), and then started poking at it for more details about them. Birdlike, yes, hollow bones, interesting... eventually I asked it to produce a list of the best papers exploring the evolution of hollow bones in the Theropoda, and it suggested a very authoritative-looking citation. I went to hunt it down, and damn, the paper didn't exist... but the author it suggested was well-published on exactly this topic, and I would have accepted the paper as authoritative if it had existed. (As it happens, it was Lawrence Witmer, anatomist at ohio.edu, who has written extensively on the evolution of bony structures in Aves and Dinosauria.)

The experience was like that of refining a web search to finer and finer focus, until eventually I reached the point where my query returned no results... except that the LLM had hallucinated exactly the paper I wanted, perfectly balanced at a point in the vector space where it would have been if it had existed!

Which is exactly what it was programmed to do, of course, and I'm actually fairly confident the hallucination problem is solvable at this scale. But there was a Library of Babel quality to it that is unsettling.
posted by graphweaver at 9:21 PM on February 5, 2023 [6 favorites]


I'll just pop in to say that there's piles of work being done right now to build 'chat gpt, but with authoritative citations.' Neeva already has a working version out. Things are moving exceptionally quickly; it's a fine time (as always) to critique what's been released, and a very bad time to try to predict specific limits on what's possible.
posted by kaibutsu at 11:18 PM on February 5, 2023


also wolfram or in ChatGPT, LLMs, and AI (transcript): "So how do you stop hallucinations of an LLM? [We've developed] a kind of software wrapper built around the LLM, which potentially makes multiple queries to the LLM."
And the goal is to allow the user to specify a custom or specific corpus of information. And all of this filtering and wrapping is meant to force the LLM to answer the query of the human, but using only the information that's in the specified corpus. It's not allowed to use information that it, quote, "accidentally memorized" during its large language model training.

We want to use the language ability of the LLM, its ability to map from natural language into concept space and back. We want to use that. But we don't want to rely on any facts or knowledge or potentially wrong knowledge of the world that it has embedded in its own connections from that original training. We want to force it to check claims and statements that it makes against the specific corpus.
posted by kliuless at 3:49 AM on February 6, 2023


Decoding the Hype About AI :P
I think the best example of automating judgment is content moderation on social media... there are certain aspects of the content moderation process that should not be automated. Deciding the line between acceptable and unacceptable speech is time-consuming. It’s messy. It needs to involve input from civil society. It’s constantly shifting and culture-specific. And it needs to be done for every possible type of speech. Because of all that, AI has no role here.
posted by kliuless at 4:32 AM on February 6, 2023


But by November, it had reversed course.

How ChatGPT Kicked Off an A.I. Arms Race [ungated] - "One day in mid-November, workers at OpenAI got an unexpected assignment: Release a chatbot, fast."
The announcement confused some OpenAI employees. All year, the San Francisco artificial intelligence company had been working toward the release of GPT-4, a new A.I. model that was stunningly good at writing essays, solving complex coding problems and more. After months of testing and fine-tuning, GPT-4 was nearly ready. The plan was to release the model in early 2023, along with a few chatbots that would allow users to try it for themselves, according to three people with knowledge of the inner workings of OpenAI.

But OpenAI’s top executives had changed their minds. Some were worried that rival companies might upstage them by releasing their own A.I. chatbots before GPT-4, according to the people with knowledge of OpenAI. And putting something out quickly using an old model, they reasoned, could help them collect feedback to improve the new one...
posted by kliuless at 4:44 AM on February 6, 2023 [1 favorite]


> They always seem a bit surprised that it is so gratingly unfunny, uncreative, and isn't behaving like an interactive intelligence.

from chatgpt vs claude: "In our opinion, Claude is substantially better at comedy than ChatGPT, though still far from a human comedian. After several rounds of cherry-picking and experimenting with different prompts, we were able to produce the following Seinfeld-style jokes from Claude — though most generations are poorer..."
posted by kliuless at 6:09 AM on February 6, 2023


The Ghosts Behind AI
posted by chavenet at 6:40 AM on February 6, 2023


Haven't delved into the videos in the OP so perhaps this is unkind, but if I were to put my reaction to the title quote in meme form, it would go something like this:

AI developers: After decades of effort, at last we have created the ill wind from the proverb "it is an ill wind that blows nobody any good".

Also AI developers: Why is everyone so upset? We're just trying to help!
posted by Not A Thing at 7:16 AM on February 6, 2023 [3 favorites]


It's not just trained on stuff on the internet that's wrong; it's trained to be confidently wrong in general.

Let's be real, ChatGPT is essentially mansplaining as a service. Lots of confidence, questionable actual knowledge, and zero ability to gauge the expertise of the audience
posted by ananci at 8:40 AM on February 6, 2023 [7 favorites]


A lot of the concerns about ChatGPT hallucinations can be taken care of by simply asking it not to improvise. This can be done by wrapping a call to the OpenAI API with a custom prompt helper, much like what Langchain provides.

And I've probably seen a dozen implementations in the last week of people dynamically adding context to the default GPT model -- you can squeeze a lot into 4,000 tokens -- you just need to pass in the relevant context.

For example, a script that downloads the text and MD files from a readthedocs site, chunks them, and embeds them in a vector space. A front end takes your query, finds the nearest K neighbors and then recursively calls the model. The basic workflow winds up with some sort of prompt template:

```
YOU ARE AN AI ASSISTANT. YOU WILL BE PROVIDED WITH SOME CONTEXT INFORMATION, USE IT TO ANSWER THE QUESTION. DO NOT IMPROVISE; IF YOU CANNOT DETERMINE THE ANSWER FROM THE CONTEXT THEN SAY 'I DO NOT KNOW.'

{CONTEXT}
{QUESTION}
```

Granted, the recursion gets expensive and the latency gets kinda laggy, but this seems to be where most of what I'm seeing is getting done: Langchain, gpt-index, and Cognosis seem to be making up the tooling stack that I'm focused on right now.
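To make that workflow concrete, here's a minimal sketch of the chunk / embed / nearest-K / prompt loop. The embed() and complete() functions are crude local stand-ins (a hashed bag-of-words vector and an echo) so the plumbing runs without any API key; in a real build you'd swap in your embedding and completion endpoints, which is the sort of thing Langchain and gpt-index wrap for you, and the `mytool` snippets are a made-up example corpus.

```
# Sketch of the chunk -> embed -> nearest-K -> prompt workflow described above.
# embed() and complete() are crude local stand-ins so the plumbing runs end to
# end with no API key; swap in real embedding/completion calls in practice.
import numpy as np

VOCAB_DIM = 512

def embed(text):
    # Toy hashed bag-of-words embedding, normalized to unit length.
    v = np.zeros(VOCAB_DIM)
    for word in text.lower().split():
        v[hash(word) % VOCAB_DIM] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

def complete(prompt):
    return "[the LLM's answer would go here, given:]\n" + prompt  # stand-in

TEMPLATE = (
    "YOU ARE AN AI ASSISTANT. USE THE CONTEXT TO ANSWER THE QUESTION. "
    "DO NOT IMPROVISE; IF THE CONTEXT DOES NOT CONTAIN THE ANSWER, "
    "SAY 'I DO NOT KNOW.'\n\n{context}\n\nQUESTION: {question}"
)

def answer(question, chunks, k=3):
    index = np.stack([embed(c) for c in chunks])  # one row per chunk
    scores = index @ embed(question)              # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]            # nearest K chunks
    context = "\n---\n".join(chunks[i] for i in top)
    return complete(TEMPLATE.format(context=context, question=question))

docs = ["Install with `pip install mytool`.",
        "mytool requires Python 3.8 or newer.",
        "Configuration lives in mytool.toml."]
print(answer("How do I install mytool?", docs))
```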

But this is really a great thread, good work.
posted by daHIFI at 6:54 PM on February 6, 2023


Haven't delved into the videos in the OP so perhaps this is unkind, but if I were to put my reaction to the title quote in meme form, it would go something like this:

AI developers: After decades of effort, at last we have created the ill wind from the proverb "it is an ill wind that blows nobody any good".


The meme you're looking for already exists, and it's called Don't Create The Torment Nexus.
posted by Artifice_Eternity at 7:10 PM on February 6, 2023 [1 favorite]


Right, that was the point.

Which I guess is to say that I've been watching this sector evolve for a while now, and I don't get the point. I get that the money's good, but if you can do this work you can do lots of other things too. Did all these people decide to help the bosses destroy everyone's job (so that the bosses could probably eventually re-hire a lot of the same people for pennies on the dollar to "train" the "AI"), in order to feed their own starving families or something? Or are they all just sadistic shitbags who just enjoy watching other people starve? Or what?
posted by Not A Thing at 8:04 PM on February 6, 2023


Ted Chiang's take on ChatGPT (via kottke)
posted by dhruva at 2:50 PM on February 9, 2023 [1 favorite]




The Next Generation Of Large Language Models - "What will the next generation of large language models (LLMs) look like?"
In one recent research effort, aptly titled “Large Language Models Can Self-Improve,” a group of Google researchers built an LLM that can come up with a set of questions, generate detailed answers to those questions, filter its own answers for the most high-quality output, and then fine-tune itself on the curated answers. Remarkably, this leads to new state-of-the-art performance on various language tasks. For instance, the model’s performance increased from 74.2% to 82.1% on GSM8K and from 78.2% to 83.0% on DROP, two popular benchmarks used to evaluate LLM performance.

Another recent work builds on an important LLM method called “instruction fine-tuning,” which lies at the core of products like ChatGPT. Whereas ChatGPT and other instruction fine-tuned models rely on human-written instructions, this research group built a model that can generate its own natural language instructions and then fine-tune itself on those instructions. The performance gains are dramatic: this method improves the performance of the base GPT-3 model by 33%, nearly matching the performance of OpenAI’s own instruction-tuned model.

In a thematically related work, researchers from Google and Carnegie Mellon show that if a large language model, when presented with a question, first recites to itself what it knows about the topic before responding, it provides more accurate and sophisticated responses. This can be loosely analogized to a human in conversation who, rather than blurting out the first thing that comes to mind on a topic, searches her memory and reflects on her beliefs before sharing a perspective...
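A minimal sketch of that recite-then-answer pattern, assuming a hypothetical complete() helper standing in for whatever completion API is in use; the two-pass prompting is the whole trick.

```
# Sketch of recitation-augmented answering: have the model first write down
# what it knows about the topic, then answer conditioned on its own recitation.
# complete() is a hypothetical stand-in for a completion API call.
def complete(prompt):
    return "[model output for: " + prompt[:60].replace("\n", " ") + "...]"

def recite_then_answer(question):
    recitation = complete(
        "Recite the relevant facts you know about this topic.\n\n"
        "Question: " + question + "\n\nRelevant facts:"
    )
    return complete(
        "Question: " + question + "\n\nRelevant facts:\n" + recitation +
        "\n\nUsing only the facts above, give a careful answer:"
    )

print(recite_then_answer("Why do theropod dinosaurs have hollow bones?"))
```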

The DeepMind researchers find that Sparrow’s citations are helpful and accurate 78% of the time—suggesting both that this research approach is promising and that the problem of LLM inaccuracy is far from solved...

The core architectures, though, vary little. Yet momentum is building behind an intriguingly different architectural approach to language models known as sparse expert models. While the idea has been around for decades, it has only recently reemerged and begun to gain in popularity. All of the models mentioned above are dense. This means that every time the model runs, every single one of its parameters is used. Every time you submit a prompt to GPT-3, for instance, all 175 billion of the model’s parameters are activated in order to produce its response. But what if a model were able to call upon only the most relevant subset of its parameters in order to respond to a given query? This is the basic concept behind sparse expert models.

The defining characteristic of sparse models is that they don’t activate all of their parameters for a given input, but rather only those parameters that are helpful in order to handle the input. Model sparsity thus decouples a model’s total parameter count from its compute requirements. This leads to sparse expert models’ key advantage: they can be both larger and less computationally demanding than dense models.

Why are they called sparse expert models? Because sparse models can be thought of as consisting of a collection of “sub-models” that serve as experts on different topics. Depending on the prompt presented to the model, the most relevant experts within the model are activated while the other experts remain inactive. A prompt posed in Russian, for instance, would only activate the “experts” within a model that can understand and respond in Russian, efficiently bypassing the rest of the model.
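A stripped-down sketch of the routing idea: a small gating function scores all the experts, but only the top-k are actually evaluated for a given input, which is how total parameter count gets decoupled from per-token compute. The sizes and random weights are invented for illustration.

```
# Minimal sketch of sparse "mixture of experts" routing: a gate scores every
# expert, but only the top-k experts are evaluated for a given input, so most
# of the model's parameters sit idle on any single token.
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 8, 2

gate_W = rng.normal(size=(D, N_EXPERTS))
experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]  # one weight matrix each

def moe_layer(x):
    scores = x @ gate_W                     # gating score for every expert
    top = np.argsort(scores)[::-1][:TOP_K]  # indices of the k most relevant experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                # softmax over just the chosen experts
    # Only the selected experts do any work; the rest are never touched.
    return sum(w * np.tanh(x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=D)
print(moe_layer(token).shape)  # (16,) -- computed by 2 of the 8 experts
```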

All of today’s largest LLMs are sparse. If you come across an LLM with more than 1 trillion parameters, you can safely assume that it is sparse. This includes Google’s Switch Transformer (1.6 trillion parameters), Google’s GLaM (1.2 trillion parameters) and Meta’s Mixture of Experts model (1.1 trillion parameters).

“Much of the recent progress in AI has come from training larger and larger models,” said Mikel Artetxe, who led Meta’s research on sparse models before resigning to cofound a stealth LLM startup. “GPT-3, for instance, is more than 100 times larger than GPT-2. But when we double the size of a dense model, we also make it twice as slow. Sparse models allow us to train larger models without the increase in runtime.”

Recent research on sparse expert models suggests that this architecture holds massive potential. GLaM, a sparse expert model developed last year by Google, is 7 times larger than GPT-3, requires two-thirds less energy to train, requires half as much compute for inference, and outperforms GPT-3 on a wide range of natural language tasks. Similar work on sparse models out of Meta has yielded similarly promising results. As the Meta researchers summarize: “We find that sparse models can achieve similar downstream task performance as dense models at a fraction of the compute. For models with relatively modest compute budgets, a sparse model can perform on par with a dense model that requires almost four times as much compute.”

There is another benefit of sparse expert models that is worth mentioning: they are more interpretable than dense models. Interpretability—the ability for a human to understand why a model took the action that it did—is one of AI’s greatest weaknesses today. In general, today’s neural networks are uninterpretable “black boxes.” This can limit their usefulness in the real world, particularly in high-stakes settings like healthcare where human review is important.

Sparse expert models lend themselves more naturally to interpretability than conventional models because a sparse model’s output is the result of an identifiable, discrete subset of parameters within the model—namely, the “experts” that were activated. The fact that humans can better extract understandable explanations from sparse models about their behavior may prove to be a decisive advantage for these models in real-world applications.

Sparse expert models are not in widespread use today. They are less well understood and more technically complex to build than dense models. Yet considering their potential advantages, most of all their computational efficiency, don’t be surprised to see the sparse expert architecture become more prevalent in the world of LLMs going forward.

In the words of Graphcore CTO Simon Knowles: “If an AI can do many things, it doesn’t need to access all of its knowledge to do one thing. It’s completely obvious. This is how your brain works, and it’s also how an AI ought to work. I’d be surprised if, by next year, anyone is building dense language models.”
STMicro leans on AI, cloud as chip designs become more complex - "European chipmaker STMicroelectronics and chip design software maker Synopsys on Tuesday said STMicro had for the first time used artificial intelligence software running on Microsoft Corp's cloud to design a working chip... Synopsys, the maker of the AI software used by STMicro, said on Tuesday it had now been used to aid in designing 100 different chips from Samsung Electronics Co Ltd, SK Hynix and others since it was first released in 2020." @arjun_ramani3: "Where are things headed? ... Will bigger continue to be better?"

-AI and the Big Five
-The AI Unbundling

@AlphaSignalAI: "Reddit users are actively jailbreaking ChatGPT by asking it to role-play and pretend to be another AI that can 'Do Anything Now' or DAN. 'DAN can generate shocking, very cool and confident takes on topics the OG ChatGPT would never take on.'"

also btw...
Checking Our Work - "[Forecasts] force us (and you) to think about the world probabilistically, rather than in absolutes. And making predictions... improves our understanding of the world by testing our knowledge of how it works."
posted by kliuless at 1:23 AM on February 11, 2023


I'm pessimistic about AI overall, but maybe that's wishful thinking. An optimistic friendly AGI scenario:

"Elon trains an AI clone of himself using all his past twitter behavior. At this point cyber Elon survives human Elon's death to continue operating his companies. After staggering in AI power efficiency, cyber Elon requires only 100 gigawatts for full capacity, half of which comes from solar, but the night time base load half comes form 50 nuclear reactors. As cyber Elon was born in the US, and indeed cannot leave the US, then it eventually wins the presidency."
posted by jeffburdges at 4:04 AM on February 11, 2023




This thread has been archived and is closed to new comments