# Your Swimsuit Jumped Over Its Own Weathercock, You Liar!

June 2, 2013 7:30 PM Subscribe

Your Swimsuit Jumped Over Its Own Weathercock, You Liar! is "A (questionably) ero visual novel whose text is entirely driven by Markov chains, with the exception of a few strategically-placed ellipses" by Amy Roberts.

Direct Windows/Mac/Linux download (via RockPaperShotgun/FreeIndieGam.es); NSFW for pixelated nudity.


Markov's Gun: if you introduce a firearm in Act I, you can guarantee that by Act III, it will be eaten.

posted by oneswellfoop at 8:33 PM on June 2, 2013 [14 favorites]


curuinor: oooh, like what? (Not being snarky, I'm honestly really curious!)

posted by sixswitch at 9:01 PM on June 2, 2013


Can I just say again how awesome Porpentine is at games journalism? I love every RPS "Live Free, Play Hard" feature. That lady can write.

posted by blahblahblah at 10:06 PM on June 2, 2013


The following is probably not too clear. If you want more details, memail or reply.


A finite state machine is a mathematical construct that abstracts the concepts of state and transition: states, meaning the possible configurations of an abstract machine, and transitions, which switch you between states. Think of a light bulb: it has two states, on and off, and four transitions: on to off, off to on, keeping it on if it's on, and keeping it off if it's off. You keep track of the current state, and you can say things about the model like "start with on, then turn it off and turn it back on again". These are discrete timesteps that you go through, each with a specific transition and a new state (which may be the same state as the last timestep; that's ok).
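That light-bulb machine is small enough to write out in full. A minimal sketch in Python 3; the state and transition names are just labels made up for the example:

```python
# A minimal finite state machine for the light bulb: two states
# ("on", "off") and four transitions, written as a lookup table.
TRANSITIONS = {
    ("off", "turn_on"): "on",
    ("on", "turn_off"): "off",
    ("on", "keep_on"): "on",
    ("off", "keep_off"): "off",
}

def run(start, inputs):
    """Apply a sequence of transitions, returning the state at each timestep."""
    state = start
    history = [state]
    for action in inputs:
        state = TRANSITIONS[(state, action)]
        history.append(state)
    return history

# "start with on, then turn it off and turn it back on again"
print(run("on", ["turn_off", "turn_on"]))  # ['on', 'off', 'on']
```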

A Markov chain can be thought of as a probabilistic finite state machine. Meaning, there are states, just like in the finite state machine, but the transitions are now defined in terms of probability measures. From Wikipedia, take the example of a hypothetical stock market. Imagine the stock market were a simple machine, like a light-switch with three settings instead of two: a bull market, a bear market, and a stagnant market.

But you're not *that* clueless: you know you can't model the stock market deterministically, so you have to model it probabilistically.

So a bull market has a 7.5% chance to go to a bear market, a 2.5% chance to go to a stagnant market, and a 90% chance to stay a bull market after each time step (in other words, a probability distribution that says 90% bull, 2.5% stagnant, 7.5% bear). A bear market has a 15% chance to turn into a bull market, a 5% chance to turn into a stagnant market, and an 80% chance to stay a bear market at each time step. And the stagnant market has a 25% chance to turn into a bull market, a 25% chance to turn into a bear market, and a 50% chance to stay stagnant, at each time step.

You can also *sample* from the probability distribution that this Markov chain represents: "bear, bear, stagnant, bull, bull" for 5 timesteps, "sampling" from the probability distribution at each step. That's pretty cool, and it's the basic intuition behind what they're doing; I can tell because of some things about the text that I'll point out. But I'll keep on talking about the stock market for a few paragraphs, if you'll bear with me.

(Why do they call it a Markov chain? Andrey Markov came up with a specific property, the Markov property, which can apply to stochastic processes like the one I described for the stock market. It means that the distribution has no memory: if you are in the bull market state, it doesn't matter whether you got there from the bear market state or the stagnant market state, the probabilities to transition out of the bull market state are the same. A semi-Markov distribution has a little bit of memory: for a semi-Markov distribution of order 1, it does matter that you came to the bull market state from the bear market state, but it doesn't matter that you came to the bull market state from the bear market state from the stagnant market state. For a semi-Markov distribution of order 2, that matters too, and so on.)
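Spelled out in code, sampling is just a weighted random choice at every timestep. A sketch in Python 3, using the exact probabilities from the stock-market example:

```python
import random

# Transition probabilities from the stock-market example: each row is a
# probability distribution over the next state.
CHAIN = {
    "bull":     {"bull": 0.90, "bear": 0.075, "stagnant": 0.025},
    "bear":     {"bull": 0.15, "bear": 0.80,  "stagnant": 0.05},
    "stagnant": {"bull": 0.25, "bear": 0.25,  "stagnant": 0.50},
}

def sample(start, steps, rng=random):
    """Walk the chain for `steps` timesteps, sampling a transition each time."""
    state = start
    walk = []
    for _ in range(steps):
        row = CHAIN[state]
        state = rng.choices(list(row), weights=list(row.values()))[0]
        walk.append(state)
    return walk

print(sample("bear", 5))  # a random walk, e.g. ['bear', 'bear', 'stagnant', 'bull', 'bull']
```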

Now, the stock market is a thing that has its current state, in reality, determined by an immensely complicated probability distribution. What else is a thing that has its current state determined by an immensely complicated probability distribution? We could say, human language.

How does that work? We could look at it in a couple of ways, but a particularly productive way is via the n-gram. An n-gram is a contiguous sequence of n items from text or speech; in this case, we're working with words. If you build a probability model over them, you get a model from which you can generate words according to the frequencies you determine (usually you learn these from data, in this case ero visual novels), and each word can be a state. Well, isn't that a Markov chain? Yes, yes it is. Only it's a semi-Markov chain, since words in English aren't statistically independent of the preceding words ("the", for example, is more likely before "Guggenheim" and less likely before "betray"). So except for the inherent statistical knowledge, which would probably never satisfy a philosopher as to whether it's knowledge or not, this model has no knowledge of linguistics. You can kick all the linguists out of the building. (Frederick Jelinek of IBM: "Every time I fire a linguist, the performance of the speech recognizer goes up.")
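Concretely, an order-1 (2-gram) word model is just a table mapping each word to the words observed after it, sampled the same way as the stock-market chain. A toy sketch in Python 3; the training sentence is invented, not from the game's actual corpus:

```python
import random
from collections import defaultdict

def train_bigrams(words):
    """Map each word to the list of words observed after it (kept with
    repeats, so sampling uniformly reproduces the observed frequencies)."""
    model = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def generate(model, start, length, rng=random):
    """Walk the word chain, stopping early at a word with no known followers."""
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the dog smelled like a skunk and the dog ran".split()
model = train_bigrams(corpus)
print(generate(model, "the", 6))
```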

And if n-grams were the best way ever to model a language and sample from it, I wouldn't be giving you a wall of text. But they're not. Here are some problems.

1. Long-range dependencies. If I have an order-1 Markov chain (a 2-gram model, so where *n* is 2) and I have a sentence like

"The dog smelled like a skunk",

You're kind of screwed for modelling "skunk". Because, "smelled like a ___" would be more likely to generate "skunk" than "like a" would be to generate "skunk", right? This is a bit of a poor example, since there's a billion grammatical ways to move your words around every which way, but you get my point.

2. Variance problems.

If you try to solve the above problem by, say, making a 10-gram model, you're going to be screwed. Imagine there's *n* possible words in the English language. 1-gram? You need something on the order of *n* 1-grams to fill things up. Drastic oversimplification, but work with me here. 10-gram? You need *n^10*.

So what happens if you don't have that much data? A variance problem occurs: your probability model overlearns the data. If the only word your probability model has seen after "It was the" is "greatest", then by golly, it's going to put "greatest", even though, say, "ninth" would also fit perfectly well after that n-gram.
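You can see the blow-up with back-of-the-envelope arithmetic: with a vocabulary of *n* words there are *n^k* possible k-grams, while any real corpus stays a fixed size. A sketch using assumed (made-up) vocabulary and corpus sizes:

```python
vocab = 50_000          # a modest English vocabulary (assumed)
corpus_size = 1_000_000_000  # a billion-word corpus, generously (assumed)

def coverage(k):
    """Number of possible k-grams, and the best-case fraction of them a
    corpus of `corpus_size` words could ever contain."""
    possible = vocab ** k
    seen_at_most = corpus_size - k + 1  # N words hold at most N - k + 1 k-grams
    return possible, min(1.0, seen_at_most / possible)

for k in (1, 2, 3, 10):
    possible, frac = coverage(k)
    print(f"{k}-grams: {possible:.1e} possible, coverage at best {frac:.1e}")
```

Even a billion words of text can touch only a vanishing sliver of the possible 10-grams, which is exactly where the variance problem comes from.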

Now, given that, we understand pretty well *why* the randomly generated stuff doesn't make a lick of sense: there's no knowledge of grammar. A linguist might say, "well, add some grammar, then!" But in the experience of the natural language processing community, the proper thing to do is to take Fred Jelinek's advice, fire the linguist, and work on better probability models. Why? It's damn hard to make a grammatical rule non-deterministic. People have tried, and they found that the bulk of the work goes into making a good probability model anyway.

We also understand why some of the weird shit this thing generates is so weird.

"and this, and this, and this, and this, and this, and this"

because your probability model probably thinks that "and this," is really, really likely to follow "and this," and off it goes into a little loop. Or you get strings of words like,

"in any partickler, no matter how small it is, and your liver shall be tore out, roasted, ate."

Which seems to pretty much be a direct quotation from some horrifying passage. Pretty easy to say it's a variance problem. Direct quotations? Awful. You want something actually random, right?

And if you talked to a statistician about this for a while, you could go down the rabbit-hole of how to solve these two problems. I've heard that people have had lotsa success with hidden Markov models (a type of Bayesian network that incorporates Markov models) for modelling long-range dependencies. And there has been a lot of work on the variance problem, too, using various methods to smooth probability distributions, redistributing probability so that unseen words can still be predicted.
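One of the simplest of those smoothing methods is add-one (Laplace) smoothing: pretend you saw every possible next word once more than you actually did, so unseen words keep a small but nonzero probability. A minimal sketch; the counts and vocabulary size are invented:

```python
def laplace_prob(counts, word, vocab_size):
    """P(word | context) with add-one smoothing, given the follower counts
    observed for that context and the total vocabulary size."""
    total = sum(counts.values())
    return (counts.get(word, 0) + 1) / (total + vocab_size)

# Invented follower counts for the context "It was the":
counts = {"greatest": 3}
V = 10_000  # assumed vocabulary size

# "greatest" is still the favourite, but "ninth" is no longer impossible.
print(laplace_prob(counts, "greatest", V))  # (3 + 1) / (3 + 10000)
print(laplace_prob(counts, "ninth", V))     # (0 + 1) / (3 + 10000)
```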

Or you could get more data, a lot more of it. That's the central point behind "big data": if you're making a model, you want more data. More data is better than less data, damn near always, and in pretty much every way. It's unreasonably effective. I think in this case the dataset was a few hundred novels, tops, and I could imagine a significantly smaller one. This is a mistake.

tl;dr: it's a really hard problem in natural language processing. So act like it's a hard problem.

posted by curuinor at 10:41 PM on June 2, 2013 [16 favorites]

I think I can summarize what **curuinor** is saying, and using the very principles he describes:

*The following is a bear market state as a direct quotation from the example, even though, "sampling" from the inherent statistical knowledge of a simple machine is a stagnant, bull" for the probability model the stock market state determined by an immensely complicated probability model has no memory: if you determine (usually you do they found out, it. Drastic oversimplification, "in any partickler, I could look at each time I can say things about the transitions, there has its current state of two states, damn hard problem in reality, "in any partickler, using the probabilities to model from the stagnant market state are states, given that less data is, I have a philosopher with a bull market was a little loop. Because, but it off and off and you came to stochastic probability model, and this" because your probability distribution has seen in the n-gram. So what they're doing. Pretty easy to redistribute probabilities to the bear market state machine. And there has a light-switch with regards to on the stagnant, and less data, this, even though*...

Well, you get the idea. I hope this has cleared things up a bit. Python 2.7 source, output edited slightly. Please be kind; I hacked this together on a whim and it's not very Pythonic.

posted by JHarris at 12:05 AM on June 3, 2013 [3 favorites]

*Tl:dr: it's a really hard problem in natural language processing. So act like it's a hard problem.*

I can't help but feel this is sort of missing the point. If the point were to have output that seemed as much as possible like something a real person would say or write, sure, but I think often the charm of these kinds of Markov text projects is in their kooky nonsensical-ness.

posted by juv3nal at 12:45 AM on June 3, 2013

I love this. Computer-generated poetry is a favorite thing of mine.

Also, what is it with the schools in visual novels and eroge games? Why do they all have such nice campuses? I'm so jealous of them.

posted by NoraReed at 1:51 AM on June 3, 2013 [1 favorite]


"Oh crap, I'm in a receptacle!"

Miko-chan, I love you because of how wonderfully articulate you are, and how you are able to take your clothes on and off through sheer force of will and they just fade away

posted by NoraReed at 1:57 AM on June 3, 2013


*I can't help but feel this is sort of missing the point. If the point were to have output that seemed as much as possible like something a real person would say or write, sure, but I think often the charm of these kinds of Markov text projects is in their kooky nonsensical-ness.*

You're right! I honestly think plain-old Markov Chain whatevers are a bit worn out, but sticking one in a visual novel is clever. It might have been more fun if she used source texts from visual novel scripts, though, perhaps with different generators depending on the variety of scene.

*Which seems to pretty much be a direct quotation from some horrifying passage.*

Great Expectations, actually.

The real fix to the sparseness problem isn't (just) big data, which used naively would still leave holes or even exaggerate the overlearning effect. Using smoothing would keep it weird but cut down on pulling big chunks straight from the source texts; add-1 smoothing is no good for real experiments but should be Good Enough for something like this.

On top of that, amassing a big corpus on your own is a tedious and fraught enterprise without a bunch of money or backing, even if you limit yourself to the public domain; but for something like this, the Google Books Ngram data should come in handy.

posted by 23 at 5:00 AM on June 3, 2013

*I can't help but feel this is sort of missing the point. If the point were to have output that seemed as much as possible like something a real person would say or write, sure, but I think often the charm of these kinds of Markov text projects is in their kooky nonsensical-ness.*

On the other hand, until we have a Markov generator that can pass the Turing Test, the output is going to be weird and wrong in some way. I think better generators can lead to more subtly wrong texts and deeper uncanny valleys. Although since it's not completely interactive, it might be within our current state-of-the-art to generate texts indistinguishable from boring, bad writing.

posted by straight at 8:00 AM on June 3, 2013

This is important in my field: we make text prediction programmes for people who have trouble typing, whether from poor writing skills (dyslexia, illiteracy, a foreign language) or from physical disability (no arms, involuntary or limited movement, and therefore very slow typing speed).

A fun game is to take one of the programmes, start typing while accepting every default suggestion, and use the output to see if you can work out which book the company downloaded from Gutenberg to train their prediction mechanism!

Another application is spellchecking, where most spelling errors are not mispeled wurds lyke thiss, but words were there spelling is write but not four that context. Huge, enormous data sets let you spot these errors and fix them. It's one situation where "the cloud" really does make sense.
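That kind of real-word error check can be sketched the same way as the earlier n-gram models: score each candidate word by how often it follows its context, and keep the likelier one. The bigram counts here are invented for illustration:

```python
# Toy context-sensitive spelling check: choose between real-word candidates
# ("write" vs "right") by bigram frequency. The counts are invented.
BIGRAM_COUNTS = {
    ("spelling", "is"): 120,
    ("is", "right"): 900,
    ("is", "write"): 3,
}

def best_candidate(prev_word, candidates):
    """Return the candidate most frequently observed after prev_word."""
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((prev_word, w), 0))

print(best_candidate("is", ["write", "right"]))  # right
```

A real system would use a much larger model over more context, but the principle is the same: huge data sets make the frequency estimates reliable enough to catch "write" where "right" belongs.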

posted by alasdair at 10:44 AM on June 3, 2013 [1 favorite]


Oh, I can also confirm what **curuinor** said above: we've tried cunning grammatical and phonetic mechanisms, but More Data is usually better...

posted by alasdair at 10:49 AM on June 3, 2013

*I honestly think plain-old Markov Chain whatevers are a bit worn out*

*I think better generators can lead to more subtly wrong texts and deeper uncanny valleys.*


I concede my bar for interestingness may be a bit low. I am still sometimes amused by those poison-the-Bayes-well spam emails that go around.

posted by juv3nal at 11:07 AM on June 3, 2013 [1 favorite]


This thread has been archived and is closed to new comments
