[ RIFFUSION ] (noun): riff + diffusion
December 17, 2022 12:35 PM

Stable Diffusion can generate images from text. Spectrograms are graphical representations of audio. Riffusion mixes the two. Examples: "rock and roll electric guitar solo", "lo-fi hiphop beats", "the sound of metafilter".
posted by simmering octagon (17 comments total) 13 users marked this as a favorite
 
This gives me the same uncanny feeling as an AI portrait where the facial features are slightly off-kilter.

Also, it was interesting how wildly the results varied. "Mambo but from Kenya" was identifiably similar in style to African mambo, while "ancient chinese hymn" sounded like the background music to some kind of lo-fi nightmare.
posted by Kutsuwamushi at 12:50 PM on December 17, 2022


This is the thing I've been poking around stable diffusion and AI generation to get to. Not *this* exactly, it's uncanny and soulless, but it's got some very clever parts that will get better until it's not soulless - or at least not detectably so.

This will be nifty if Trent and Atticus start playing with it, noise into music into organic screeching and so on.
posted by abulafa at 1:00 PM on December 17, 2022


Its interpretation of "church bells" sounds like a bunch of campanologists who are fans of modern jazz fusion decided to try to play a gendhing.
posted by biogeo at 1:17 PM on December 17, 2022


Also they were drunk when they made this decision.
posted by biogeo at 1:18 PM on December 17, 2022


Man, this is hilariously weird sometimes. I'm having it genre shift through a bunch of various traditional/folk type prompts, and it really seems to get stuck in local minima that are almost but not entirely unlike the prompts. Like at a certain point, a saxophonist who's a big fan of John Coltrane but doesn't have his talent just showed up and he won't leave.
posted by biogeo at 2:10 PM on December 17, 2022 [1 favorite]


Weird, I guess they break songs into measures using some beat detection process, then interpolate the measures? Seems like the tempo is fixed. Infinite hold music of the damned.

It does a good Phil Collins impression though.
posted by credulous at 2:44 PM on December 17, 2022


And we also have the Auto-Björkifizer for those that celebrate.
posted by credulous at 2:52 PM on December 17, 2022 [1 favorite]


I don't know what I'm listening to, but it's not a sea shanty. Not from any sea on this planet, anyway.
posted by Faint of Butt at 2:57 PM on December 17, 2022 [1 favorite]


It always starts with some preset beat's spectrogram (click the gear in the upper right to change it), which is why everything tries to conform to that beat. If you set the denoising to 0.95 you get much more interesting results.
posted by Pyry at 3:03 PM on December 17, 2022 [3 favorites]
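
To make the denoising setting concrete, here is a minimal sketch of how an img2img-style "strength" value is commonly turned into a number of diffusion steps. It assumes the convention used by typical Stable Diffusion img2img pipelines; nothing in this thread shows Riffusion's own code, so the function name `img2img_schedule` and the 50-step default are hypothetical.

```python
def img2img_schedule(num_steps, strength):
    """Return (steps_skipped, steps_run) for an img2img-style run.

    Assumed convention (not confirmed for Riffusion): strength 0.0 keeps
    the seed image untouched, strength 1.0 denoises from pure noise and
    ignores the seed image entirely.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    steps_run = min(int(num_steps * strength), num_steps)
    steps_skipped = num_steps - steps_run
    return steps_skipped, steps_run


# Compare a moderate setting with the 0.95 suggested above.
for strength in (0.5, 0.75, 0.95):
    skipped, run = img2img_schedule(50, strength)
    print(f"denoising={strength}: skip {skipped} steps, run {run}")
```

Under that assumption, a higher value leaves less of the preset beat's spectrogram in the result, which would fit the observation that 0.95 stops everything from conforming to the seed beat.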


It is mostly fascinating that this works at all; inverting a spectrogram with Griffin-Lim always sounds kinda terrible, which is likely the main source of the slurriness in the audio. It should work much better to feed the spectrograms to a stronger inverter, like WaveRNN or a Soundstream. That's basically how text to speech works these days: a text to spectrogram model, followed by a spectrogram inverter to get the best quality audio.
posted by kaibutsu at 3:41 PM on December 17, 2022
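
For anyone curious what Griffin-Lim actually does, here is a rough NumPy sketch. It is a toy: it uses a plain linear-frequency STFT with made-up parameters (`N_FFT`, `HOP`), not whatever spectrogram format Riffusion really uses. The idea is to start from random phase, then alternate between inverting the spectrogram and re-imposing the known magnitudes.

```python
import numpy as np

N_FFT, HOP = 256, 64                      # illustrative values only
WINDOW = np.hanning(N_FFT + 1)[:-1]       # periodic Hann window


def stft(signal):
    """Complex spectrogram, shape (frames, N_FFT // 2 + 1)."""
    starts = range(0, len(signal) - N_FFT + 1, HOP)
    frames = np.stack([signal[s:s + N_FFT] * WINDOW for s in starts])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length):
    """Least-squares overlap-add inverse of stft()."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        s = i * HOP
        out[s:s + N_FFT] += frame
        norm[s:s + N_FFT] += WINDOW ** 2
    return out / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, length, n_iter=50, seed=0):
    """Estimate a waveform whose STFT magnitude matches `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random start
    for _ in range(n_iter):
        signal = istft(magnitude * phase, length)
        # Keep the phase of the re-analysed signal, discard its magnitude.
        phase = np.exp(1j * np.angle(stft(signal)))
    return istft(magnitude * phase, length)


def spectral_error(signal, magnitude):
    """Relative mismatch between the signal's STFT magnitude and the target."""
    diff = np.abs(stft(signal)) - magnitude
    return np.linalg.norm(diff) / np.linalg.norm(magnitude)


# Throw away the phase of a two-tone test signal, then try to recover audio.
sr = 8000
t = np.arange(4096) / sr
original = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)
magnitude = np.abs(stft(original))

rough = griffin_lim(magnitude, len(original), n_iter=0)     # random phase only
refined = griffin_lim(magnitude, len(original), n_iter=50)
print("error with random phase:", spectral_error(rough, magnitude))
print("error after 50 iterations:", spectral_error(refined, magnitude))
```

The iterations shrink the mismatch but generally do not drive it to zero, since the phase is only ever estimated. A learned inverter of the kind suggested above replaces this loop with a model trained to produce the waveform directly.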


1. set denoising to something random (0.85-0.95)
2. type in 'honk'
3. walk away
posted by suckerpunch at 4:29 PM on December 17, 2022 [3 favorites]


They need a windowing function to smooth out that jarring transition between frames. Kinda interesting tho, and a simpler approach than that OpenAI Jukebox thing. Maybe the Endless Smooth Jazzulator is just a few years away.
posted by credulous at 5:27 PM on December 17, 2022
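
A minimal sketch of the windowing idea, assuming the clips are plain arrays of samples that are allowed to overlap slightly. The `crossfade` helper and the overlap length are hypothetical, not anything taken from Riffusion. Each clip is faded with half of a Hann window so the two gains always sum to one.

```python
import numpy as np


def crossfade(a, b, overlap):
    """Join clips `a` and `b`, blending the last/first `overlap` samples."""
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be in 1..min(len(a), len(b))")
    # Rising half of a Hann window; fade_in + fade_out == 1 at every sample.
    fade_in = 0.5 - 0.5 * np.cos(np.pi * (np.arange(overlap) + 0.5) / overlap)
    fade_out = 1.0 - fade_in
    blended = a[-overlap:] * fade_out + b[:overlap] * fade_in
    return np.concatenate([a[:-overlap], blended, b[overlap:]])


# Worst case for a hard cut: one clip ends at +1, the next starts at -1.
a = np.ones(1000)
b = -np.ones(1000)

hard_cut = np.concatenate([a, b])
faded = crossfade(a, b, overlap=200)

print("largest jump, hard cut: ", np.max(np.abs(np.diff(hard_cut))))
print("largest jump, crossfade:", np.max(np.abs(np.diff(faded))))
```

The hard cut has a full-scale jump at the seam, which is what a click sounds like; the crossfade spreads the same change over 200 samples. The cost is that the joined clip is shorter by the overlap length.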


It seems to be worse at giving you what you ask for than visual art AI, if that's possible. I don't feel like my nonexistent livelihood as a musician is threatened...
posted by Foosnark at 5:32 PM on December 17, 2022


I think it handles washing machine pretty well.
posted by The Great Big Mulp at 5:43 PM on December 17, 2022


So much of what Riffusion comes up with sounds vaguely gamelanesque to me. Actually, it does a pretty good job if you go with that and do gamelan fusion genres. . .
posted by DrMew at 10:19 PM on December 17, 2022


DrMew, I had the same thought. I have a suspicion it might happen because the spectral content of struck metal is somewhat different than other instruments (having to do with the vibrating surface being essentially 2-dimensional rather than 1-dimensional as with strings or air columns if I recall correctly), and once it generates an appropriate spectrum for a bell or vibraphone or whatever, it starts to "riff" off of the harmonics as if they were candidates for fundamental notes, the way they would be for a string or wind instrument. This might end up sounding kind of like a slendro or pelog scale (at least to people like me who aren't completely steeped in that musical tradition), giving it that gamelan sound. That combined with the way it cycles through patterns (with modification) also gives it sort of a flair for one of the gamelan gendhings (again to the ear of someone not deeply immersed in that tradition). But I'm totally just spitballing and am speculating well beyond my depth of knowledge, regarding both the tech of how Riffusion is working and gamelan music theory.

On the one hand, I actually like the sound of it when it gets into a gamelan-like groove. On the other hand, I'd enjoy listening to actual gamelan music more.
posted by biogeo at 1:10 AM on December 18, 2022
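
The point about struck metal having a different spectrum can be illustrated with the simplest textbook case: an ideal uniform bar that is free at both ends, compared with an ideal string. This is only a sketch of that one idealised case. Real vibraphone and gamelan keys are shaped and tuned, bells and gongs are more complicated still, and none of this says anything about what Riffusion does internally.

```python
import math


def string_partials(n):
    """Ideal string: partials are integer multiples of the fundamental."""
    return [float(k) for k in range(1, n + 1)]


def free_bar_partials(n):
    """Ideal thin bar, free at both ends: frequencies scale as m**2 with
    m close to 3.011, 5, 7, 9, ... (standard bending-wave approximation)."""
    m = [3.011] + [2 * k + 1 for k in range(2, n + 1)]
    return [(x / m[0]) ** 2 for x in m[:n]]


def cents_from_nearest_harmonic(ratio):
    """Distance in cents from the closest integer multiple of the fundamental."""
    nearest = max(1, round(ratio))
    return 1200 * math.log2(ratio / nearest)


print("string:", string_partials(4))
for ratio in free_bar_partials(4):
    offset = cents_from_nearest_harmonic(ratio)
    print(f"bar partial {ratio:.3f} x fundamental, {offset:+.0f} cents off a harmonic")
```

The bar's upper partials land well away from the integer multiples a string would produce, so anything that treats them as if they were harmonics of a string-like note would be pulled toward unfamiliar intervals.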


I tried "Tom Waits synthwave" and "apocalyptic doom country", and they sounded weirdly similar.
posted by Mr. Bad Example at 6:18 AM on December 18, 2022



