Remixing just got easier
November 5, 2019 11:31 AM   Subscribe

The engineering team behind streaming music service Deezer just open-sourced Spleeter, their audio separation library built on Python and TensorFlow that uses machine learning to quickly and freely separate music into stems [their component tracks]. ... But how are the results? I tried a handful of tracks across multiple genres, and all performed incredibly well.
posted by Sokka shot first (35 comments total)

This post was deleted for the following reason: Poster's Request -- Brandon Blatcher



 
The newest version of iZotope Ozone has a "master rebalance" feature that lets you rebalance the elements of a stereo mixdown. I'd been thinking that whatever technology is behind that could be used to split the stems out.

This is pretty much that, and seems to work shockingly well, given the complexity of the task. Very impressive.
posted by uncleozzy at 12:03 PM on November 5, 2019 [2 favorites]


I wonder whether this could be used to split a choral recording into sections. If so, it would be really helpful to people learning/studying choral pieces!
posted by HoraceH at 12:07 PM on November 5, 2019 [2 favorites]


It's not perfect, but it's pretty damned impressive. Easily sufficient for most non-professional uses, I think.

As for choral pieces, looks like you'd have to do a lot of training to generate a model for that. The built-in models only support up to 5 "tracks", and all of those models assume that vocals will be on a single track. But if you had a sufficient corpus to train it on, it might be able to separate out the choral parts eventually.
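For reference, here's a sketch of how the three pretrained configurations are selected on the command line (per Spleeter's README) — the choral use case would need a custom-trained model beyond these:

```shell
# Spleeter's pretrained models, selected with the -p flag:
spleeter separate -i song.mp3 -p spleeter:2stems -o out  # vocals / accompaniment
spleeter separate -i song.mp3 -p spleeter:4stems -o out  # vocals / drums / bass / other
spleeter separate -i song.mp3 -p spleeter:5stems -o out  # 4stems plus piano
```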
posted by tobascodagama at 12:26 PM on November 5, 2019 [4 favorites]


It works surprisingly well, but man those transient noises and some of the weird artifacts going on will make this a dealbreaker for extensive remixing, at least for a little while.

I was doing some live rough sync mixing with the waxy.org examples right in the browser and I can hear the weird phase/remainder issues going on, hearing remnants from somewhere in the algorithmic middle between the two tracks.

So if you sync these stems up nearly perfectly there's likely going to be some phase/phaser effects like you get with a heavy delay or playing the same stem/track twice with a sync.

The techno nerd in me wants to feed this algorithm some beats and stuff and see what sounds you can get out of those middle-ground artifacts, and hack it into its own sound like people did with Autotune abuse.
posted by loquacious at 12:36 PM on November 5, 2019 [12 favorites]


Excellent! I'll finally be able to mash up Hexonxonx and I Kissed a Girl.
posted by suetanvil at 12:40 PM on November 5, 2019 [4 favorites]


Cool! This does seem ripe for abuse, which is at least as exciting as actually splitting out beats or what-have-you.
posted by aspersioncast at 12:41 PM on November 5, 2019 [1 favorite]


Karaoke night will never be the same!
posted by jenkinsEar at 12:46 PM on November 5, 2019 [4 favorites]


Oh dear I'm mucking about in my terminal with docker shit trying to install this. I want to throw my noise/ambient stuff at it and see what it does to 20 minutes of birds and synths.
posted by loquacious at 12:47 PM on November 5, 2019 [11 favorites]


I installed and ran it the other day and it handles the supplied mp3 file just fine. I need to try it with some other files and more than 2 stems before I make a final judgment. It's definitely something a lot of music students would appreciate, though it might make some copyright folks a little nervous.
posted by tommasz at 12:47 PM on November 5, 2019


I wonder how much of a difference it makes whether the input was compressed using a lossy psychoacoustic model (i.e., MP3/AAC/OGG) or lossless (WAV/FLAC direct from the CD).
posted by acb at 12:52 PM on November 5, 2019 [1 favorite]


Also: this may be just the second wind the vaporwave genre needs.
posted by acb at 12:56 PM on November 5, 2019 [4 favorites]


Mash-ups are back! And now we have this Lemonade/Hurt abomination
posted by horopter at 1:22 PM on November 5, 2019 [3 favorites]


Holy shit this is cool. So nice to have a cool bit of the Gibsonian future show up for once amongst the ten thousand horrifying ones...
posted by ominous_paws at 1:34 PM on November 5, 2019 [4 favorites]


Mash-ups are back!

Relatedly, Neil Cicierega announced a new upcoming album at this year's XOXO.
posted by rorgy at 2:01 PM on November 5, 2019 [6 favorites]


Be still my beating heart.
posted by Manic Pixie Hollow at 2:09 PM on November 5, 2019


The techno nerd in me wants to feed this algorithm some beats and stuff and see what sounds you can get out of those middle-ground artifacts, and hack it into its own sound like people did with Autotune abuse.

You might be interested in playing with audio style transfer.
posted by STFUDonnie at 2:12 PM on November 5, 2019 [3 favorites]


rorgy: Neil not only announced it, he previewed the entire work-in-progress album for the first time.
posted by waxpancake at 3:16 PM on November 5, 2019 [1 favorite]


I'm OK with the artifacts; modern pop and EDM already have so many glitch, Autotune, etc. effects that a little bit of phasing and swooshing is not gonna hurt. Especially if you use this in a more Bomb Squad / DJ Spooky style where you only take bits and are doing more of a collage than a pure remix. Which I think is the best use of this tool: it opens every recording to sampling creatively from ANY PART OF THE TRACK, not just the bits where there are no vocals, or just vocals, or the bass line was exposed, or the drum loop was exposed, etc. It's ALL up for grabs now.

Suits gonna hate it, and I've taken a no-sampling rule to heart the past 10+ years because the legal risk wasn't worth it, but this is TOO GOOD to ignore now.
posted by cape at 3:49 PM on November 5, 2019 [6 favorites]


I was thinking about experimenting with this soon so I'm glad it runs so fast on CPU. Does anyone have any resources for how to manually repair some of the high pitched chirping artifacts it makes? They sound a lot like what would happen with a bad mp3 encode, can you do anything with just some EQ?
posted by JZig at 4:17 PM on November 5, 2019


Karaoke night will never be the same!

Yeah, even if it's not good enough for remixing quite yet, this seems just good enough to allow you to take any audio track, match it to timed scrolling lyrics, and make instant Karaoke out of anything.
posted by codacorolla at 4:37 PM on November 5, 2019 [1 favorite]


I played with this waaaaaay too much the other night. It's scarily good at separating vocals, and pretty good at drums, but it often can't pick out the bass line (to be fair, it's sonically similar to a clean guitar, and you just have to know how bass lines are phrased). But the "other" track that contains the guitars and other instruments is cool. It doesn't consider the entire track, just 11 seconds or so at a time, so there's a lot of flanging and phasing effects as the network adjusts its filters.

It does expose how darn much modern music is compressed... in many songs the drums and guitars just pump and wheeze. But GBV's Alien Lanes, which was recorded on a Tascam 4-track, doesn't seem to do this, because it was probably heavily compressed during recording and just mixed with faders.

But you can easily get a karaoke track for "Who Are the Brain Police?" so total win.
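The flanging as it "adjusts its filters" makes sense if you picture what separators of this kind do: estimate a soft mask over the mixture's spectrogram and multiply it in. A toy sketch of that idea (hand-made ideal masks on synthetic sines, not Spleeter's actual network):

```python
import numpy as np
from scipy.signal import stft, istft

# Toy illustration: separate a "vocal" sine from a "bass" sine by masking
# the mixture's STFT with the vocal's share of energy per time-freq bin.
fs = 8000
t = np.arange(fs) / fs
vocal = np.sin(2 * np.pi * 440 * t)        # stand-in "vocal"
bass = 0.5 * np.sin(2 * np.pi * 80 * t)    # stand-in "bass"
mix = vocal + bass

f, _, Zmix = stft(mix, fs=fs, nperseg=512)
f, _, Zvoc = stft(vocal, fs=fs, nperseg=512)
f, _, Zbass = stft(bass, fs=fs, nperseg=512)

# Ideal ratio mask: fraction of each bin's magnitude belonging to the vocal.
# (A real separator has to *predict* this mask from the mixture alone.)
mask = np.abs(Zvoc) / (np.abs(Zvoc) + np.abs(Zbass) + 1e-10)
_, vocal_est = istft(Zmix * mask, fs=fs, nperseg=512)

# The masked reconstruction is much closer to the vocal than the raw mix is.
err_est = np.mean((vocal_est[:len(vocal)] - vocal) ** 2)
err_mix = np.mean((mix - vocal) ** 2)
print(err_est < err_mix)
```

Because the real model estimates these masks on short windows, the mask wobbles from chunk to chunk, which is exactly the phasey, flangey movement people are hearing.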
posted by RobotVoodooPower at 5:22 PM on November 5, 2019 [2 favorites]


will miss those odd Karaoke videos of mimes walking along the boardwalk...
posted by ovvl at 6:36 PM on November 5, 2019


Ooh. Make this happen!
posted by klausman at 7:17 PM on November 5, 2019


They sound a lot like what would happen with a bad mp3 encode, can you do anything with just some EQ?

If we're hearing the same thing, it sounds like aliasing (or maybe filter resonance?) to me. Don't know enough about mp3s to know if that's the same. I bet surgical EQing could work on some of them, but it would be fiddly as hell.

Something like RX might help, but it can still struggle if you're dealing with a wide-band noise overlapping with vocals (or guitars, or snares...).
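For the narrow whistly artifacts specifically, the usual surgical-EQ move is a tight notch at the offending frequency. A minimal sketch with a synthetic 6 kHz artifact (illustrative only — with real Spleeter output you'd locate the frequency on a spectrogram first, and wide-band noise won't respond this cleanly):

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

fs = 44100
t = np.arange(fs) / fs
music = np.sin(2 * np.pi * 220 * t)            # stand-in for the wanted signal
artifact = 0.3 * np.sin(2 * np.pi * 6000 * t)  # synthetic narrow-band "chirp"
noisy = music + artifact

# Notch at 6 kHz with a high Q so neighboring content is mostly untouched;
# filtfilt applies it forward and backward for zero phase shift.
b, a = iirnotch(w0=6000, Q=30, fs=fs)
cleaned = filtfilt(b, a, noisy)

print(np.mean((cleaned - music) ** 2) < np.mean((noisy - music) ** 2))
```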
posted by Mike Smith at 7:51 PM on November 5, 2019


Nine Inch Nails has a zillion tracks for which the actual stems are available. It would be interesting to throw the studio mixes of songs at this and compare them to the actual various tracks.
posted by hippybear at 8:24 PM on November 5, 2019 [1 favorite]


You could try Zynaptiq UnChirp. It's expensive, but it's designed to fix these types of things (compression artifacts, which these sound a lot like): that swooshing, phasing sound. I haven't tested it on this stuff, but it might be a quick fix that works well enough.

Nine Inch Nails has a zillion tracks for which the actual stems are available. It would be interesting to throw the studio mixes of songs at this and compare them to the actual various tracks.

Also a good way to continue training the model. I don't know how many tracks they used (stems from Rock Band / Guitar Hero are another good source), but I saw in the docs that you can keep training the ML model.
posted by cape at 9:38 PM on November 5, 2019


But continuing to train an AI is how Skynet emerges....
posted by hippybear at 9:53 PM on November 5, 2019 [1 favorite]


Nice thing — I tried it on Glutton of Sympathy by Jellyfish (alt rock tune with four vocal parts and the usual set of instruments). It definitely separated the vocals into one track, though the volume was up and down a bit through the track. Thanks for the great post!
posted by klausman at 10:17 PM on November 5, 2019 [2 favorites]


Holy shit, I literally just predicted this last weekend. Quick, ask me who will win the election!
posted by Acey at 12:55 AM on November 6, 2019


Nine Inch Nails has a zillion tracks for which the actual stems are available. It would be interesting to throw the studio mixes of songs at this and compare them to the actual various tracks.

If the model was trained with stems and final mixes as inputs, perhaps adding more stems and retraining it would end up in further improving it.
posted by acb at 2:10 AM on November 6, 2019


Such software can be fun as well :)
posted by ElliotChang at 2:35 AM on November 6, 2019


Interesting stuff. I just tried it on some early Numan and Eno. The Numan was quite clean; interestingly, it had noticeably different results for two choruses that sounded identical to me, and it struggled a bit to keep the vocals clean, leaving Gazza apparently standing in a field of quiet robot cicadas.

I wanted to see what it made of Fripp's guitar solo in Baby's On Fire. An unholy mess... but then it seems to make drums, bass, vocal and other, where other is what's left over. I haven't gone into it enough to see if there's any precooked training sets for rawk geetar plankspankery removal...

But lots of fun. There are quite a lot of songs in my library which would be really nice ambient tracks if some eejit wasn't warbling over the top of them.
posted by Devonian at 3:52 AM on November 6, 2019


Hey, somebody feed it all The Roches albums and see what it produces.
posted by wenestvedt at 6:07 AM on November 6, 2019 [3 favorites]


How to Install Spleeter for Newbies (of which I am totally one, and although I might be wrong about some things this process worked for me):
1) Download the installer for Git that matches your computer (Windows/Mac/Linux). Install Git.
[Note: Git itself is a version control tool; what matters here is that its Windows installer also gives you Git Bash, the command line shell you'll run Spleeter from]
2) Download the installer for Miniconda (a minimal installer for the conda package manager, which will handle the Python dependencies). Install the Python 3.7 version of Miniconda.
This site is a helpful and detailed walkthrough of these first two steps

3) Next, open a Git Bash shell. On my Windows machine it's just a program you can run from the Start menu.
4) Copy and paste the following commands into the command line:
git clone https://github.com/Deezer/spleeter
conda env create -f spleeter/conda/spleeter-cpu.yaml


5) After each command hit enter. It will download and configure a bunch of stuff. Have a cup of tea while you wait.
On my machine, I had to type the following or else nothing worked right:
conda init bash

6) After that, quit the Git Bash shell and then re-open it. Now copy-paste this:
conda activate spleeter-cpu
If you have a powerful GPU in your computer you may want to try doing this command instead, but I haven't tried it. It will use the GPU instead of the CPU to do the audio processing, and will be way way faster:
conda activate spleeter-gpu

7) After activation, your Spleeter environment is now set up! You can test it on an included demo file by running this command:
spleeter separate -i spleeter/audio_example.mp3 -p spleeter:2stems -o output

It took me a minute to find where the output went; on my Windows machine it put it in C:\Users\Enkidude\output\audio_example
posted by Enkidude at 8:18 PM on November 7, 2019 [8 favorites]


I just ran the opening of A Hard Day's Night through it, and it split it up splendidly into George Martin's Steinway, Paul's bass, George's twelve string and John's lead. So now we know!

Unfortunately, this tweet doesn't have space for me to put the files.
posted by Devonian at 4:34 AM on November 8, 2019 [2 favorites]




This thread has been archived and is closed to new comments