Just don't play the samples backwards
September 8, 2016 2:38 PM Subscribe
WaveNet: text-to-speech using a generative deep learning model. Existing text-to-speech systems use parametric generation or a concatenative approach, in which tiny samples of a recorded voice are strung together to create synthesized speech. Using a deep learning technique, WaveNet instead generates synthetic speech one sample at a time. Especially interesting: "If we train the network without the text sequence, it still generates speech, but now it has to make up what to say. As you can hear from the samples below, this results in a kind of babbling, where real words are interspersed with made-up word-like sounds." The academic paper.
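For the curious: the "one sample at a time" bit means generation is autoregressive, with each new audio sample drawn from a distribution conditioned on all the samples before it. Here's a toy sketch of that loop in Python. The "model" is just a stand-in (a fixed random projection), not WaveNet's actual dilated-convolution network, and all names here are made up for illustration:

```python
import numpy as np

def toy_autoregressive_generate(n_samples, receptive_field=16, seed=0):
    """Toy sketch of WaveNet-style generation: each new sample is
    produced from the previous `receptive_field` samples, mimicking
    sampling from p(x_t | x_{<t}). The weights are random, so the
    output is noise, not speech."""
    rng = np.random.default_rng(seed)
    # stand-in for learned parameters (hypothetical, not a trained model)
    weights = rng.normal(size=receptive_field)
    audio = np.zeros(receptive_field)  # start from silence as context
    out = []
    for _ in range(n_samples):
        context = audio[-receptive_field:]       # condition on recent past
        logit = context @ weights                # "network" forward pass
        sample = np.tanh(logit) + 0.1 * rng.normal()  # squash + sample noise
        out.append(sample)
        audio = np.append(audio, sample)         # feed sample back in
    return np.array(out)

wave = toy_autoregressive_generate(100)
print(wave.shape)
```

The key point is the feedback: every generated sample becomes part of the context for the next one, which is why generation is so slow (raw audio is 16,000+ samples per second) and why the untrained-on-text version "babbles."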