Google DeepMind's WaveNet gets closer to replicating human speech

Publish date: 2022-07-13

But despite all these improvements, AI assistants are still far from sounding human. Synthesized speech remains noticeably robotic, a problem DeepMind is tackling with WaveNet, its new AI model that can mimic human speech. It is not perfect, but in listening tests it cut the gap between the best existing text-to-speech systems and human-level performance by over 50%. The model is flexible enough to generate music, too: after being trained on classical piano recordings, it can compose its own piano pieces.

You can listen to some samples in DeepMind's blog post. They are quite impressive, but you likely won't see WaveNet hit the market soon, mostly because it requires too much computing power.

Researchers usually avoid modelling raw audio because it ticks so quickly: typically 16,000 samples per second or more, with important structure at many time-scales. Building a completely autoregressive model, in which the prediction for every one of those samples is influenced by all previous ones (in statistics-speak, each predictive distribution is conditioned on all previous observations), is clearly a challenging task.
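To make that concrete, here is a minimal sketch of autoregressive sampling over raw audio. It is a toy, not DeepMind's implementation: the "model" is a fixed random projection standing in for trained network weights, and the receptive-field length is an arbitrary choice. The 256 quantization levels and the 16,000 Hz sample rate do come from the WaveNet work; everything else (names like `next_sample_distribution`, the constant `RECEPTIVE_FIELD`) is invented for illustration.

```python
import numpy as np

# Toy autoregressive sketch (NOT DeepMind's implementation):
# each new audio sample is drawn from a distribution conditioned
# on the samples generated so far, p(x_t | x_1, ..., x_{t-1}).
# The "model" here is a fixed random projection of the recent
# context into softmax logits, standing in for trained weights.

rng = np.random.default_rng(0)

QUANT_LEVELS = 256        # WaveNet quantizes audio to 256 levels
RECEPTIVE_FIELD = 1024    # how far back this toy model looks (assumption)
SAMPLE_RATE = 16_000      # 16,000 samples per second of audio

# Stand-in for trained weights: maps the context window to logits.
W = rng.normal(scale=0.01, size=(RECEPTIVE_FIELD, QUANT_LEVELS))

def next_sample_distribution(history: np.ndarray) -> np.ndarray:
    """Return p(x_t | x_1..x_{t-1}) as a softmax over quantized levels."""
    ctx = history[-RECEPTIVE_FIELD:]
    ctx = np.pad(ctx, (RECEPTIVE_FIELD - len(ctx), 0))  # left-pad with silence
    logits = ctx @ W
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return probs / probs.sum()

def generate(n_samples: int) -> np.ndarray:
    """Sample one value at a time; each draw conditions on all previous ones."""
    audio = np.zeros(0, dtype=np.float64)
    for _ in range(n_samples):
        p = next_sample_distribution(audio)
        level = rng.choice(QUANT_LEVELS, p=p)
        # map the quantized level back into the range [-1, 1]
        audio = np.append(audio, level / (QUANT_LEVELS - 1) * 2 - 1)
    return audio

# One second of audio means 16,000 strictly sequential predictions,
# so generate just 10 ms here to keep the demo quick.
clip = generate(SAMPLE_RATE // 100)
print(f"generated {clip.size} samples")
```

Even in this toy, every sample requires its own forward pass conditioned on everything before it, and the loop cannot be parallelized at generation time. That is exactly why producing a single second of audio is so computationally expensive.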

For those out of the loop, DeepMind was acquired by Google in 2014 for a reported $500 million. The Google-owned company builds systems that try to mimic how the human mind works. They can be trained to learn from data, and one of them famously beat champion Go players, a great accomplishment given that Go was long thought to demand distinctly human intuition.
