Neural Voice Cloning with Few Samples

Implementation of the Neural Voice Cloning with Few Samples project, together with an implementation of efficient multi-speaker speech synthesis on Tacotron-2.

Project status: Published/In Market

Artificial Intelligence

Intel Technologies
Intel Python

Overview / Usage

The problem being solved is efficient neural synthesis of a person’s voice given only a few samples of that voice. Current methods either rely on large amounts of recorded speech from the target speaker or do not produce results of sufficient quality. We aim to solve this by building an encoder that first captures a person’s speech characteristics by encoding their voice in a high-dimensional latent space. A voice generator then synthesizes speech conditioned on this high-dimensional vector.
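To make the few-sample idea concrete, here is a minimal sketch that assumes each of the few recordings has already been passed through some speaker encoder, yielding one embedding vector per recording. It averages those vectors into a single unit-length speaker embedding and concatenates that embedding onto every input frame of the voice generator. Averaging and concatenation are simplifying assumptions rather than details stated for this project, the function names are hypothetical, and NumPy stands in for PyTorch.

```python
from typing import Sequence

import numpy as np


def speaker_embedding(sample_embeddings: Sequence[np.ndarray]) -> np.ndarray:
    """Pool per-recording embeddings into one unit-length speaker vector (assumed: mean pooling)."""
    stacked = np.stack(sample_embeddings)   # (num_samples, dim)
    mean = stacked.mean(axis=0)             # average over the few available samples
    return mean / np.linalg.norm(mean)      # L2-normalise to unit length


def condition_frames(frames: np.ndarray, speaker: np.ndarray) -> np.ndarray:
    """Concatenate the speaker vector onto every generator input frame (assumed conditioning scheme)."""
    tiled = np.tile(speaker, (frames.shape[0], 1))   # repeat the vector for each frame: (T, dim)
    return np.concatenate([frames, tiled], axis=1)   # (T, feat + dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = [rng.normal(size=128) for _ in range(5)]   # stand-ins for 5 per-recording embeddings
    spk = speaker_embedding(samples)
    frames = rng.normal(size=(200, 80))                  # stand-in for 200 frames of 80 features
    print(condition_frames(frames, spk).shape)           # (200, 208)
```

Because the speaker vector is computed once and reused for every frame, adding a new voice only requires encoding a handful of recordings rather than retraining the generator, which is the motivation stated above.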

Methodology / Approach

We developed two speaker encoders. The first consists of 1-dimensional convolutions followed by multi-head attention; the second is an LSTM-based recurrent encoder. Both embed the important speaker characteristics of an individual in a high-dimensional latent space. A generative model conditioned on this embedding vector then generates speech very similar to the original person’s voice.
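The convolution-plus-attention encoder can be sketched as a NumPy forward pass with random weights. This is not the project’s PyTorch code: the layer count, kernel width, embedding size, number of heads, the residual connection, and the temporal mean pooling are all assumptions, and `init_params`, `conv1d`, `multi_head_attention`, and `speaker_encoder` are hypothetical names. The LSTM-based variant would replace the convolution and attention stack with a recurrent layer and is not shown.

```python
import numpy as np


def conv1d(x, w, b):
    """'Same'-padded 1-D convolution over time. x: (T, C_in), w: (K, C_in, C_out), b: (C_out,)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, k - 1 - pad), (0, 0)))                 # zero-pad the time axis
    t = x.shape[0]
    windows = np.stack([xp[i:i + t] for i in range(k)], axis=0)  # (K, T, C_in) shifted views
    return np.einsum("ktc,kco->to", windows, w) + b              # sum over kernel taps and channels


def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """Scaled dot-product self-attention split across num_heads heads. x: (T, D)."""
    t, d = x.shape
    dh = d // num_heads                                    # per-head width (D must divide evenly)

    def split(m):                                          # (T, D) -> (H, T, dh)
        return m.reshape(t, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)        # (H, T, T)
    scores -= scores.max(axis=-1, keepdims=True)           # numerical stability before exp
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)               # softmax over key positions
    heads = attn @ v                                       # (H, T, dh)
    return heads.transpose(1, 0, 2).reshape(t, d) @ wo     # merge heads and project: (T, D)


def init_params(n_mels=80, d=128, kernel=5, seed=0):
    """Random weights for illustration only; a real encoder would be trained."""
    rng = np.random.default_rng(seed)

    def w(*shape):
        return rng.normal(scale=0.1, size=shape)

    return {
        "w1": w(kernel, n_mels, d), "b1": np.zeros(d),
        "w2": w(kernel, d, d), "b2": np.zeros(d),
        "wq": w(d, d), "wk": w(d, d), "wv": w(d, d), "wo": w(d, d),
    }


def speaker_encoder(mel, params, num_heads=4):
    """mel: (T, n_mels) spectrogram frames -> unit-length speaker embedding of size D."""
    h = np.maximum(conv1d(mel, params["w1"], params["b1"]), 0.0)  # conv + ReLU
    h = np.maximum(conv1d(h, params["w2"], params["b2"]), 0.0)    # conv + ReLU
    h = h + multi_head_attention(                                 # attention with residual (assumed)
        h, params["wq"], params["wk"], params["wv"], params["wo"], num_heads
    )
    emb = h.mean(axis=0)                                          # average over time (assumed pooling)
    return emb / np.linalg.norm(emb)


if __name__ == "__main__":
    params = init_params()
    mel = np.random.default_rng(1).normal(size=(200, 80))        # stand-in for a 200-frame mel spectrogram
    print(speaker_encoder(mel, params).shape)                     # (128,)
```

Pooling over time makes the embedding size independent of utterance length, so recordings of different durations map into the same latent space that the generator is conditioned on.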

Technologies Used

Python, PyTorch, librosa, GCP, AWS

Repository

https://github.com/Sharad24/Neural-Voice-Cloning-with-Few-Samples