How does the speed-up work?

#110
by yukiarimo - opened

Hello. How is it possible for Kokoro to generate roughly 30 minutes of audio in about one minute? It is based on StyleTTS 2 and has 82M parameters (I'm not sure how many parameters StyleTTS 2 itself has). Is the speed just down to the model size, or did you use some trick in the network or on the GPU?
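
To put a number on that claim: 30 minutes of audio in one minute works out to a real-time factor of about 0.033, or about 30x faster than real time. A minimal sketch of the arithmetic, where the function name `real_time_factor` is only illustrative and the timings are the approximate figures quoted above, not measurements:

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute spent per second of audio produced (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Approximate figures from the question: ~30 minutes of audio in ~1 minute.
rtf = real_time_factor(synthesis_seconds=60.0, audio_seconds=30 * 60.0)
speedup = 1.0 / rtf  # how many times faster than real time

print(f"RTF = {rtf:.3f}, about {speedup:.0f}x faster than real time")
```

The same calculation on MeloTTS timings from the same hardware would give a like-for-like comparison.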

Is it possible to implement the same speed optimization in MeloTTS?
