Hexgrad PRO

hexgrad

AI & ML interests

Building in a cave with a box of scraps

Recent Activity

updated a Space 1 day ago
hexgrad/Misaki-G2P
updated a Space 1 day ago
hexgrad/Kokoro-TTS

Organizations

None yet

hexgrad's activity

New activity in TTS-AGI/TTS-Arena about 21 hours ago

Add Zonos, 1.6B Apache TTS Model

#82 opened about 21 hours ago by hexgrad
New activity in hexgrad/Kokoro-82M 1 day ago
New activity in hexgrad/Kokoro-82M 2 days ago

any plans to support Arabic?

#115 opened 2 days ago by mzeid
reacted to Xenova's post with 🔥 4 days ago
We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses
🌍 Multilingual support (only phonemization left)

Who wants to help?
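The sentence-splitting step mentioned in the post could be sketched as follows. This is a minimal regex-based approach for illustration, not the webml-community implementation; a production splitter would also handle abbreviations, decimals, and quotes:

```python
import re

def split_sentences(text: str) -> list[str]:
    # Split on sentence-ending punctuation followed by whitespace,
    # so each sentence can be synthesized and streamed independently.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

chunks = split_sentences("Hello there. How are you? Fine!")
# chunks == ["Hello there.", "How are you?", "Fine!"]
```

With sentences as independent units, audio for the first chunk can start playing while later chunks are still being generated.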
posted an update 4 days ago
Wanted: Peak Data. I'm collecting audio data to train another TTS model:
+ AVM data: ChatGPT Advanced Voice Mode audio & text from source
+ Professional audio: Permissive (CC0, Apache, MIT, CC-BY)

This audio should *impress* most native speakers, not just barely pass their audio Turing tests. Professional-caliber means S or A-tier, not your average bloke off the street. Traditional TTS may not make the cut. Absolutely no lo-fi microphone recordings like Common Voice.

The bar is much higher than last time, so there are no timelines yet and I expect it may take longer to collect such mythical data. Raising the bar means evicting quite a bit of old data, and voice/language availability may decrease. The theme is *quality* over quantity. I would rather have 1 hour of A/S-tier than 100 hours of mid data.

I have nothing to offer but the north star of a future Apache 2.0 TTS model, so prefer data that you *already have* and that costs you *nothing extra* to send. Additionally, *all* the new data may be used to construct public, Apache 2.0 voicepacks, and if that arrangement doesn't work for you, no need to send any audio.

Last time I asked for horses; now I'm asking for unicorns. As of writing this post, I've currently got a few English & Chinese unicorns, but there is plenty of room in the stable. Find me over on Discord at rzvzn: https://discord.gg/QuGxSWBfQy
posted an update 6 days ago
I wrote an article about G2P: https://hf.co/blog/hexgrad/g2p

G2P is an underrated piece of small TTS models, like offensive linemen who do a bunch of work and get no credit.

Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio.

Kokoro instead relies on G2P preprocessing, is 82M parameters, and thus needs less audio to learn. Because of this, we can cherry-pick high-fidelity audio for training data and deliver solid speech for those voices. In turn, this excellent audio quality and lack of background noise help explain why Kokoro is very competitive in single-voice TTS Arenas.
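The G2P preprocessing described above can be illustrated with a toy dictionary-lookup converter. This is only a sketch of the idea, not Misaki's actual implementation; the lexicon entries and IPA strings here are hypothetical examples, and a real G2P engine also handles out-of-vocabulary words, stress, and context-dependent pronunciation:

```python
def g2p(text: str, lexicon: dict[str, str]) -> str:
    """Toy grapheme-to-phoneme conversion via dictionary lookup."""
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,!?")
        # Fall back to the spelling when the word is out of vocabulary.
        phonemes.append(lexicon.get(word, word))
    return " ".join(phonemes)

# Hypothetical IPA entries for illustration only.
lexicon = {"hello": "həlˈoʊ", "world": "wˈɜːld"}
g2p("Hello, world!", lexicon)  # → "həlˈoʊ wˈɜːld"
```

The TTS model then consumes the phoneme string rather than raw text, which is what lets a small model skip learning pronunciation from audio alone.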
published an article 6 days ago
replied to Keltezaa's post 7 days ago

I am considering canceling my Pro subscription because I have been banned from posting an Article for many weeks now with no explanation or recourse.

Also, the ability to Post and the Posts feed are vandalized by those AI slop posts where the OP runs all 12 reactions on their own post and uses alt accounts to do the same. And I have no ability to block these circlejerking accounts.

New activity in hexgrad/Kokoro-TTS 7 days ago

Update README.md

#18 opened 9 days ago by Meroar