Alpha-VLLM/Lumina-Video-f24R960 • Text-to-Video • Updated • 20
Best message I have seen, I am literally tearing up.
Yeah, Kokoro uses phonemes instead of direct text, that's why it's very good quality at just 82M params and can pronounce words better than other massive TTS models (even better than Llasa 8b).
Only problem is emotion, which Llasa 8b is much better at doing.
rzvzn: https://discord.gg/QuGxSWBfQy
2048x2048. If your images are mostly larger than 1024x1024, use BiRefNet_HR for better results! Thanks to @Freepik for the kind support of H200s for this huge training.
1024x1024 on val set: