OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis Paper • 2501.04561 • Published Jan 8
DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation Paper • 2502.03930 • Published 5 days ago