Audio-Conditioned LipSync with Latent Diffusion Models
Efficient text-to-video (T2V) generation
Quickly edit the expression of a face