import tempfile
import warnings
from pathlib import Path
from typing import Union

import numpy as np
from torch import nn

from TTS.cs_api import CS_API
from TTS.utils.audio.numpy_transforms import save_wav
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer


class TTS(nn.Module):
    """TODO: Add voice conversion and Capacitron support."""

    def __init__(
        self,
        model_name: str = None,
        model_path: str = None,
        config_path: str = None,
        vocoder_path: str = None,
        vocoder_config_path: str = None,
        progress_bar: bool = True,
        cs_api_model: str = "XTTS",
        gpu: bool = False,
    ):
        """🐸TTS python interface that allows to load and use the released models.

        Example with a multi-speaker model:
            >>> from TTS.api import TTS
            >>> tts = TTS(TTS.list_models()[0])
            >>> wav = tts.tts("This is a test! This is also a test!!", speaker=tts.speakers[0], language=tts.languages[0])
            >>> tts.tts_to_file(text="Hello world!", speaker=tts.speakers[0], language=tts.languages[0], file_path="output.wav")

        Example with a single-speaker model:
            >>> tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False, gpu=False)
            >>> tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path="output.wav")

        Example loading a model from a path:
            >>> tts = TTS(model_path="/path/to/checkpoint_100000.pth", config_path="/path/to/config.json", progress_bar=False, gpu=False)
            >>> tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path="output.wav")

        Example voice cloning with YourTTS in English, French and Portuguese:
            >>> tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False, gpu=True)
            >>> tts.tts_to_file("This is voice cloning.", speaker_wav="my/cloning/audio.wav", language="en", file_path="thisisit.wav")
            >>> tts.tts_to_file("C'est le clonage de la voix.", speaker_wav="my/cloning/audio.wav", language="fr", file_path="thisisit.wav")
            >>> tts.tts_to_file("Isso é clonagem de voz.", speaker_wav="my/cloning/audio.wav", language="pt", file_path="thisisit.wav")

        Example Fairseq TTS models (uses ISO language codes in https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html):
            >>> tts = TTS(model_name="tts_models/eng/fairseq/vits", progress_bar=False, gpu=True)
            >>> tts.tts_to_file("This is a test.", file_path="output.wav")

        Args:
            model_name (str, optional): Model name to load. You can list models by ```tts.models```. Defaults to None.
            model_path (str, optional): Path to the model checkpoint. Defaults to None.
            config_path (str, optional): Path to the model config. Defaults to None.
            vocoder_path (str, optional): Path to the vocoder checkpoint. Defaults to None.
            vocoder_config_path (str, optional): Path to the vocoder config. Defaults to None.
            progress_bar (bool, optional): Whether to print a progress bar while downloading a model. Defaults to True.
            cs_api_model (str, optional): Name of the model to use for the Coqui Studio API. Available models are
                "XTTS" and "V1". You can also use `TTS.cs_api.CS_API` for more control. Defaults to "XTTS".
            gpu (bool, optional): Enable/disable GPU. Some models might be too slow on CPU. Defaults to False.
        """
        super().__init__()
        self.manager = ModelManager(models_file=self.get_models_file_path(), progress_bar=progress_bar, verbose=False)

        self.synthesizer = None
        self.voice_converter = None
        self.csapi = None
        self.cs_api_model = cs_api_model
        self.model_name = None

        if gpu:
            warnings.warn("`gpu` will be deprecated. Please use `tts.to(device)` instead.")

        if model_name is not None:
            if "tts_models" in model_name or "coqui_studio" in model_name:
                self.load_tts_model_by_name(model_name, gpu)
            elif "voice_conversion_models" in model_name:
                self.load_vc_model_by_name(model_name, gpu)

        if model_path:
            self.load_tts_model_by_path(
                model_path, config_path, vocoder_path=vocoder_path, vocoder_config=vocoder_config_path, gpu=gpu
            )
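    # Usage sketch (comments only, not part of the API): the three ways the constructor
    # above resolves a model. The model names come from the class docstring; the
    # checkpoint/config paths are illustrative placeholders.
    #
    #   TTS("tts_models/de/thorsten/tacotron2-DDC")                 # by name -> load_tts_model_by_name
    #   TTS("voice_conversion_models/multilingual/vctk/freevc24")   # VC model -> load_vc_model_by_name
    #   TTS(model_path="/path/to/checkpoint_100000.pth",
    #       config_path="/path/to/config.json")                     # by path -> load_tts_model_by_path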
    @property
    def models(self):
        return self.manager.list_tts_models()

    @property
    def is_multi_speaker(self):
        if hasattr(self.synthesizer.tts_model, "speaker_manager") and self.synthesizer.tts_model.speaker_manager:
            return self.synthesizer.tts_model.speaker_manager.num_speakers > 1
        return False

    @property
    def is_coqui_studio(self):
        if self.model_name is None:
            return False
        return "coqui_studio" in self.model_name

    @property
    def is_multi_lingual(self):
        # XTTS is multi-lingual even when no `language_manager` is attached
        if isinstance(self.model_name, str) and "xtts" in self.model_name:
            return True
        if hasattr(self.synthesizer.tts_model, "language_manager") and self.synthesizer.tts_model.language_manager:
            return self.synthesizer.tts_model.language_manager.num_languages > 1
        return False

    @property
    def speakers(self):
        if not self.is_multi_speaker:
            return None
        return self.synthesizer.tts_model.speaker_manager.speaker_names

    @property
    def languages(self):
        if not self.is_multi_lingual:
            return None
        return self.synthesizer.tts_model.language_manager.language_names

    @staticmethod
    def get_models_file_path():
        return Path(__file__).parent / ".models.json"

    def list_models(self):
        try:
            csapi = CS_API(model=self.cs_api_model)
            models = csapi.list_speakers_as_tts_models()
        except ValueError as e:
            print(e)
            models = []
        manager = ModelManager(models_file=TTS.get_models_file_path(), progress_bar=False, verbose=False)
        return manager.list_tts_models() + models

    def download_model_by_name(self, model_name: str):
        model_path, config_path, model_item = self.manager.download_model(model_name)
        if "fairseq" in model_name or (model_item is not None and isinstance(model_item["model_url"], list)):
            # return the model directory if there are multiple files;
            # we assume the model knows how to load itself
            return None, None, None, None, model_path
        if model_item.get("default_vocoder") is None:
            return model_path, config_path, None, None, None
        vocoder_path, vocoder_config_path, _ = self.manager.download_model(model_item["default_vocoder"])
        return model_path, config_path, vocoder_path, vocoder_config_path, None

    def load_vc_model_by_name(self, model_name: str, gpu: bool = False):
        """Load one of the voice conversion models by name.

        Args:
            model_name (str): Model name to load. You can list models by ```tts.models```.
            gpu (bool, optional): Enable/disable GPU. Some models might be too slow on CPU. Defaults to False.
        """
        self.model_name = model_name
        model_path, config_path, _, _, _ = self.download_model_by_name(model_name)
        self.voice_converter = Synthesizer(vc_checkpoint=model_path, vc_config=config_path, use_cuda=gpu)

    def load_tts_model_by_name(self, model_name: str, gpu: bool = False):
        """Load one of 🐸TTS models by name.

        Args:
            model_name (str): Model name to load. You can list models by ```tts.models```.
            gpu (bool, optional): Enable/disable GPU. Some models might be too slow on CPU. Defaults to False.

        TODO: Add tests
        """
        self.synthesizer = None
        self.csapi = None
        self.model_name = model_name

        if "coqui_studio" in model_name:
            self.csapi = CS_API()
        else:
            model_path, config_path, vocoder_path, vocoder_config_path, model_dir = self.download_model_by_name(
                model_name
            )
            # init synthesizer; None values are fetched from the model files when available
            self.synthesizer = Synthesizer(
                tts_checkpoint=model_path,
                tts_config_path=config_path,
                tts_speakers_file=None,
                tts_languages_file=None,
                vocoder_checkpoint=vocoder_path,
                vocoder_config=vocoder_config_path,
                encoder_checkpoint=None,
                encoder_config=None,
                model_dir=model_dir,
                use_cuda=gpu,
            )

    def load_tts_model_by_path(
        self, model_path: str, config_path: str, vocoder_path: str = None, vocoder_config: str = None, gpu: bool = False
    ):
        """Load a model from a path.

        Args:
            model_path (str): Path to the model checkpoint.
            config_path (str): Path to the model config.
            vocoder_path (str, optional): Path to the vocoder checkpoint. Defaults to None.
            vocoder_config (str, optional): Path to the vocoder config. Defaults to None.
            gpu (bool, optional): Enable/disable GPU. Some models might be too slow on CPU. Defaults to False.
        """
        self.synthesizer = Synthesizer(
            tts_checkpoint=model_path,
            tts_config_path=config_path,
            tts_speakers_file=None,
            tts_languages_file=None,
            vocoder_checkpoint=vocoder_path,
            vocoder_config=vocoder_config,
            encoder_checkpoint=None,
            encoder_config=None,
            use_cuda=gpu,
        )

    def _check_arguments(
        self,
        speaker: str = None,
        language: str = None,
        speaker_wav: str = None,
        emotion: str = None,
        speed: float = None,
        **kwargs,
    ) -> None:
        """Check if the arguments are valid for the model."""
        if not self.is_coqui_studio:
            # check for the coqui tts models
            if self.is_multi_speaker and (speaker is None and speaker_wav is None):
                raise ValueError("Model is multi-speaker but no `speaker` is provided.")
            if self.is_multi_lingual and language is None:
                raise ValueError("Model is multi-lingual but no `language` is provided.")
            if not self.is_multi_speaker and speaker is not None and "voice_dir" not in kwargs:
                raise ValueError("Model is not multi-speaker but `speaker` is provided.")
            if not self.is_multi_lingual and language is not None:
                raise ValueError("Model is not multi-lingual but `language` is provided.")
            if emotion is not None and speed is not None:
                raise ValueError("Emotion and speed can only be used with Coqui Studio models.")
        else:
            if emotion is None:
                emotion = "Neutral"
            if speed is None:
                speed = 1.0
            # check for the studio models
            if speaker_wav is not None:
                raise ValueError("Coqui Studio models do not support `speaker_wav` argument.")
            if speaker is not None:
                raise ValueError("Coqui Studio models do not support `speaker` argument.")
            if language is not None and language != "en":
                raise ValueError("Coqui Studio models currently support only `language=en` argument.")
            if emotion not in ["Neutral", "Happy", "Sad", "Angry", "Dull"]:
                raise ValueError(f"Emotion - `{emotion}` - must be one of `Neutral`, `Happy`, `Sad`, `Angry`, `Dull`.")
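    # Behavior sketch for `_check_arguments` with a multi-speaker, multi-lingual model
    # (comments only; `ref.wav` is an illustrative file name):
    #
    #   tts = TTS("tts_models/multilingual/multi-dataset/your_tts")
    #   tts.tts("hello")                                        # ValueError: no `speaker`/`speaker_wav`
    #   tts.tts("hello", speaker_wav="ref.wav")                 # ValueError: no `language`
    #   tts.tts("hello", speaker_wav="ref.wav", language="en")  # passes the checks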
    def tts_coqui_studio(
        self,
        text: str,
        speaker_name: str = None,
        language: str = None,
        emotion: str = None,
        speed: float = 1.0,
        pipe_out=None,
        file_path: str = None,
    ) -> Union[np.ndarray, str]:
        """Convert text to speech using Coqui Studio models. Use `CS_API` class if you are only interested in the API.

        Args:
            text (str):
                Input text to synthesize.
            speaker_name (str, optional):
                Speaker name from Coqui Studio. Defaults to None.
            language (str):
                Language of the text. If None, the default language of the speaker is used. Language is only
                supported by the `XTTS` model.
            emotion (str, optional):
                Emotion of the speaker. One of "Neutral", "Happy", "Sad", "Angry", "Dull". Emotions are only
                available with the "V1" model. Defaults to None.
            speed (float, optional):
                Speed of the speech. Defaults to 1.0.
            pipe_out (BytesIO, optional):
                Flag to stdout the generated TTS wav file for shell pipe.
            file_path (str, optional):
                Path to save the output file. When None it returns the `np.ndarray` of waveform. Defaults to None.

        Returns:
            Union[np.ndarray, str]: Waveform of the synthesized speech or path to the output file.
        """
        speaker_name = self.model_name.split("/")[2]
        if file_path is not None:
            return self.csapi.tts_to_file(
                text=text,
                speaker_name=speaker_name,
                language=language,
                speed=speed,
                pipe_out=pipe_out,
                emotion=emotion,
                file_path=file_path,
            )[0]
        return self.csapi.tts(text=text, speaker_name=speaker_name, language=language, speed=speed, emotion=emotion)[0]

    def tts(
        self,
        text: str,
        speaker: str = None,
        language: str = None,
        speaker_wav: str = None,
        emotion: str = None,
        speed: float = None,
        **kwargs,
    ):
        """Convert text to speech.

        Args:
            text (str):
                Input text to synthesize.
            speaker (str, optional):
                Speaker name for multi-speaker. You can check whether loaded model is multi-speaker by
                `tts.is_multi_speaker` and list speakers by `tts.speakers`. Defaults to None.
            language (str):
                Language of the text. If None, the default language of the speaker is used. Language is only
                supported by the `XTTS` model.
            speaker_wav (str, optional):
                Path to a reference wav file to use for voice cloning with supporting models like YourTTS.
                Defaults to None.
            emotion (str, optional):
                Emotion to use for 🐸Coqui Studio models. If None, Studio models use "Neutral". Defaults to None.
            speed (float, optional):
                Speed factor to use for 🐸Coqui Studio models, between 0 and 2.0. If None, Studio models use 1.0.
                Defaults to None.
        """
        self._check_arguments(
            speaker=speaker, language=language, speaker_wav=speaker_wav, emotion=emotion, speed=speed, **kwargs
        )
        if self.csapi is not None:
            return self.tts_coqui_studio(
                text=text, speaker_name=speaker, language=language, emotion=emotion, speed=speed
            )
        wav = self.synthesizer.tts(
            text=text,
            speaker_name=speaker,
            language_name=language,
            speaker_wav=speaker_wav,
            reference_wav=None,
            style_wav=None,
            style_text=None,
            reference_speaker_name=None,
            **kwargs,
        )
        return wav

    def tts_to_file(
        self,
        text: str,
        speaker: str = None,
        language: str = None,
        speaker_wav: str = None,
        emotion: str = None,
        speed: float = 1.0,
        pipe_out=None,
        file_path: str = "output.wav",
        **kwargs,
    ):
        """Convert text to speech and save it to a file.

        Args:
            text (str):
                Input text to synthesize.
            speaker (str, optional):
                Speaker name for multi-speaker. You can check whether loaded model is multi-speaker by
                `tts.is_multi_speaker` and list speakers by `tts.speakers`. Defaults to None.
            language (str, optional):
                Language code for multi-lingual models. You can check whether loaded model is multi-lingual by
                `tts.is_multi_lingual` and list available languages by `tts.languages`. Defaults to None.
            speaker_wav (str, optional):
                Path to a reference wav file to use for voice cloning with supporting models like YourTTS.
                Defaults to None.
            emotion (str, optional):
                Emotion to use for 🐸Coqui Studio models. Defaults to "Neutral".
            speed (float, optional):
                Speed factor to use for 🐸Coqui Studio models, between 0.0 and 2.0. Defaults to None.
            pipe_out (BytesIO, optional):
                Flag to stdout the generated TTS wav file for shell pipe.
            file_path (str, optional):
                Output file path. Defaults to "output.wav".
            kwargs (dict, optional):
                Additional arguments for the model.
        """
        self._check_arguments(speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)

        if self.csapi is not None:
            return self.tts_coqui_studio(
                text=text,
                speaker_name=speaker,
                language=language,
                emotion=emotion,
                speed=speed,
                pipe_out=pipe_out,
                file_path=file_path,
            )
        wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
        self.synthesizer.save_wav(wav=wav, path=file_path, pipe_out=pipe_out)
        return file_path

    def voice_conversion(self, source_wav: str, target_wav: str):
        """Voice conversion with FreeVC. Convert source wav to target speaker.

        Args:
            source_wav (str):
                Path to the source wav file.
            target_wav (str):
                Path to the target wav file.
        """
        wav = self.voice_converter.voice_conversion(source_wav=source_wav, target_wav=target_wav)
        return wav
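    # FreeVC usage sketch (comments only; the wav paths are illustrative):
    #
    #   tts = TTS("voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False)
    #   wav = tts.voice_conversion(source_wav="my/source.wav", target_wav="my/target.wav")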
    def voice_conversion_to_file(self, source_wav: str, target_wav: str, file_path: str = "output.wav"):
        """Voice conversion with FreeVC. Convert source wav to target speaker and save it to a file.

        Args:
            source_wav (str):
                Path to the source wav file.
            target_wav (str):
                Path to the target wav file.
            file_path (str, optional):
                Output file path. Defaults to "output.wav".
        """
        wav = self.voice_conversion(source_wav=source_wav, target_wav=target_wav)
        save_wav(wav=wav, path=file_path, sample_rate=self.voice_converter.vc_config.audio.output_sample_rate)
        return file_path

    def tts_with_vc(self, text: str, language: str = None, speaker_wav: str = None):
        """Convert text to speech with voice conversion.

        It combines tts with voice conversion to fake voice cloning.

        - Convert text to speech with tts.
        - Convert the output wav to target speaker with voice conversion.

        Args:
            text (str):
                Input text to synthesize.
            language (str, optional):
                Language code for multi-lingual models. You can check whether loaded model is multi-lingual by
                `tts.is_multi_lingual` and list available languages by `tts.languages`. Defaults to None.
            speaker_wav (str, optional):
                Path to a reference wav file to use for voice cloning with supporting models like YourTTS.
                Defaults to None.
        """
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as fp:
            # save the TTS output to a temp file so it is resampled while being read back for VC
            self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name, speaker_wav=speaker_wav)
        if self.voice_converter is None:
            self.load_vc_model_by_name("voice_conversion_models/multilingual/vctk/freevc24")
        wav = self.voice_converter.voice_conversion(source_wav=fp.name, target_wav=speaker_wav)
        return wav

    def tts_with_vc_to_file(self, text: str, language: str = None, speaker_wav: str = None, file_path: str = "output.wav"):
        """Convert text to speech with voice conversion and save to file.

        Check `tts_with_vc` for more details.

        Args:
            text (str):
                Input text to synthesize.
            language (str, optional):
                Language code for multi-lingual models. You can check whether loaded model is multi-lingual by
                `tts.is_multi_lingual` and list available languages by `tts.languages`. Defaults to None.
            speaker_wav (str, optional):
                Path to a reference wav file to use for voice cloning with supporting models like YourTTS.
                Defaults to None.
            file_path (str, optional):
                Output file path. Defaults to "output.wav".
        """
        wav = self.tts_with_vc(text=text, language=language, speaker_wav=speaker_wav)
        save_wav(wav=wav, path=file_path, sample_rate=self.voice_converter.vc_config.audio.output_sample_rate)
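

if __name__ == "__main__":
    # Minimal usage sketch, not part of the library API. It assumes the single-speaker
    # German model named in the class docstring has been (or can be) downloaded and
    # that the working directory is writable; "output.wav" is an illustrative path.
    tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False, gpu=False)
    tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path="output.wav")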