"In the name of God, the Most Gracious, the Most Merciful" is the key to the door of the Wise One's treasury

Model Card for Khadijah(SA)

This is the first Persian/English text-to-speech model built on the new Matcha-TTS model.

It is much faster and better than VITS.

It works best with the UNIVERSAL_V1_22050Hz HiFi-GAN vocoder.

Enjoy!

Usage with the Sherpa-onnx repo

Remember to add metadata to the ONNX file, as done in: https://github.com/k2-fsa/icefall/blob/master/egs/ljspeech/TTS/matcha/export_onnx.py#L174
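
As a rough sketch of what that step involves, the helper below writes key/value pairs into an ONNX file's metadata_props using the onnx Python package. The metadata keys and values shown are illustrative assumptions, not the required set; take the authoritative keys from the linked export_onnx.py.

```python
from typing import Any, Dict


def stringify_metadata(meta_data: Dict[str, Any]) -> Dict[str, str]:
    """ONNX metadata entries are stored as strings, so convert keys and values up front."""
    return {str(key): str(value) for key, value in meta_data.items()}


def add_meta_data(filename: str, meta_data: Dict[str, Any]) -> None:
    """Replace the metadata of the ONNX model at `filename` and save it in place."""
    import onnx  # imported lazily so stringify_metadata works without onnx installed

    model = onnx.load(filename)
    # Drop existing entries so stale keys do not linger.
    while len(model.metadata_props):
        model.metadata_props.pop()
    for key, value in stringify_metadata(meta_data).items():
        entry = model.metadata_props.add()
        entry.key = key
        entry.value = value
    onnx.save(model, filename)


# Hypothetical values for illustration only; copy the real keys from export_onnx.py.
example_meta = {"model_type": "matcha-tts", "sample_rate": 22050, "n_speakers": 1}
print(stringify_metadata(example_meta))
```

Calling add_meta_data with the path of your exported model and a dictionary like example_meta would rewrite the file in place, so keep a backup of the original export.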

Usage with the Matcha-TTS repo

In matcha/text/cleaners.py, set the language in the phonemizer.backend.EspeakBackend call to Persian:

    language="fa",

Install piper-phonemize:

    pip install piper-phonemize

Then, in matcha/text/cleaners.py, add the persian_cleaners_piper function below:

import piper_phonemize
def persian_cleaners_piper(text):
    """Pipeline for Persian text: lowercasing, abbreviation expansion, and
    phonemization (with punctuation and stress) via piper_phonemize."""
    # text = convert_to_ascii(text)
    text = lowercase(text)
    text = expand_abbreviations(text)
    # phonemize_espeak returns one list of phonemes per sentence;
    # [0] keeps only the first sentence of the input.
    phonemes = "".join(piper_phonemize.phonemize_espeak(text=text, voice="fa")[0])
    phonemes = collapse_whitespace(phonemes)

    # Remove unwanted symbols (e.g., '1')
    unwanted_symbols = {"1", "-"}  # Add any other unwanted symbols here
    filtered_phonemes = "".join(char for char in phonemes if char not in unwanted_symbols)

    return filtered_phonemes
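
The final filtering step of the cleaner can be tried on its own, without espeak or piper-phonemize installed. The sketch below reproduces only that step; the helper name strip_symbols and the sample string are made up for illustration and are not real output of piper_phonemize.

```python
def strip_symbols(phonemes: str, unwanted: set) -> str:
    """Return `phonemes` with every character found in `unwanted` removed."""
    return "".join(ch for ch in phonemes if ch not in unwanted)


# Made-up input that mimics a phoneme string containing stray '1' and '-' marks.
sample = "s-al1ˈɑːm"
print(strip_symbols(sample, {"1", "-"}))
```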

Where text_to_sequence is called for inference (matcha/cli.py in the upstream Matcha-TTS repo), change the line to:

    intersperse(text_to_sequence(text, ["persian_cleaners_piper"])[0], 0),
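
For context, intersperse places a blank token id (0 here) between and around the ids returned by text_to_sequence. The sketch below is a standalone re-implementation for illustration, assumed to behave like the helper shipped with Matcha-TTS; the input ids are arbitrary.

```python
def intersperse(ids, blank=0):
    """Return a new list with `blank` before, between, and after every id."""
    result = [blank] * (len(ids) * 2 + 1)
    result[1::2] = ids  # the original ids land on the odd positions
    return result


print(intersperse([5, 9, 3], 0))  # [0, 5, 0, 9, 0, 3, 0]
```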

Also set the cleaner in configs/data/custom.yaml:

    cleaners: [persian_cleaners_piper]

Replace the contents of symbols.py with:

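def read_tokens():
    tokens = []
    # NOTE: this absolute path is from the author's machine; point it at your own copy of the tokens file.
    with open("/home/oem/Basir/TTS/Matcha/Matcha-TTS/configs/tokens/tokens_sherpa_with_fa.txt", "r", encoding="utf-8") as f: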
        for line in f:
            # Remove the newline character at the end
            line = line.rstrip("\n")
            # Split into token and number, preserving whitespace
            if " " in line:
                token = line[:line.index(" ")]  # Extract everything before the first space
                if len(token) == 0: # White-space
                    token = ' '
            else:
                token = line  # If there's no space, the entire line is the token
            tokens.append(token)
    return tokens

symbols = read_tokens()
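
read_tokens expects one token per line, followed by a space and a number, with the space character itself stored as a line that begins with a space. The sketch below writes a tiny tokens file to a temporary location and parses it with the same rules, taking the path as a parameter. The three tokens are invented for illustration, and the lookup assumes a token's id equals its line position.

```python
import os
import tempfile


def read_tokens(path):
    """Parse a 'token number' file, keeping everything before the first space."""
    tokens = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            token, sep, _ = line.partition(" ")
            if sep and token == "":
                token = " "  # the line started with a space: the token is the space itself
            tokens.append(token)
    return tokens


# Invented sample: "_" on line 0, the space character on line 1, "a" on line 2.
sample = "_ 0\n  1\na 2\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)

symbols = read_tokens(tmp.name)
os.remove(tmp.name)

symbol_to_id = {s: i for i, s in enumerate(symbols)}
print(symbols)  # ['_', ' ', 'a']
```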

If save_figure_to_numpy raises errors (this can happen with newer matplotlib versions), replace it with:

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import io

def save_figure_to_numpy(fig):
    buf = io.BytesIO()
    fig.savefig(buf, format='png', bbox_inches='tight', pad_inches=0)
    buf.seek(0)
    img = Image.open(buf)
    data = np.array(img)
    buf.close()
    
    return data

Training results

Credits

Trained by Ali Mahmoudi (@mah92)

Special thanks to Masoud Azizi (@Mablue), Amirreza Ramezani (@brightening-eyes), and Dr. Hamid Jafari (Khaneh Noor Iranian Basir).

Special thanks to the people of the @ttsfarsi channel.

Thanks also to @csukuangfj from Xiaomi Corporation for the help and care given in the icefall and sherpa-onnx repos.

And we are nothing except through the mercy of our Lord.
