---
license: mit
---


# Belle-whisper-large-v3-zh-punct model for CTranslate2

This repository contains the conversion of [BELLE-2/Belle-whisper-large-v3-zh-punct](https://huggingface.co/BELLE-2/Belle-whisper-large-v3-zh-punct) to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format.

This model can be used in CTranslate2 or projects based on CTranslate2 such as [faster-whisper](https://github.com/guillaumekln/faster-whisper).

## Example
Installation
```shell
pip install faster-whisper
```
Usage
```python
from faster_whisper import WhisperModel
import datetime
import os

# Confirmed working with faster-whisper 1.0.2, numpy 1.23.5, and onnxruntime 1.14.1.
# This code will not work if numpy's version is 2.0.0 or higher and vad_filter=True.

def transcribe_audio(input_file, output_file):
    model_size = "XA9/Belle-faster-whisper-large-v3-zh-punct"
    model = WhisperModel(model_size, device="cpu", compute_type="default")
    segments, info = model.transcribe(input_file, word_timestamps=True, initial_prompt=None,
        beam_size=5, language="zh", max_new_tokens=128, condition_on_previous_text=False,
        vad_filter=False, vad_parameters=dict(min_silence_duration_ms=500))

    srt_content = ""
    srt_number = 1  # SRT subtitle indices start at 1
    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
        start_time_str = format_to_srt(segment.start)
        end_time_str = format_to_srt(segment.end)
        sub_text = replace_special_chars(segment.text)
        # Append the index, timecodes, and text for this subtitle entry
        srt_content += f"{srt_number}\n{start_time_str} --> {end_time_str}\n{sub_text}\n\n"
        srt_number += 1
        
    with open(output_file, 'w', encoding="utf-8") as srt_file:
        srt_file.write(srt_content)
    
    print("")
    print("Saved: " + os.path.abspath(output_file))

def replace_special_chars(text):
    # Strip a leading "! " or a leading space that the model sometimes emits
    if text.startswith("! "):
        text = text[2:]
    elif text.startswith(" "):
        text = text[1:]
    return text


def format_to_srt(seconds):
    # Convert a float number of seconds into the SRT timecode format HH:MM:SS,mmm
    dt = datetime.datetime(1, 1, 1) + datetime.timedelta(seconds=seconds)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(dt.hour, dt.minute, dt.second, dt.microsecond // 1000)


transcribe_audio("audio.mp3", "audio.srt")

```
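The two helper functions can be exercised in isolation. The sketch below is standalone, redefining both helpers so it runs without the model; `replace_special_chars` here is a simplified equivalent that strips a leading `"! "` or space, matching the intent described above:

```python
import datetime

def format_to_srt(seconds):
    # Convert a float number of seconds into SRT timecode HH:MM:SS,mmm
    dt = datetime.datetime(1, 1, 1) + datetime.timedelta(seconds=seconds)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(dt.hour, dt.minute, dt.second, dt.microsecond // 1000)

def replace_special_chars(text):
    # Strip a leading "! " or a leading space from a segment's text
    if text.startswith("! "):
        text = text[2:]
    elif text.startswith(" "):
        text = text[1:]
    return text

print(format_to_srt(3661.5))           # 01:01:01,500
print(replace_special_chars(" 你好"))   # 你好
```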
## Example (transcribing with stable-ts)
Installation

Requires FFmpeg in PATH
```shell
pip install faster-whisper
pip install stable-ts
```

Usage
```python
import stable_whisper

model = stable_whisper.load_faster_whisper('XA9/Belle-faster-whisper-large-v3-zh-punct', device='cpu', compute_type='default')
result = model.transcribe_stable('audio.mp3', language='zh', initial_prompt=None, regroup=False, vad=False, condition_on_previous_text=False)
result.to_srt_vtt('audio.srt', word_level=False)
```

## Conversion details

The original model was converted with the following command:

```
ct2-transformers-converter --model BELLE-2/Belle-whisper-large-v3-zh-punct --output_dir Belle-faster-whisper-large-v3-zh-punct --copy_files tokenizer.json preprocessor_config.json --quantization float16
```