---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- generated_from_trainer
metrics:
- wer
model-index:
- name: whisper-large-v3-ft-cv-cy-en
  results: []
datasets:
- techiaith/commonvoice_18_0_cy_en
language:
- cy
- en
pipeline_tag: automatic-speech-recognition
---

# whisper-large-v3-ft-cv-cy-en

This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the 
[techiaith/commonvoice_18_0_cy_en](https://huggingface.co/datasets/techiaith/commonvoice_18_0_cy_en) dataset. Both the 
English and Welsh data were used for fine-tuning, so that the model transcribes both languages and detects the spoken
language more reliably.

It achieves a success rate of **98.86% for language detection** on recordings from a [Common Voice bilingual test set](https://huggingface.co/datasets/techiaith/commonvoice_18_0_cy_en/viewer/default/test).

On the same test set, it achieves the following word error rate (WER) results for transcription:

- Welsh: 26.20
- English: 15.37
- Average: 20.70

N.B. the desired transcript language is not given to the fine-tuned model during testing.
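
For readers unfamiliar with the metric, the sketch below shows how WER is conventionally defined: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. This card does not state which tool or text normalisation produced the figures above, so the function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: tokens are split on whitespace and no casing or
    punctuation normalisation is applied (an assumption, not the card's method).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "mae hen wlad fy nhadau yn annwyl i mi"
hypothesis = "mae hen wlad fy nhadau yn anwyl i mi"  # one misspelt word
print(f"WER: {100 * word_error_rate(reference, hypothesis):.2f}")  # WER: 11.11
```

Scores reported as percentages, as in the example, are this ratio multiplied by 100; because insertions count as errors, the value can exceed 100 for very poor transcripts.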


## Usage

```python
from transformers import pipeline

transcriber = pipeline("automatic-speech-recognition", model="techiaith/whisper-large-v3-ft-cv-cy-en")
# Replace the placeholder with the path or URL of your own audio file.
result = transcriber("<path or url to soundfile>")
print(result)
```

`{'text': 'Mae hen wlad fy nhadau yn annwyl i mi.'}`