	---
	library_name: transformers
	license: apache-2.0
	base_model: openai/whisper-large-v3
	tags:
	- generated_from_trainer
	metrics:
	- wer
	model-index:
	- name: whisper-large-v3-ft-cv-cy-en
	results: []
	datasets:
	- techiaith/commonvoice_18_0_cy_en
	language:
	- cy
	- en
	pipeline_tag: automatic-speech-recognition
	---

	# whisper-large-v3-ft-cv-cy-en

	This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3), trained on the
	[techiaith/commonvoice_18_0_cy_en](https://huggingface.co/datasets/techiaith/commonvoice_18_0_cy_en) dataset. Both the
	Welsh and the English data were used for fine-tuning, so that the model transcribes both languages and detects the
	spoken language more reliably.

	It achieves a language detection success rate of 98.86% on recordings from a [Common Voice bilingual test set](https://huggingface.co/datasets/techiaith/commonvoice_18_0_cy_en/viewer/default/test).

	On the same test set, it achieves the following word error rate (WER) results for transcription:

	- Welsh: 26.20
	- English: 15.37
	- Average: 20.70

	N.B. The desired transcript language was not given to the fine-tuned model during testing.
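
For readers unfamiliar with the metric, here is a minimal sketch of how a WER figure can be computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function names `word_edit_distance` and `corpus_wer` are hypothetical, the sentences are made up for illustration, and the pooling of errors across utterances is an assumption. The model card does not state which tool, text normalisation, or averaging scheme produced the figures above.

```python
from typing import List


def word_edit_distance(reference: List[str], hypothesis: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions."""
    # previous[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Pooled WER in percent: total word errors / total reference words."""
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += word_edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    return 100.0 * errors / words


# Illustrative data only (assumed already lower-cased, punctuation removed).
references = ["mae hen wlad fy nhadau yn annwyl i mi", "the cat sat on the mat"]
hypotheses = ["mae hen wlad fy nhadau yn annwyl i mi", "the cat sat on mat"]

print(round(corpus_wer(references, hypotheses), 2))  # 6.67
```

With pooling, longer transcripts carry more weight, so a combined figure need not equal the simple mean of the per-language figures. In the toy data the per-sentence rates are 0% and 16.67%, while the pooled rate is 6.67%.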


	## Usage

	```python
	from transformers import pipeline

	transcriber = pipeline("automatic-speech-recognition", model="techiaith/whisper-large-v3-ft-cv-cy-en")
	# Replace the placeholder with a local path or URL to an audio file.
	result = transcriber("<path or url to soundfile>")
	print(result)
	```

	Example output:

	`{'text': 'Mae hen wlad fy nhadau yn annwyl i mi.'}`
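
If the language of a recording is known in advance, the `transformers` speech recognition pipeline lets you pass generation options at call time through `generate_kwargs`. The sketch below is not part of the original model card: the helpers `build_generate_kwargs` and `transcribe` are hypothetical names, and passing `"cy"` or `"en"` as the `language` value assumes a reasonably recent `transformers` release. The figures reported above were obtained without giving the language, so the effect of forcing it on this model is untested here.

```python
from typing import Optional


def build_generate_kwargs(language: Optional[str] = None) -> dict:
    """Return generation options; an empty dict leaves language detection to the model."""
    if language is None:
        return {}
    return {"language": language, "task": "transcribe"}


def transcribe(audio: str, language: Optional[str] = None) -> str:
    """Transcribe one audio file or URL, optionally forcing the language."""
    # Imported lazily so build_generate_kwargs can be used without transformers.
    from transformers import pipeline

    transcriber = pipeline(
        "automatic-speech-recognition",
        model="techiaith/whisper-large-v3-ft-cv-cy-en",
    )
    result = transcriber(audio, generate_kwargs=build_generate_kwargs(language))
    return result["text"]
```

Calling `transcribe("<path or url to soundfile>")` keeps the default behaviour of the usage example, while `transcribe("<path or url to soundfile>", language="cy")` asks for a Welsh transcript. The pipeline is rebuilt on every call for brevity; in practice it would be created once and reused.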