Create README.md
README.md
ADDED
@@ -0,0 +1,81 @@
---
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- Cretan
- Greek dialect
---

# Cretan XLS-R model

Cretan is a variety of Modern Greek predominantly used by speakers who reside on the island of Crete or
belong to the Cretan diaspora. This includes communities of Cretan origin that were relocated to the
village of Hamidieh in Syria and to Western Asia Minor, following the population exchange between
Greece and Turkey in 1923. The historical and geographical factors that have shaped the development
and preservation of the dialect include the long-term isolation of Crete from the mainland and the
successive domination of the island by foreign powers, such as the Arabs, the Venetians, and the Turks,
over a period of seven centuries. Based on its phonological, phonetic, morphological, and lexical
characteristics, Cretan is divided into two major dialect groups: the western and the eastern.
The boundary between these groups coincides with the administrative boundary between the
prefectures of Rethymno and Heraklion. Kontosopoulos (2008) argues that the eastern dialect group is more
homogeneous than the western one, which shows more variation across all levels of linguistic analysis.
Unlike other Modern Greek dialects, Cretan does not face the threat of extinction, as it remains
the sole means of communication for a large number of speakers in various parts of the island.

This is the first automatic speech recognition (ASR) model for Cretan.
To train the model, we fine-tuned a Greek XLS-R model ([jonatasgrosman/wav2vec2-large-xlsr-53-greek](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-greek)) on recorded Cretan speech, drawn from the 1h 21m 12s dataset described under Resources.

## Resources

For the compilation of the Cretan corpus, we gathered 32 tapes containing material from
radio broadcasts in digital format, with permission from the Audiovisual Department of the
Vikelaia Municipal Library of Heraklion, Crete. These broadcasts were recorded and
aired by Radio Mires, in the Messara region of Heraklion, during the period 1998-2001,
and total 958 minutes and 47 seconds. The recordings primarily consist of narratives
by one speaker, Ioannis Anagnostakis, who also composed them. In terms
of textual genre, the broadcasts consist of folklore
narratives expressed in the local linguistic variety. Of the material
collected, we used nine tapes. The selection criteria were, on the one hand,
maximizing the clarity of the digitized speech and, on the other hand, ensuring representative sampling
across the entire three-year period of radio recordings. To obtain an initial transcription,
we used Whisper Large-v2, which was the largest Whisper model at the time. Subsequently,
the transcripts were manually corrected in collaboration with the local community.
The transcription system was based on the Greek alphabet and orthography,
and the annotation was carried out in Praat.

To prepare the dataset, the texts were normalized (see [greek_dialects_asr/](https://gitlab.com/ilsp-spmd-all/speech/greek_dialects_asr/) for scripts),
and all audio files were converted into a 16 kHz mono format.

We split the Praat annotations into audio-transcription segments, which resulted in a dataset with a total duration of 1h 21m 12s.
Note that removing music, long pauses, and non-transcribed segments reduces the total audio duration
(compared to the initial 2h of recordings on the nine tapes).
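
To put this reduction in perspective, the durations quoted in this section can be converted to seconds and compared. The following is a minimal sketch; it treats the "2h" figure as exactly two hours, which is an assumption, since only a rounded value is given here. The variable names are illustrative.

```python
from datetime import timedelta

# Durations quoted in this section.
all_tapes = timedelta(minutes=958, seconds=47)             # all 32 tapes
selected_tapes = timedelta(hours=2)                        # assumption: "2h" taken as exactly 2 hours
final_dataset = timedelta(hours=1, minutes=21, seconds=12)  # after segmentation

# Fraction of the selected audio that survives segmentation.
retained = final_dataset / selected_tapes
# Audio dropped as music, long pauses, or non-transcribed segments.
removed = selected_tapes - final_dataset

print(f"all tapes: {all_tapes.total_seconds() / 3600:.2f} h")
print(f"dataset: {final_dataset.total_seconds():.0f} s")
print(f"retained: {retained:.1%} of the selected recordings")
print(f"removed: {removed}")
```

Under that assumption, roughly two thirds of the selected audio ends up in the dataset.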

## Metrics

We evaluated the model on the test set split, which consists of 10% of the dataset recordings.

|Model|WER|CER|
|---|---|---|
|pre-trained|104.83%|91.73%|
|fine-tuned|28.27%|7.88%|
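
Word error rate (WER) and character error rate (CER) are both edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length in words or characters. Because insertions are counted, the ratio can exceed 100%, which is how the pre-trained model reaches a WER of 104.83%. The sketch below illustrates the definitions only; it is not the evaluation code behind the table, the function names are illustrative, and the example strings are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(reference, hypothesis) / len(reference)


# One substitution plus two insertions against a two-word reference.
print(wer("hello world", "hallo there big world"))
# One substituted character out of four.
print(cer("abcd", "abed"))
```

In the first example the hypothesis contains three word-level errors against a two-word reference, so the WER is above 1.0 even though one word is recognized correctly.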

## Training hyperparameters

We fine-tuned the baseline model (`wav2vec2-large-xlsr-53-greek`) on an NVIDIA GeForce RTX 3090, using the following hyperparameters:

| arg                           | value |
|-------------------------------|-------|
| `per_device_train_batch_size` | 8     |
| `gradient_accumulation_steps` | 2     |
| `num_train_epochs`            | 35    |
| `learning_rate`               | 3e-4  |
| `warmup_steps`                | 500   |
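
The argument names in the table correspond to fields of `transformers.TrainingArguments`. As a rough sketch, they can be collected in a dictionary and the effective batch size derived from them. This assumes training on a single GPU, in line with the one RTX 3090 mentioned in this section; the variable names are illustrative, and any settings not listed in the table are unknown.

```python
# Hyperparameters copied from the table in this section.
hparams = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 35,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
}

num_gpus = 1  # assumption: a single RTX 3090

# Gradients are accumulated over several batches before each optimizer
# update, so one update covers batch_size * accumulation_steps samples per GPU.
effective_batch_size = (
    hparams["per_device_train_batch_size"]
    * hparams["gradient_accumulation_steps"]
    * num_gpus
)
print(effective_batch_size)
```

With the `transformers` library installed, the same dictionary could be unpacked into the trainer configuration, for example `TrainingArguments(output_dir="some/output/dir", **hparams)`, where the output path is a placeholder.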

## Citation

To cite this work or read more about the training pipeline, see:

S. Vakirtzian, C. Tsoukala, S. Bompolas, K. Mouzou, V. Stamou, G. Paraskevopoulos, A. Dimakis, S. Markantonatou, A. Ralli, A. Anastasopoulos, Speech Recognition for Greek Dialects: A Challenging Benchmark, Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2024.