all-phonetic-wav2vec2-large-xls-r-300m

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the reazonspeech dataset. It achieves the following results on the evaluation set:

  • Loss: 4.1936
  • Wer: 1.0 (i.e., 100% word error rate)
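A WER of 1.0 means no reference word was recovered correctly. As a reminder of how the metric is computed, here is a minimal sketch using the standard word-level edit distance (the function name is illustrative, not from this repository):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("a b c", "a b c"))  # 0.0
print(wer("a b c", "x y z"))  # 1.0
```

A constant WER of 1.0 throughout training (see the results table below) usually indicates the decoded output never matches the reference tokenization, even as the loss falls.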

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 512
  • total_eval_batch_size: 256
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 30
  • mixed_precision_training: Native AMP
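The total batch sizes follow from the per-device values. A quick sanity check of the arithmetic (plain Python; values copied from the list above):

```python
# Values from the hyperparameter list above.
train_batch_size = 32           # per device
eval_batch_size = 32            # per device
num_devices = 8
gradient_accumulation_steps = 2

# Effective (total) batch sizes under multi-GPU training.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices   # no gradient accumulation at eval time

print(total_train_batch_size)  # 512
print(total_eval_batch_size)   # 256
```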

Training results

| Training Loss | Epoch   | Step | Validation Loss | Wer |
|:-------------:|:-------:|:----:|:---------------:|:---:|
| No log        | 0.5882  | 5    | 56.7771         | 1.0 |
| No log        | 1.1176  | 10   | 55.8627         | 1.0 |
| No log        | 1.7059  | 15   | 56.9633         | 1.0 |
| No log        | 2.2353  | 20   | 54.3381         | 1.0 |
| No log        | 2.8235  | 25   | 53.8660         | 1.0 |
| No log        | 3.3529  | 30   | 50.8835         | 1.0 |
| No log        | 3.9412  | 35   | 44.9207         | 1.0 |
| No log        | 4.4706  | 40   | 36.2043         | 1.0 |
| No log        | 5.0     | 45   | 29.7698         | 1.0 |
| No log        | 5.5882  | 50   | 24.9792         | 1.0 |
| No log        | 6.1176  | 55   | 21.1911         | 1.0 |
| No log        | 6.7059  | 60   | 19.4434         | 1.0 |
| No log        | 7.2353  | 65   | 17.2106         | 1.0 |
| No log        | 7.8235  | 70   | 15.7530         | 1.0 |
| No log        | 8.3529  | 75   | 14.7387         | 1.0 |
| No log        | 8.9412  | 80   | 14.1579         | 1.0 |
| No log        | 9.4706  | 85   | 13.0264         | 1.0 |
| No log        | 10.0    | 90   | 12.7336         | 1.0 |
| No log        | 10.5882 | 95   | 11.6563         | 1.0 |
| No log        | 11.1176 | 100  | 11.3245         | 1.0 |
| No log        | 11.7059 | 105  | 10.4289         | 1.0 |
| No log        | 12.2353 | 110  | 10.0300         | 1.0 |
| No log        | 12.8235 | 115  | 9.2873          | 1.0 |
| No log        | 13.3529 | 120  | 8.8417          | 1.0 |
| No log        | 13.9412 | 125  | 8.2069          | 1.0 |
| No log        | 14.4706 | 130  | 7.7223          | 1.0 |
| No log        | 15.0    | 135  | 7.1727          | 1.0 |
| No log        | 15.5882 | 140  | 6.6304          | 1.0 |
| No log        | 16.1176 | 145  | 6.2982          | 1.0 |
| No log        | 16.7059 | 150  | 5.9262          | 1.0 |
| No log        | 17.2353 | 155  | 5.5123          | 1.0 |
| No log        | 17.8235 | 160  | 5.2593          | 1.0 |
| No log        | 18.3529 | 165  | 5.0062          | 1.0 |
| No log        | 18.9412 | 170  | 4.8383          | 1.0 |
| No log        | 19.4706 | 175  | 4.6796          | 1.0 |
| No log        | 20.0    | 180  | 4.5745          | 1.0 |
| No log        | 20.5882 | 185  | 4.4670          | 1.0 |
| No log        | 21.1176 | 190  | 4.4222          | 1.0 |
| No log        | 21.7059 | 195  | 4.3650          | 1.0 |
| No log        | 22.2353 | 200  | 4.3156          | 1.0 |
| No log        | 22.8235 | 205  | 4.2790          | 1.0 |
| No log        | 23.3529 | 210  | 4.2899          | 1.0 |
| No log        | 23.9412 | 215  | 4.3378          | 1.0 |
| No log        | 24.4706 | 220  | 4.2416          | 1.0 |
| No log        | 25.0    | 225  | 4.1972          | 1.0 |
| No log        | 25.5882 | 230  | 4.2235          | 1.0 |
| No log        | 26.1176 | 235  | 4.1859          | 1.0 |
| No log        | 26.7059 | 240  | 4.1936          | 1.0 |
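wav2vec2 models are fine-tuned for ASR with a CTC head, so transcripts are produced by collapsing repeated frame-level predictions and dropping blank tokens. A minimal greedy CTC decoder, for reference (the token IDs and blank ID below are illustrative assumptions, not tied to this model's vocabulary):

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Standard CTC greedy decoding: collapse consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# Frames [0, 5, 5, 0, 5, 7, 7, 0] decode to tokens [5, 5, 7]:
# the repeated 5s collapse, the blank between them keeps the second 5 distinct.
print(ctc_greedy_decode([0, 5, 5, 0, 5, 7, 7, 0]))  # [5, 5, 7]
```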

Framework versions

  • Transformers 4.47.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0