flan-context

This model is a fine-tuned version of google/flan-t5-base on an unspecified dataset. It achieves the following results on the evaluation set (a sketch of how these metrics can be computed is given after the list):

  • Loss: 2.3807
  • Rouge1: 0.2336
  • Rouge2: 0.0804
  • RougeL: 0.1999
  • RougeLsum: 0.2006
  • Bleu: 0.0285 (n-gram precisions: 0.3576 / 0.1120 / 0.0532 / 0.0292; brevity penalty: 0.3205; length ratio: 0.4678; translation length: 3420; reference length: 7311)
  • Bertscore Precision: 0.8820
  • Bertscore Recall: 0.8608
  • Bertscore F1: 0.8712
  • Meteor: 0.1601
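
These figures can be reproduced in spirit with the Hugging Face `evaluate` library. The snippet below is a minimal sketch, not the original evaluation script (which is not published); the `score` helper, the per-example averaging of BERTScore, and the choice of `lang="en"` are assumptions.

```python
# Minimal metric-computation sketch using the `evaluate` library.
# The original evaluation script is not published; this only mirrors the metric
# types reported above. `score`, the BERTScore language setting, and the
# averaging are assumptions.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")
bertscore = evaluate.load("bertscore")
meteor = evaluate.load("meteor")

def score(predictions, references):
    """predictions / references: lists of decoded strings from the evaluation set."""
    results = {
        "rouge": rouge.compute(predictions=predictions, references=references),
        "bleu": bleu.compute(predictions=predictions, references=references),
        "meteor": meteor.compute(predictions=predictions, references=references)["meteor"],
    }
    bs = bertscore.compute(predictions=predictions, references=references, lang="en")
    for key in ("precision", "recall", "f1"):
        results[f"bertscore_{key}"] = sum(bs[key]) / len(bs[key])
    return results
```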

Model description

More information needed
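
No task description is provided, but as a fine-tune of google/flan-t5-base the checkpoint can be loaded like any T5-family seq2seq model. Below is a minimal sketch, assuming the hub id zera09/flan-context and treating the prompt format as unknown.

```python
# Minimal loading/inference sketch. The prompt below is an assumption;
# the actual task and input format of this fine-tune are not documented.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zera09/flan-context")
model = AutoModelForSeq2SeqLM.from_pretrained("zera09/flan-context")

inputs = tokenizer("Summarize: <your input text here>", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```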

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 6
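
As a rough guide, these settings map onto transformers training arguments as sketched below. This is a reconstruction, not the original training script; the output directory, dataset handling, and any evaluation or generation settings are not documented and are omitted or assumed.

```python
# Hedged reconstruction of the reported hyperparameters with Seq2SeqTrainingArguments.
# Only the values listed above are taken from the card; everything else
# (output_dir, evaluation strategy, data pipeline) is unspecified here.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-context",       # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,   # effective train batch size: 4 * 2 = 8
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=6,
)
```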

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Bleu   | Bertscore Precision | Bertscore Recall | Bertscore F1 | Meteor |
|---------------|--------|------|-----------------|--------|--------|--------|-----------|--------|---------------------|------------------|--------------|--------|
| 2.9042        | 0.9973 | 188  | 2.4777          | 0.2103 | 0.0751 | 0.1811 | 0.1813    | 0.0205 | 0.8919              | 0.8583           | 0.8746       | 0.1485 |
| 2.5901        | 2.0    | 377  | 2.4190          | 0.2300 | 0.0792 | 0.1958 | 0.1961    | 0.0260 | 0.8904              | 0.8618           | 0.8758       | 0.1596 |
| 2.4323        | 2.9973 | 565  | 2.3877          | 0.2249 | 0.0743 | 0.1937 | 0.1940    | 0.0251 | 0.8848              | 0.8595           | 0.8719       | 0.1603 |
| 2.2934        | 4.0    | 754  | 2.3785          | 0.2314 | 0.0763 | 0.1978 | 0.1979    | 0.0246 | 0.8832              | 0.8599           | 0.8713       | 0.1591 |
| 2.2147        | 4.9973 | 942  | 2.3790          | 0.2376 | 0.0814 | 0.2002 | 0.2010    | 0.0288 | 0.8848              | 0.8623           | 0.8733       | 0.1639 |
| 2.156         | 5.9841 | 1128 | 2.3807          | 0.2336 | 0.0804 | 0.1999 | 0.2006    | 0.0285 | 0.8820              | 0.8608           | 0.8712       | 0.1601 |

Framework versions

  • Transformers 4.46.3
  • Pytorch 2.4.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.20.3