# flan-context

This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 2.3807
- Rouge1: 0.2336
- Rouge2: 0.0804
- RougeL: 0.1999
- RougeLsum: 0.2006
- Bleu: 0.0285 (n-gram precisions: 0.3576 / 0.1120 / 0.0532 / 0.0292; brevity penalty: 0.3205; length ratio: 0.4678; translation length: 3420; reference length: 7311)
- Bertscore Precision: 0.8820
- Bertscore Recall: 0.8608
- Bertscore F1: 0.8712
- Meteor: 0.1601
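
The card does not include a usage snippet, so here is a minimal inference sketch. The repository id `zera09/flan-context` is taken from this card's model tree; the prompt and generation settings are illustrative assumptions.

```python
# Minimal inference sketch; the prompt and max_new_tokens are illustrative.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "zera09/flan-context"  # repository id from the model tree
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# FLAN-T5 checkpoints are text-to-text, so any instruction-style prompt works.
inputs = tokenizer(
    "Summarize: The quick brown fox jumps over the lazy dog.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```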
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 6
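
For reference, a sketch of how these settings map onto `Seq2SeqTrainingArguments` in Transformers 4.46. Only the values listed above come from the card; the output directory, evaluation strategy, and `predict_with_generate` are assumptions.

```python
# Sketch of the training configuration implied by the hyperparameters above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-context",      # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # effective train batch size: 8
    num_train_epochs=6,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    seed=42,
    eval_strategy="epoch",          # assumed; the results table logs metrics once per epoch
    predict_with_generate=True,     # assumed; needed to score generated text during eval
)
```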
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Bleu | Bertscore Precision | Bertscore Recall | Bertscore F1 | Meteor |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 2.9042 | 0.9973 | 188 | 2.4777 | 0.2103 | 0.0751 | 0.1811 | 0.1813 | 0.0205 | 0.8919 | 0.8583 | 0.8746 | 0.1485 |
| 2.5901 | 2.0 | 377 | 2.4190 | 0.2300 | 0.0792 | 0.1958 | 0.1961 | 0.0260 | 0.8904 | 0.8618 | 0.8758 | 0.1596 |
| 2.4323 | 2.9973 | 565 | 2.3877 | 0.2249 | 0.0743 | 0.1937 | 0.1940 | 0.0251 | 0.8848 | 0.8595 | 0.8719 | 0.1603 |
| 2.2934 | 4.0 | 754 | 2.3785 | 0.2314 | 0.0763 | 0.1978 | 0.1979 | 0.0246 | 0.8832 | 0.8599 | 0.8713 | 0.1591 |
| 2.2147 | 4.9973 | 942 | 2.3790 | 0.2376 | 0.0814 | 0.2002 | 0.2010 | 0.0288 | 0.8848 | 0.8623 | 0.8733 | 0.1639 |
| 2.156 | 5.9841 | 1128 | 2.3807 | 0.2336 | 0.0804 | 0.1999 | 0.2006 | 0.0285 | 0.8820 | 0.8608 | 0.8712 | 0.1601 |
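
The metrics in the table can be reproduced with the `evaluate` library; a minimal sketch, where `predictions` and `references` are placeholders for decoded model outputs and gold targets:

```python
# Computing the evaluation metrics above with the `evaluate` library.
import evaluate

predictions = ["a generated answer"]  # placeholder for decoded model outputs
references = ["the gold answer"]      # placeholder for reference texts

rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)
bleu = evaluate.load("bleu").compute(predictions=predictions, references=references)
bertscore = evaluate.load("bertscore").compute(
    predictions=predictions, references=references, lang="en"
)
meteor = evaluate.load("meteor").compute(predictions=predictions, references=references)

print(rouge)                                        # rouge1 / rouge2 / rougeL / rougeLsum
print(bleu["bleu"], bleu["brevity_penalty"])        # corpus BLEU and brevity penalty
print(sum(bertscore["f1"]) / len(bertscore["f1"]))  # BERTScore F1, averaged over examples
print(meteor["meteor"])
```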
### Framework versions
- Transformers 4.46.3
- Pytorch 2.4.1+cu121
- Datasets 2.20.0
- Tokenizers 0.20.3