led-base-16384-lfqa

This model is a fine-tuned version of stefanbschneider/led-base-16384-lfqa-ans-len-512 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.2615
  • Rouge2: 0.0416
  • Task: Sequence-to-sequence Language Modeling (text2text-generation)
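
The checkpoint can be loaded with the standard LED classes from transformers. The snippet below is only a minimal usage sketch: the question/context prompt format, the beam-search settings, and the 512-token answer length are assumptions (the latter inferred from the ans-len-512 base model), not documented behavior of this model.

```python
import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

model_name = "stefanbschneider/led-base-16384-lfqa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LEDForConditionalGeneration.from_pretrained(model_name)

question = "Why can LED handle much longer inputs than BART?"
context = "..."  # long supporting document(s), up to 16384 tokens

# Encode question and context as a text pair; the exact prompt format used
# during fine-tuning is not documented in this card.
inputs = tokenizer(question, context, return_tensors="pt",
                   truncation=True, max_length=16384)

# LED expects global attention on at least the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

output_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=512,   # assumed answer length, matching the base model's name
    num_beams=4,      # assumed decoding setting
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```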

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 1
  • mixed_precision_training: Native AMP
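
For reference, the hyperparameters above roughly map onto the following Seq2SeqTrainingArguments. This is a reconstruction for illustration only, not the original training script; output_dir, the evaluation interval, and the fp16 flag are inferred from the card rather than confirmed.

```python
from transformers import Seq2SeqTrainingArguments

# Reconstructed from the hyperparameter list above; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="led-base-16384-lfqa",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    fp16=True,                   # "Native AMP" mixed-precision training
    eval_strategy="steps",
    eval_steps=2000,             # matches the 2000-step intervals in the results table
    predict_with_generate=True,  # needed to compute Rouge2 during evaluation
)
```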

Training results

All evaluation runs correspond to the Sequence-to-sequence Language Modeling (text2text-generation) task.

| Training Loss | Epoch  | Step  | Validation Loss | Rouge2 |
|:-------------:|:------:|:-----:|:---------------:|:------:|
| 3.4849        | 0.0395 | 2000  | 3.4233          | 0.0387 |
| 3.4744        | 0.0789 | 4000  | 3.4411          | 0.0398 |
| 3.4919        | 0.1184 | 6000  | 3.4251          | 0.0378 |
| 3.4870        | 0.1578 | 8000  | 3.4200          | 0.0397 |
| 3.4443        | 0.1973 | 10000 | 3.3870          | 0.0376 |
| 3.4597        | 0.2367 | 12000 | 3.3914          | 0.0405 |
| 3.4525        | 0.2762 | 14000 | 3.3845          | 0.0398 |
| 3.4618        | 0.3156 | 16000 | 3.3752          | 0.0424 |
| 3.4573        | 0.3551 | 18000 | 3.3693          | 0.0421 |
| 3.4164        | 0.3945 | 20000 | 3.3640          | 0.0420 |
| 3.4125        | 0.4340 | 22000 | 3.3544          | 0.0412 |
| 3.3828        | 0.4734 | 24000 | 3.3423          | 0.0409 |
| 3.3965        | 0.5129 | 26000 | 3.3436          | 0.0416 |
| 3.3993        | 0.5524 | 28000 | 3.3339          | 0.0384 |
| 3.3909        | 0.5918 | 30000 | 3.3122          | 0.0414 |
| 3.3745        | 0.6313 | 32000 | 3.3158          | 0.0416 |
| 3.3665        | 0.6707 | 34000 | 3.3038          | 0.0424 |
| 3.3351        | 0.7102 | 36000 | 3.2915          | 0.0435 |
| 3.3629        | 0.7496 | 38000 | 3.2955          | 0.0436 |
| 3.3465        | 0.7891 | 40000 | 3.2888          | 0.0395 |
| 3.3127        | 0.8285 | 42000 | 3.2800          | 0.0414 |
| 3.3385        | 0.8680 | 44000 | 3.2767          | 0.0413 |
| 3.2882        | 0.9074 | 46000 | 3.2685          | 0.0437 |
| 3.3162        | 0.9469 | 48000 | 3.2639          | 0.0412 |
| 3.3072        | 0.9863 | 50000 | 3.2615          | 0.0416 |
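
The Rouge2 column can in principle be reproduced with the evaluate library. The snippet below is a sketch only; the evaluation set, reference answers, and any post-processing used for this card are not documented.

```python
import evaluate

rouge = evaluate.load("rouge")

# Hypothetical predictions/references; the card's evaluation set is not specified.
predictions = ["a generated long-form answer ..."]
references = ["the reference long-form answer ..."]

scores = rouge.compute(predictions=predictions, references=references)
print(round(scores["rouge2"], 4))  # the card reports 0.0416 on its evaluation set
```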

Framework versions

  • Transformers 4.48.3
  • PyTorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0