# Qwen2-7B-Instruct-SPPO-Function-call-v2.11
This model is a fine-tuned version of [Qwen/Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct); the training dataset is not specified. It achieves the following results on the evaluation set:
- Loss: 0.1457
- Rewards/chosen: -1.7639
- Rewards/rejected: -14.1509
- Rewards/accuracies: 0.9364
- Rewards/margins: 12.3871
- Logps/rejected: -551.2230
- Logps/chosen: -189.1563
- Logits/rejected: -1.6081
- Logits/chosen: -1.5770
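The reward metrics above follow the usual preference-optimization bookkeeping: `Rewards/margins` is the gap between the chosen and rejected rewards. A minimal sanity check of that relation, using the values reported above:

```python
# Reward margin = reward(chosen) - reward(rejected); values copied from the
# evaluation results above.
rewards_chosen = -1.7639
rewards_rejected = -14.1509

margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # agrees with the reported Rewards/margins of 12.3871 up to rounding
```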
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
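The batch-size figures above are consistent with each other: the total train batch size of 16 is the per-device batch size of 2 multiplied across the 8 GPUs (with no gradient accumulation, which is an inference — an accumulation setting is not listed):

```python
# Effective batch size arithmetic implied by the hyperparameters above.
train_batch_size = 2   # per device
num_devices = 8
grad_accum_steps = 1   # assumption: not listed in the card, implied by the totals

total_train_batch_size = train_batch_size * num_devices * grad_accum_steps
print(total_train_batch_size)  # 16, matching total_train_batch_size above
```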
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2001 | 0.1145 | 250 | 0.2192 | 0.7210 | -1.8684 | 0.9162 | 2.5895 | -305.5732 | -139.4582 | -1.6566 | -1.7096 |
| 0.1246 | 0.2290 | 500 | 0.1662 | 0.6780 | -4.7708 | 0.9277 | 5.4487 | -363.6193 | -140.3193 | -1.6309 | -1.6619 |
| 0.0831 | 0.3436 | 750 | 0.1441 | 0.5794 | -6.0728 | 0.9191 | 6.6521 | -389.6595 | -142.2913 | -1.6015 | -1.6194 |
| 0.0698 | 0.4581 | 1000 | 0.1458 | -0.1931 | -8.1002 | 0.9335 | 7.9071 | -430.2079 | -157.7405 | -1.6062 | -1.6142 |
| 0.0872 | 0.5726 | 1250 | 0.1416 | -0.0252 | -8.5014 | 0.9393 | 8.4762 | -438.2315 | -154.3822 | -1.5572 | -1.5535 |
| 0.0547 | 0.6871 | 1500 | 0.1330 | -0.4963 | -9.4547 | 0.9335 | 8.9584 | -457.2992 | -163.8050 | -1.5598 | -1.5574 |
| 0.1092 | 0.8016 | 1750 | 0.1337 | -1.2236 | -10.3660 | 0.9277 | 9.1424 | -475.5235 | -178.3509 | -1.5822 | -1.5827 |
| 0.1109 | 0.9162 | 2000 | 0.1190 | -0.4262 | -9.6091 | 0.9364 | 9.1829 | -460.3859 | -162.4036 | -1.5682 | -1.5631 |
| 0.013 | 1.0307 | 2250 | 0.1355 | -0.4415 | -10.4543 | 0.9393 | 10.0128 | -477.2908 | -162.7087 | -1.5520 | -1.5425 |
| 0.0107 | 1.1452 | 2500 | 0.1450 | -1.2114 | -11.9528 | 0.9393 | 10.7414 | -507.2599 | -178.1073 | -1.5666 | -1.5494 |
| 0.0203 | 1.2597 | 2750 | 0.1424 | -1.2291 | -12.7381 | 0.9364 | 11.5090 | -522.9661 | -178.4617 | -1.5798 | -1.5536 |
| 0.0128 | 1.3743 | 3000 | 0.1428 | -1.5064 | -13.4244 | 0.9393 | 11.9180 | -536.6923 | -184.0067 | -1.5982 | -1.5679 |
| 0.0447 | 1.4888 | 3250 | 0.1490 | -1.6333 | -13.8914 | 0.9422 | 12.2581 | -546.0324 | -186.5450 | -1.6084 | -1.5768 |
| 0.0114 | 1.6033 | 3500 | 0.1508 | -1.8097 | -14.2168 | 0.9393 | 12.4071 | -552.5399 | -190.0730 | -1.6144 | -1.5842 |
| 0.0201 | 1.7178 | 3750 | 0.1447 | -1.7474 | -14.1355 | 0.9393 | 12.3881 | -550.9136 | -188.8267 | -1.6087 | -1.5784 |
| 0.0139 | 1.8323 | 4000 | 0.1461 | -1.7396 | -14.1065 | 0.9393 | 12.3669 | -550.3343 | -188.6715 | -1.6088 | -1.5783 |
| 0.0038 | 1.9469 | 4250 | 0.1457 | -1.7639 | -14.1509 | 0.9364 | 12.3871 | -551.2230 | -189.1563 | -1.6081 | -1.5770 |
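The epoch and step columns above also let us estimate the training-set size, which the card itself does not state: step 250 corresponds to epoch 0.1145, so one epoch is roughly 2,183 steps, and at an effective batch size of 16 that implies on the order of 35,000 preference pairs. A sketch of that arithmetic (an inference from the table, not a reported figure):

```python
# Rough dataset-size estimate derived from the training-results table above.
steps = 250
epoch_fraction = 0.1145       # epoch value reached at step 250
total_train_batch_size = 16   # from the hyperparameters section

steps_per_epoch = steps / epoch_fraction              # ~2,183 steps
examples = steps_per_epoch * total_train_batch_size   # ~35k preference pairs
print(round(examples))
```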
### Framework versions
- Transformers 4.44.0
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1