# Qwen2-7B-Instruct-SPPO-Function-call-v2.11
This model is a fine-tuned version of [Qwen/Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct); the training dataset is not specified. It achieves the following results on the evaluation set:
- Loss: 0.1457
- Rewards/chosen: -1.7639
- Rewards/rejected: -14.1509
- Rewards/accuracies: 0.9364
- Rewards/margins: 12.3871
- Logps/rejected: -551.2230
- Logps/chosen: -189.1563
- Logits/rejected: -1.6081
- Logits/chosen: -1.5770
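The reward metrics above follow the usual preference-optimization bookkeeping: `Rewards/margins` is the gap between the chosen and rejected rewards. A minimal sanity check of that relation, using the values reported above:

```python
# Reward margin = reward(chosen) - reward(rejected); values copied from the
# evaluation results above.
rewards_chosen = -1.7639
rewards_rejected = -14.1509

margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # agrees with the reported Rewards/margins of 12.3871 up to rounding
```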
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
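The batch-size figures above are consistent with each other: the total train batch size of 16 is the per-device batch size of 2 multiplied across the 8 GPUs (with no gradient accumulation, which is an inference — an accumulation setting is not listed):

```python
# Effective batch size arithmetic implied by the hyperparameters above.
train_batch_size = 2   # per device
num_devices = 8
grad_accum_steps = 1   # assumption: not listed in the card, implied by the totals

total_train_batch_size = train_batch_size * num_devices * grad_accum_steps
print(total_train_batch_size)  # 16, matching total_train_batch_size above
```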
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2001 | 0.1145 | 250 | 0.2192 | 0.7210 | -1.8684 | 0.9162 | 2.5895 | -305.5732 | -139.4582 | -1.6566 | -1.7096 |
| 0.1246 | 0.2290 | 500 | 0.1662 | 0.6780 | -4.7708 | 0.9277 | 5.4487 | -363.6193 | -140.3193 | -1.6309 | -1.6619 |
| 0.0831 | 0.3436 | 750 | 0.1441 | 0.5794 | -6.0728 | 0.9191 | 6.6521 | -389.6595 | -142.2913 | -1.6015 | -1.6194 |
| 0.0698 | 0.4581 | 1000 | 0.1458 | -0.1931 | -8.1002 | 0.9335 | 7.9071 | -430.2079 | -157.7405 | -1.6062 | -1.6142 |
| 0.0872 | 0.5726 | 1250 | 0.1416 | -0.0252 | -8.5014 | 0.9393 | 8.4762 | -438.2315 | -154.3822 | -1.5572 | -1.5535 |
| 0.0547 | 0.6871 | 1500 | 0.1330 | -0.4963 | -9.4547 | 0.9335 | 8.9584 | -457.2992 | -163.8050 | -1.5598 | -1.5574 |
| 0.1092 | 0.8016 | 1750 | 0.1337 | -1.2236 | -10.3660 | 0.9277 | 9.1424 | -475.5235 | -178.3509 | -1.5822 | -1.5827 |
| 0.1109 | 0.9162 | 2000 | 0.1190 | -0.4262 | -9.6091 | 0.9364 | 9.1829 | -460.3859 | -162.4036 | -1.5682 | -1.5631 |
| 0.013 | 1.0307 | 2250 | 0.1355 | -0.4415 | -10.4543 | 0.9393 | 10.0128 | -477.2908 | -162.7087 | -1.5520 | -1.5425 |
| 0.0107 | 1.1452 | 2500 | 0.1450 | -1.2114 | -11.9528 | 0.9393 | 10.7414 | -507.2599 | -178.1073 | -1.5666 | -1.5494 |
| 0.0203 | 1.2597 | 2750 | 0.1424 | -1.2291 | -12.7381 | 0.9364 | 11.5090 | -522.9661 | -178.4617 | -1.5798 | -1.5536 |
| 0.0128 | 1.3743 | 3000 | 0.1428 | -1.5064 | -13.4244 | 0.9393 | 11.9180 | -536.6923 | -184.0067 | -1.5982 | -1.5679 |
| 0.0447 | 1.4888 | 3250 | 0.1490 | -1.6333 | -13.8914 | 0.9422 | 12.2581 | -546.0324 | -186.5450 | -1.6084 | -1.5768 |
| 0.0114 | 1.6033 | 3500 | 0.1508 | -1.8097 | -14.2168 | 0.9393 | 12.4071 | -552.5399 | -190.0730 | -1.6144 | -1.5842 |
| 0.0201 | 1.7178 | 3750 | 0.1447 | -1.7474 | -14.1355 | 0.9393 | 12.3881 | -550.9136 | -188.8267 | -1.6087 | -1.5784 |
| 0.0139 | 1.8323 | 4000 | 0.1461 | -1.7396 | -14.1065 | 0.9393 | 12.3669 | -550.3343 | -188.6715 | -1.6088 | -1.5783 |
| 0.0038 | 1.9469 | 4250 | 0.1457 | -1.7639 | -14.1509 | 0.9364 | 12.3871 | -551.2230 | -189.1563 | -1.6081 | -1.5770 |
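The epoch and step columns above also let us estimate the training-set size, which the card itself does not state: step 250 corresponds to epoch 0.1145, so one epoch is roughly 2,183 steps, and at an effective batch size of 16 that implies on the order of 35,000 preference pairs. A sketch of that arithmetic (an inference from the table, not a reported figure):

```python
# Rough dataset-size estimate derived from the training-results table above.
steps = 250
epoch_fraction = 0.1145       # epoch value reached at step 250
total_train_batch_size = 16   # from the hyperparameters section

steps_per_epoch = steps / epoch_fraction              # ~2,183 steps
examples = steps_per_epoch * total_train_batch_size   # ~35k preference pairs
print(round(examples))
```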
### Framework versions
- Transformers 4.44.0
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1