# deepseek_finetuned
This model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.3560
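Since PEFT is listed under the framework versions below, this repository most likely contains a LoRA-style adapter rather than full model weights. The following is a minimal inference sketch under that assumption; it loads the base model and applies the adapter from this repo (`pavanyadav/deepseek_finetuned`). `device_map="auto"` requires the `accelerate` package.

```python
# Minimal inference sketch, assuming this repo holds a PEFT adapter
# on top of the DeepSeek-R1-Distill-Qwen-1.5B base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
adapter_id = "pavanyadav/deepseek_finetuned"  # this repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", device_map="auto"
)
# Attach the fine-tuned adapter weights to the base model.
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("What is 2 + 2?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```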
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
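For reference, these settings map onto `transformers.TrainingArguments` roughly as sketched below. This is a reconstruction from the list above, not the original training script; the output directory is hypothetical, and "Native AMP" is assumed to mean fp16 (it could also have been bf16).

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deepseek_finetuned",   # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,    # 4 x 16 = 64 total train batch size
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    fp16=True,                         # "Native AMP" mixed precision (assumed fp16)
)
```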
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.9260 | 0.1429 | 100 | 1.3948 |
| 0.6639 | 0.2857 | 200 | 0.4467 |
| 0.4304 | 0.4286 | 300 | 0.4228 |
| 0.4158 | 0.5714 | 400 | 0.4118 |
| 0.4046 | 0.7143 | 500 | 0.4031 |
| 0.3968 | 0.8571 | 600 | 0.3952 |
| 0.3925 | 1.0 | 700 | 0.3888 |
| 0.3864 | 1.1429 | 800 | 0.3834 |
| 0.3781 | 1.2857 | 900 | 0.3785 |
| 0.3759 | 1.4286 | 1000 | 0.3743 |
| 0.3696 | 1.5714 | 1100 | 0.3708 |
| 0.3679 | 1.7143 | 1200 | 0.3675 |
| 0.3664 | 1.8571 | 1300 | 0.3647 |
| 0.3637 | 2.0 | 1400 | 0.3626 |
| 0.3607 | 2.1429 | 1500 | 0.3607 |
| 0.3573 | 2.2857 | 1600 | 0.3592 |
| 0.3607 | 2.4286 | 1700 | 0.3580 |
| 0.3561 | 2.5714 | 1800 | 0.3571 |
| 0.3570 | 2.7143 | 1900 | 0.3564 |
| 0.3540 | 2.8571 | 2000 | 0.3561 |
| 0.3548 | 3.0 | 2100 | 0.3560 |
### Framework versions
- PEFT 0.14.0
- Transformers 4.48.2
- PyTorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0