a90f90f4-1d63-415a-b420-4f86a46daf28

This model is a fine-tuned version of databricks/dolly-v2-3b on an unspecified dataset. On the evaluation set it reaches a validation loss of 0.0096 at the final training step (500).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.000204
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: OptimizerNames.ADAMW_BNB with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 50
training_steps: 500
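
To make the relationship between these values concrete, here is a minimal sketch in plain Python. It recomputes total_train_batch_size from the per-device batch size and the accumulation steps, and evaluates a linear-warmup, half-cosine learning-rate curve at a given step. Two things are assumptions rather than facts from this card: that training ran on a single device (the card does not state a device count, but 4 x 2 = 8 is consistent with one), and that the cosine schedule has the common "linear warmup, then half-cosine decay to zero" shape. The names NUM_DEVICES, effective_batch_size, and lr_at are hypothetical helpers for illustration only.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 0.000204
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 50
TRAINING_STEPS = 500

# Assumption: a single device. The card does not state the device count.
NUM_DEVICES = 1


def effective_batch_size(per_device, accumulation, devices=NUM_DEVICES):
    """Number of examples that contribute to one optimizer update."""
    return per_device * accumulation * devices


def lr_at(step):
    """Learning rate at an optimizer step.

    Assumed shape: linear warmup from 0 to LEARNING_RATE over WARMUP_STEPS,
    then half-cosine decay to 0 at TRAINING_STEPS.
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 25, 50, 275, 500):
        print(f"step {step:3d}: lr = {lr_at(step):.6f}")
```

Under these assumptions the learning rate peaks at 0.000204 when warmup ends at step 50 and has decayed to half of that by step 275, the midpoint of the decay phase.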

Training Loss	Epoch	Step	Validation Loss
No log	0.0000	1	0.2303
0.0378	0.0009	50	0.0198
0.0342	0.0019	100	0.0155
0.0354	0.0028	150	0.0139
0.0213	0.0038	200	0.0132
0.0228	0.0047	250	0.0117
0.0209	0.0057	300	0.0119
0.0302	0.0066	350	0.0133
0.0213	0.0076	400	0.0098
0.0205	0.0085	450	0.0096
0.0211	0.0095	500	0.0096
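
The log above can also be read programmatically. The sketch below copies the rows into a Python list, picks the evaluation step with the lowest validation loss, and derives a rough estimate of the training set size from the Epoch column. The estimate assumes that Epoch is the fraction of the training set consumed so far and that each step uses total_train_batch_size = 8 examples; because Epoch is rounded to four decimals, the result is only an order-of-magnitude figure. The helper names best_checkpoint and estimate_examples_per_epoch are hypothetical.

```python
# Rows copied from the training log: (training_loss, epoch, step, validation_loss).
# training_loss is None where the log shows "No log".
ROWS = [
    (None,   0.0000,   1, 0.2303),
    (0.0378, 0.0009,  50, 0.0198),
    (0.0342, 0.0019, 100, 0.0155),
    (0.0354, 0.0028, 150, 0.0139),
    (0.0213, 0.0038, 200, 0.0132),
    (0.0228, 0.0047, 250, 0.0117),
    (0.0209, 0.0057, 300, 0.0119),
    (0.0302, 0.0066, 350, 0.0133),
    (0.0213, 0.0076, 400, 0.0098),
    (0.0205, 0.0085, 450, 0.0096),
    (0.0211, 0.0095, 500, 0.0096),
]

TOTAL_TRAIN_BATCH_SIZE = 8  # from the hyperparameter list


def best_checkpoint(rows):
    """Row with the lowest validation loss; ties go to the earliest step."""
    return min(rows, key=lambda row: (row[3], row[2]))


def estimate_examples_per_epoch(step, epoch, batch_size=TOTAL_TRAIN_BATCH_SIZE):
    """Rough training set size implied by one (step, epoch) pair."""
    if epoch <= 0:
        raise ValueError("epoch must be positive to extrapolate")
    return step / epoch * batch_size


if __name__ == "__main__":
    _, epoch, step, val_loss = best_checkpoint(ROWS)
    print(f"lowest validation loss {val_loss} first reached at step {step}")

    _, last_epoch, last_step, _ = ROWS[-1]
    size = estimate_examples_per_epoch(last_step, last_epoch)
    print(f"estimated training examples per epoch: about {size:,.0f}")
```

Read this way, the run covered less than 1% of one epoch, and the validation loss was already flat between steps 450 and 500.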