Built with Axolotl

vicuna_7b_stage1_flash

This model is a fine-tuned version of lmsys/vicuna-7b-v1.5 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: nan
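
For reference, below is a minimal sketch of loading this checkpoint with the transformers library. The repo id Yingbing/vicuna_7b_integration is taken from this card; the prompt format follows the standard Vicuna v1.5 convention and is an assumption, since the fine-tuning data is not documented.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as listed on this card; adjust if the weights are hosted elsewhere.
model_id = "Yingbing/vicuna_7b_integration"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires the `accelerate` package
)

# Vicuna v1.5 prompt convention (an assumption; the training data is undocumented).
prompt = "USER: What is the capital of France?\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```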

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the list):

  • learning_rate: 0.0005
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 40
  • num_epochs: 3
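
These values map directly onto the Transformers Trainer. The sketch below reconstructs them as a TrainingArguments object; it mirrors the reported hyperparameters only and is not the actual Axolotl config used for the run (output_dir is a placeholder).

```python
from transformers import TrainingArguments

# Reconstruction of the hyperparameters listed above for the Transformers Trainer.
# The run itself was launched via Axolotl; this only mirrors the reported values.
args = TrainingArguments(
    output_dir="vicuna_7b_stage1_flash",  # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=2,        # 2 per device x 4 GPUs x 4 accum steps = 32 effective
    per_device_eval_batch_size=2,         # 2 per device x 4 GPUs = 8 effective
    gradient_accumulation_steps=4,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=40,
    num_train_epochs=3,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```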

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 3.0038        | 0.0410 | 40   | nan             |
| 2.8811        | 0.0821 | 80   | nan             |
| 2.7369        | 0.1231 | 120  | nan             |
| 2.7568        | 0.1641 | 160  | nan             |
| 2.9021        | 0.2051 | 200  | nan             |
| 2.8989        | 0.2462 | 240  | nan             |
| 2.8846        | 0.2872 | 280  | nan             |
| 2.9066        | 0.3282 | 320  | nan             |
| 2.7595        | 0.3692 | 360  | nan             |
| 2.698         | 0.4103 | 400  | nan             |
| 3.0317        | 0.4513 | 440  | nan             |
| 2.7174        | 0.4923 | 480  | nan             |
| 2.6518        | 0.5333 | 520  | nan             |
| 2.6698        | 0.5744 | 560  | nan             |
| 2.8094        | 0.6154 | 600  | nan             |
| 2.721         | 0.6564 | 640  | nan             |
| 2.679         | 0.6974 | 680  | nan             |
| 2.5793        | 0.7385 | 720  | nan             |
| 2.6417        | 0.7795 | 760  | nan             |
| 2.5729        | 0.8205 | 800  | nan             |
| 2.3099        | 0.8615 | 840  | nan             |
| 2.6534        | 0.9026 | 880  | nan             |
| 2.5593        | 0.9436 | 920  | nan             |
| 2.4978        | 0.9846 | 960  | nan             |
| 2.4152        | 1.0256 | 1000 | nan             |
| 2.44          | 1.0667 | 1040 | nan             |
| 2.5566        | 1.1077 | 1080 | nan             |
| 2.336         | 1.1487 | 1120 | nan             |
| 2.2365        | 1.1897 | 1160 | nan             |
| 2.4516        | 1.2308 | 1200 | nan             |
| 2.3756        | 1.2718 | 1240 | nan             |
| 2.1703        | 1.3128 | 1280 | nan             |
| 2.1208        | 1.3538 | 1320 | nan             |
| 2.2553        | 1.3949 | 1360 | nan             |
| 2.1051        | 1.4359 | 1400 | nan             |
| 2.2217        | 1.4769 | 1440 | nan             |
| 2.0144        | 1.5179 | 1480 | nan             |
| 2.0047        | 1.5590 | 1520 | nan             |
| 2.0336        | 1.6    | 1560 | nan             |
| 1.9599        | 1.6410 | 1600 | nan             |
| 2.0869        | 1.6821 | 1640 | nan             |
| 1.9917        | 1.7231 | 1680 | nan             |
| 2.0273        | 1.7641 | 1720 | nan             |
| 1.7805        | 1.8051 | 1760 | nan             |
| 1.9446        | 1.8462 | 1800 | nan             |
| 1.9071        | 1.8872 | 1840 | nan             |
| 1.9767        | 1.9282 | 1880 | nan             |
| 1.7796        | 1.9692 | 1920 | nan             |
| 1.9373        | 2.0103 | 1960 | nan             |
| 1.6333        | 2.0513 | 2000 | nan             |
| 1.7957        | 2.0923 | 2040 | nan             |
| 1.6708        | 2.1333 | 2080 | nan             |
| 1.7508        | 2.1744 | 2120 | nan             |
| 1.6658        | 2.2154 | 2160 | nan             |
| 1.5108        | 2.2564 | 2200 | nan             |
| 1.6733        | 2.2974 | 2240 | nan             |
| 1.6248        | 2.3385 | 2280 | nan             |
| 1.5717        | 2.3795 | 2320 | nan             |
| 1.6168        | 2.4205 | 2360 | nan             |
| 1.6815        | 2.4615 | 2400 | nan             |
| 1.6728        | 2.5026 | 2440 | nan             |
| 1.6679        | 2.5436 | 2480 | nan             |
| 1.6393        | 2.5846 | 2520 | nan             |
| 1.5578        | 2.6256 | 2560 | nan             |
| 1.5701        | 2.6667 | 2600 | nan             |
| 1.7143        | 2.7077 | 2640 | nan             |
| 1.5139        | 2.7487 | 2680 | nan             |
| 1.7144        | 2.7897 | 2720 | nan             |
| 1.6629        | 2.8308 | 2760 | nan             |
| 1.5304        | 2.8718 | 2800 | nan             |
| 1.5267        | 2.9128 | 2840 | nan             |
| 1.5946        | 2.9538 | 2880 | nan             |
| 1.6313        | 2.9949 | 2920 | nan             |
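
Note that the training loss falls steadily from about 3.0 to 1.6 while the validation loss is NaN at every checkpoint, which usually points to numerical instability in the evaluation pass (e.g. a half-precision overflow or a label-masking issue) rather than a failure to learn. As a hypothetical safeguard for future runs, a Trainer callback like the one below could halt training as soon as an evaluation returns NaN; NanEvalGuard is an illustrative name and was not part of this run.

```python
import math
from transformers import TrainerCallback

class NanEvalGuard(TrainerCallback):
    """Hypothetical helper: stop training once the evaluation loss turns NaN."""

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        eval_loss = (metrics or {}).get("eval_loss")
        if eval_loss is not None and math.isnan(eval_loss):
            control.should_training_stop = True  # halt instead of logging nan repeatedly
        return control
```

Such a callback would be registered via Trainer(callbacks=[NanEvalGuard()]).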

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.6.0.dev20241122+rocm6.2
  • Datasets 2.14.7
  • Tokenizers 0.20.3