metadata

license: other
base_model: meta-llama/Meta-Llama-3-8B
tags:
  - llama-factory
  - full
  - generated_from_trainer
model-index:
  - name: C018_random_sample_llama3-8b-base_pretrain_20240504_182259
    results: []

C018_random_sample_llama3-8b-base_pretrain_20240504_182259

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B (loaded from the local path /data/pro-align/progressalign/shared_storage/downloaded_models/llama3-8b-base), trained on the C018_random_sample_data dataset. It achieves the following results on the evaluation set:

Loss: 2.2706
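
For readers who prefer perplexity, the loss above can be converted with a short sketch. It assumes the reported value is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training with the Transformers Trainer but is not stated on this card. The function name `perplexity_from_loss` is hypothetical.

```python
import math


def perplexity_from_loss(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity.

    Assumption: the loss is a natural-log cross-entropy averaged over tokens.
    """
    return math.exp(mean_nll_nats)


eval_loss = 2.2706  # evaluation loss reported on this card
print(f"perplexity: {perplexity_from_loss(eval_loss):.2f}")
```

Perplexity computed this way is only comparable across models that share the same tokenizer and evaluation set.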

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1.5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 64
total_eval_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: polynomial
lr_scheduler_warmup_steps: 20
num_epochs: 4.0
mixed_precision_training: Native AMP
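
The listed values can be cross-checked with a minimal sketch. It shows how the total batch sizes follow from the per-device sizes, and what a polynomial schedule with 20 warmup steps might look like. Several values are assumptions not stated on this card: a gradient accumulation factor of 1 (inferred from 8 x 8 = 64), an end learning rate of 1e-7 and a power of 1.0 (the defaults of the Transformers polynomial scheduler), and roughly 3660 total steps (extrapolated from the training log). The function names are hypothetical.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Global batch size per optimizer step.

    Assumption: grad_accum_steps=1, since the card lists no accumulation value.
    """
    return per_device * num_devices * grad_accum_steps


def polynomial_lr(
    step: int,
    *,
    base_lr: float = 1.5e-5,   # learning_rate from the card
    warmup_steps: int = 20,    # lr_scheduler_warmup_steps from the card
    total_steps: int = 3660,   # assumption: estimated, not stated on the card
    lr_end: float = 1e-7,      # assumption: library default
    power: float = 1.0,        # assumption: library default (linear decay)
) -> float:
    """Learning rate at a given step: linear warmup, then polynomial decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    if step > total_steps:
        return lr_end
    decay_steps = total_steps - warmup_steps
    pct_remaining = 1 - (step - warmup_steps) / decay_steps
    return (base_lr - lr_end) * pct_remaining ** power + lr_end


print("train batch:", effective_batch_size(8, 8))
print("eval batch:", effective_batch_size(16, 8))
for s in (0, 10, 20, 1000, 3660):
    print(f"step {s:>4}: lr = {polynomial_lr(s):.3e}")
```

With a power of 1.0 the decay is linear, so if the run used a different power or end value, the curve between warmup and the final step would differ from this sketch.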

Training results

Training Loss	Epoch	Step	Validation Loss
2.3701	0.2186	200	2.3702
2.3183	0.4372	400	2.3160
2.2634	0.6557	600	2.2863
2.2522	0.8743	800	2.2706
2.0306	1.0929	1000	2.2777
2.0095	1.3115	1200	2.2760
2.0539	1.5301	1400	2.2746
2.0338	1.7486	1600	2.2743
2.0648	1.9672	1800	2.2737
2.0297	2.1858	2000	2.2766
2.0487	2.4044	2200	2.2767
2.0329	2.6230	2400	2.2770
2.0213	2.8415	2600	2.2766
2.0559	3.0601	2800	2.2771
2.0543	3.2787	3000	2.2773
2.0317	3.4973	3200	2.2772
1.9880	3.7158	3400	2.2770
2.0355	3.9344	3600	2.2772
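
The table can be summarized with a short sketch that finds the evaluation step with the lowest validation loss and estimates the number of optimizer steps per epoch. The rows are copied from the table above. The samples-per-epoch figure is only an estimate and assumes the total train batch size of 64 with no gradient accumulation. The names `LogRow` and `parse_log` are hypothetical.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


# Rows copied from the training results table:
# training loss, epoch, step, validation loss.
RAW_LOG = """
2.3701 0.2186 200 2.3702
2.3183 0.4372 400 2.3160
2.2634 0.6557 600 2.2863
2.2522 0.8743 800 2.2706
2.0306 1.0929 1000 2.2777
2.0095 1.3115 1200 2.2760
2.0539 1.5301 1400 2.2746
2.0338 1.7486 1600 2.2743
2.0648 1.9672 1800 2.2737
2.0297 2.1858 2000 2.2766
2.0487 2.4044 2200 2.2767
2.0329 2.6230 2400 2.2770
2.0213 2.8415 2600 2.2766
2.0559 3.0601 2800 2.2771
2.0543 3.2787 3000 2.2773
2.0317 3.4973 3200 2.2772
1.988 3.7158 3400 2.2770
2.0355 3.9344 3600 2.2772
"""


def parse_log(raw: str) -> List[LogRow]:
    """Parse whitespace-separated log rows into typed records."""
    rows = []
    for line in raw.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append(LogRow(float(train_loss), float(epoch), int(step), float(val_loss)))
    return rows


rows = parse_log(RAW_LOG)

# Evaluation step with the lowest validation loss.
best = min(rows, key=lambda r: r.val_loss)

# Steps per epoch, estimated from the last logged row.
last = rows[-1]
steps_per_epoch = last.step / last.epoch

# Assumption: 64 samples per optimizer step (total_train_batch_size).
samples_per_epoch = round(steps_per_epoch) * 64

print(f"lowest validation loss: {best.val_loss} at step {best.step}")
print(f"estimated steps per epoch: {steps_per_epoch:.1f}")
print(f"estimated samples per epoch: {samples_per_epoch}")
```

The lowest validation loss in the table, 2.2706 at step 800, equals the evaluation loss reported at the top of this card. Validation loss stays within about 0.01 of that value after the first epoch while training loss drops to roughly 2.0. The card does not say which checkpoint the published weights correspond to.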

Framework versions

Transformers 4.40.1
PyTorch 2.3.0
Datasets 2.19.0
Tokenizers 0.19.1