mistral-7b-distilabel-truthy-dpo

mistral-7b-distilabel-truthy-dpo is a fine-tuned version of mistralai/Mistral-7B-v0.1, trained with Direct Preference Optimization (DPO) on the mlabonne/distilabel-truthy-dpo-v0.1 dataset.
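
For readers new to DPO, the per-example objective can be sketched in plain Python. This is a minimal illustration of the standard sigmoid DPO loss, not the training code used for this model. It assumes the summed log-probabilities of the chosen and rejected responses have already been computed under both the trainable policy and a frozen reference model. The function name `dpo_loss` is hypothetical, and its default `beta=0.1` mirrors the Beta value listed under Training arguments.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one (prompt, chosen, rejected) triple.

    Each argument is the summed log-probability of a full response under
    either the trainable policy or the frozen reference model.
    Hypothetical helper, for illustration only.
    """
    # Implicit reward margins: how much more the policy favours each
    # response than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == log(1 + exp(-z)); the branch keeps exp() from overflowing.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Policy identical to the reference: the loss equals log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy shifted toward the chosen response: the loss drops below log(2).
print(dpo_loss(-9.0, -11.0, -10.0, -10.0))
```

A larger beta keeps the policy closer to the reference model. The loss falls only when the policy raises the chosen response's likelihood relative to the rejected one by more than the reference model does.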

LoRA

  • r: 16
  • LoRA alpha: 16
  • LoRA dropout: 0.05
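
With these settings, each adapted weight matrix W gets a low-rank update B·A scaled by LoRA alpha / r, which here is 16 / 16 = 1.0. The plain-Python sketch below shows that arithmetic and is illustrative only. The helper names are hypothetical, and dropout is omitted because it applies to the adapter input only during training. The card does not say which modules were adapted, so the 4096 × 4096 layer in the usage lines is just an example shape (Mistral-7B's hidden size).

```python
LORA_R = 16
LORA_ALPHA = 16


def lora_scaling(r: int, alpha: int) -> float:
    """Scale applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer:
    A has shape (r, d_in) and B has shape (d_out, r)."""
    return r * d_in + d_out * r


def lora_forward(W, A, B, x, r: int, alpha: int):
    """Compute y = W x + (alpha / r) * B (A x) with nested lists (inference, no dropout)."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)               # frozen pretrained weight path
    update = matvec(B, matvec(A, x))  # low-rank adapter path
    s = lora_scaling(r, alpha)
    return [b + s * u for b, u in zip(base, update)]


print(lora_scaling(LORA_R, LORA_ALPHA))      # 1.0
print(lora_param_count(4096, 4096, LORA_R))  # 131072
```

Because alpha equals r, the adapter output is added at unit scale. For a 4096 × 4096 layer, that adds about 131k trainable parameters, versus roughly 16.8M frozen ones.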

Training arguments

  • Batch size: 4
  • Gradient accumulation steps: 4
  • Optimizer: paged_adamw_32bit
  • Max steps: 100
  • Learning rate: 5e-05
  • Learning rate scheduler type: cosine
  • Beta: 0.1
  • Max prompt length: 1024
  • Max length: 1536
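
The arguments above imply a few derived quantities, computed below in plain Python under stated assumptions. First, training is assumed to have run on a single device, since the card gives no device count. Second, the cosine scheduler is assumed to decay from the peak learning rate to zero with no warmup, since none is listed. Third, truncation is simplified to keeping the end of the prompt and cutting the response to fit. The trainer actually used may handle warmup and truncation differently, and all names here are hypothetical.

```python
import math

PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 100
PEAK_LR = 5e-5
MAX_PROMPT_LENGTH = 1024
MAX_LENGTH = 1536
NUM_DEVICES = 1  # assumption: not stated in this card


def effective_batch_size(per_device: int, accum: int, devices: int = 1) -> int:
    """Preference pairs contributing to each optimizer step."""
    return per_device * accum * devices


def cosine_lr(step: int, max_steps: int, peak_lr: float) -> float:
    """Half-cosine decay from peak_lr at step 0 to 0 at max_steps (no warmup assumed)."""
    progress = min(step, max_steps) / max_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def truncate(prompt, response, max_prompt_length: int, max_length: int):
    """Keep the last max_prompt_length prompt tokens, then cut the response
    so that prompt + response fits in max_length tokens (simplified rule)."""
    prompt = prompt[max(0, len(prompt) - max_prompt_length):]
    budget = max(0, max_length - len(prompt))
    return prompt, response[:budget]


batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES)
print(batch)              # 16
print(batch * MAX_STEPS)  # 1600 preference pairs seen in total
print(cosine_lr(50, MAX_STEPS, PEAK_LR))  # half the peak learning rate
```

Under these assumptions, a prompt that hits the 1024-token cap leaves at most 512 tokens for each response.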

Model details

  • Format: Safetensors
  • Model size: 7.24B params
  • Tensor type: FP16
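
As a rough guide, FP16 weights take 2 bytes per parameter. The sketch below is back-of-the-envelope arithmetic only: it ignores activations, the KV cache, and framework overhead, and the helper name is hypothetical.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}
NUM_PARAMS = 7.24e9  # from the model size above


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Approximate storage for the weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


print(round(weight_memory_gb(NUM_PARAMS), 2))  # 14.48
```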

Model tree

  • Repository: CorticalStack/mistral-7b-distilabel-truthy-dpo
  • Fine-tuned from: mistralai/Mistral-7B-v0.1
  • Quantizations: 3 models
