Llama3-8B-SuperNova-Spectrum-Hermes-DPO
This model is a DPO fine-tuned version of my DARE_TIES-merged model
yuvraj17/Llama3-8B-SuperNova-Spectrum-dare_ties,
trained on the yuvraj17/chatml-OpenHermes2.5-dpo-binarized-alpha-2k dataset.
DPO (Direct Preference Optimization):
Direct Preference Optimization (DPO) is a fine-tuning technique that aligns a model's responses with human preference or ranking data directly, without the separate reward model and reinforcement learning step that RLHF requires.
![](https://cdn-uploads.huggingface.co/production/uploads/66137d95e8d2cda230ddcea6/kHcU5dkcSVqxEIWt_GRUB.png)
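As a rough illustration of the idea described above, the per-pair DPO objective can be sketched in plain Python. This is a minimal sketch of the standard DPO loss, not the TRL implementation used for training; the function names and the example log-probabilities are hypothetical, and the inputs are assumed to be summed token log-probabilities of each response under the policy and the frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single (chosen, rejected) preference pair."""
    # Log-ratio of policy vs. reference for each response (the implicit reward / beta).
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin between the chosen and rejected implicit rewards.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# Hypothetical log-probabilities, for illustration only.
loss_at_init = dpo_loss(-10.0, -10.0, -10.0, -10.0)      # policy == reference
loss_after_shift = dpo_loss(-8.0, -12.0, -10.0, -10.0)   # policy now favors "chosen"
print(loss_at_init, loss_after_shift)
```

When the policy equals the reference, the margin is zero and the loss is log(2); it falls as the policy assigns relatively more probability to the chosen response than the reference does. The `beta` default of 0.1 matches the value listed in the training parameters.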
Training:
- Trained on 1x A40 (48 GB VRAM) using Hugging Face TRL.
- QLoRA (4-bit precision) for 1 epoch

# LoRA configuration
peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)
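To give a feel for what the LoRA configuration above implies, the following sketch estimates the number of trainable adapter parameters and the LoRA scaling factor. The layer dimensions are an assumption: they are the standard Llama 3 8B shapes (hidden size 4096, MLP size 14336, 1024-wide key/value projections, 32 layers), which this merged model is presumed to share. The helper name is hypothetical.

```python
# Values taken from the LoRA configuration above.
R = 32
LORA_ALPHA = 16

# Assumed Llama 3 8B dimensions (not stated in this model card).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 head dim (grouped-query attention)
NUM_LAYERS = 32

# (in_features, out_features) of each targeted linear layer.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """A LoRA adapter adds A (r x in) and B (out x r) matrices."""
    return r * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, R) for i, o in MODULE_SHAPES.values())
total_trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # default LoRA scaling applied to the adapter update

print(f"trainable LoRA parameters: {total_trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapters hold roughly 84M trainable parameters, about 1% of the 8B base weights, and the adapter update is scaled by 0.5.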
Training Params
The following hyperparameters were used during training:
- learning_rate: 5e-05
- beta: 0.1
- num_devices: 1
- gradient_accumulation_steps: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 1
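The learning-rate settings above (peak 5e-05, cosine scheduler, 100 warmup steps) can be visualized with a small sketch. It follows the usual linear-warmup-then-cosine-decay shape; the function name is hypothetical, and `total_steps=500` is an assumed value for illustration, since the actual number of optimizer steps is not stated here.

```python
import math


def cosine_lr_with_warmup(step: int,
                          peak_lr: float = 5e-5,
                          warmup_steps: int = 100,
                          total_steps: int = 500) -> float:
    """Learning rate at a given optimizer step (total_steps is an assumption)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 300, 500):
    print(step, cosine_lr_with_warmup(step))
```

With these assumed values the rate climbs linearly to 5e-05 at step 100, is halved midway through the decay phase, and reaches zero at the final step.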
Training time: 1:57:00 (h:mm:ss)
Weights & Biases Report
Usage
# Notebook cell syntax; in a shell, run the command without the leading "!"
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "yuvraj17/Llama3-8B-SuperNova-Spectrum-Hermes-DPO"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Build a text-generation pipeline in half precision with automatic device placement
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a response and print the generated text
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
Evaluation Scores
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 18.00 |
IFEval (0-Shot) | 46.91 |
BBH (3-Shot) | 21.24 |
MATH Lvl 5 (4-Shot) | 5.14 |
GPQA (0-shot) | 6.94 |
MuSR (0-shot) | 9.62 |
MMLU-PRO (5-shot) | 18.16 |
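The "Avg." row is consistent with a plain arithmetic mean of the six benchmark scores in the table. A quick check, using only the values listed above:

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 46.91,
    "BBH (3-Shot)": 21.24,
    "MATH Lvl 5 (4-Shot)": 5.14,
    "GPQA (0-shot)": 6.94,
    "MuSR (0-shot)": 9.62,
    "MMLU-PRO (5-shot)": 18.16,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")
```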
Evaluation results (Open LLM Leaderboard)
- IFEval (0-Shot), strict accuracy: 46.91
- BBH (3-Shot), normalized accuracy: 21.24
- MATH Lvl 5 (4-Shot), exact match: 5.14
- GPQA (0-shot), acc_norm: 6.94
- MuSR (0-shot), acc_norm: 9.62
- MMLU-PRO (5-shot), accuracy (test set): 18.16