File size: 6,090 Bytes
7bd18fd 0e4a298 7bd18fd 98cbdd7 7bd18fd 98cbdd7 7bd18fd 00801f2 98cbdd7 00801f2 0f6a41b 7bd18fd 0e4a298 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 |
---
license: llama3.1
datasets:
- agentlans/crash-course
base_model:
- DreadPoor/LemonP-8B-Model_Stock
- Youlln/1PARAMMYL-8B-ModelStock
- jaspionjader/f-2-8b
- Etherll/SuperHermes
- meta-llama/Llama-3.1-8B
tags:
- merge
- mergekit
model-index:
- name: Llama3.1-Daredevilish
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: wis-k/instruction-following-eval
split: train
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 62.92
name: averaged accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-Daredevilish
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: SaylorTwift/bbh
split: test
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 29.2
name: normalized accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-Daredevilish
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: lighteval/MATH-Hard
split: test
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 12.76
name: exact match
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-Daredevilish
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
split: train
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 6.82
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-Daredevilish
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 11.6
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-Daredevilish
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 29.96
name: accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-Daredevilish
name: Open LLM Leaderboard
---
# Llama 3.1 Daredevilish
- This model is an experimental Llama 3.1-based merge, inspired by [mlabonne/Daredevil-8B](https://huggingface.co/mlabonne/Daredevil-8B).
- It combines the top-performing Llama 3.1 8B models on the MMLU-Pro task as of January 21, 2025.
## Model Details
- **Architecture:** Llama 3.1 (8.03B parameters)
- **Training:** Merged from top MMLU-Pro models, with additional supervised fine-tuning (SFT)
- **Release Date:** January 21, 2025
> [!IMPORTANT]
> The model fails to end replies properly when used with some system prompts.
> If this is a problem, consider using [agentlans/Llama3.1-Daredevilish-Instruct](https://huggingface.co/agentlans/Llama3.1-Daredevilish-Instruct) in instruct mode.
## Key Features
1. **Merged Architecture:** Combines high-performing MMLU-Pro models to enhance overall capabilities.
2. **Llama 3 Compatibility:** Additional Supervised Fine-Tuning (SFT) ensures adherence to Llama 3 prompt format.
3. **SFT Dataset:** [agentlans/crash-course](https://huggingface.co/datasets/agentlans/crash-course) dataset (1200 row configuration) for supervised fine-tuning in [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).
4. **Fine-Tuning Approach:**
- 1 epoch training
- Rank 4 LoRA
- Alpha = 4
- rslora
## Merge Configuration
The model was created using [mergekit](https://github.com/arcee-ai/mergekit) with the following merge configuration:
```yaml
models:
- model: DreadPoor/LemonP-8B-Model_Stock
parameters:
density: 0.6
weight: 0.16
- model: Youlln/1PARAMMYL-8B-ModelStock
parameters:
density: 0.6
weight: 0.13
- model: jaspionjader/f-2-8b
parameters:
density: 0.6
weight: 0.10
- model: Etherll/SuperHermes
parameters:
density: 0.6
weight: 0.08
merge_method: dare_ties
base_model: meta-llama/Llama-3.1-8B
dtype: bfloat16
```
## Usage and Limitations
This experimental model is designed for research and development purposes. Users should be aware of potential biases and limitations inherent in language models. Always validate outputs and use the model responsibly.
## Future Work
Further evaluation and fine-tuning may be necessary to optimize performance across various tasks. Researchers are encouraged to build upon this experimental merge to advance the capabilities of Llama-based models.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/agentlans__Llama3.1-Daredevilish-details)!
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=agentlans%2FLlama3.1-Daredevilish&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
| Metric |Value (%)|
|-------------------|--------:|
|**Average** | 25.54|
|IFEval (0-Shot) | 62.92|
|BBH (3-Shot) | 29.20|
|MATH Lvl 5 (4-Shot)| 12.76|
|GPQA (0-shot) | 6.82|
|MuSR (0-shot) | 11.60|
|MMLU-PRO (5-shot) | 29.96|
|