NeuralReyna-Mini-1.8B-v0.2
Description
Taken aloobun/Reyna-Mini-1.8B-v0.2 and further fine-tuned it using DPO using the Intel/orca_dpo_pairs dataset.
This model has capabilities in coding, math, science, roleplay, and function calling.
This model was trained on OpenAI's ChatML prompt format.
Evaluation
GPT4ALL:
Tasks | Version | Filter | n-shot | Metric | Value | Stderr | |
---|---|---|---|---|---|---|---|
arc_challenge | 1 | none | 0 | acc | 0.3208 | ± | 0.0136 |
none | 0 | acc_norm | 0.3336 | ± | 0.0138 | ||
arc_easy | 1 | none | 0 | acc | 0.6035 | ± | 0.0100 |
none | 0 | acc_norm | 0.5833 | ± | 0.0101 | ||
boolq | 2 | none | 0 | acc | 0.6526 | ± | 0.0083 |
hellaswag | 1 | none | 0 | acc | 0.4556 | ± | 0.0050 |
none | 0 | acc_norm | 0.6076 | ± | 0.0049 | ||
openbookqa | 1 | none | 0 | acc | 0.2600 | ± | 0.0196 |
none | 0 | acc_norm | 0.3460 | ± | 0.0213 | ||
piqa | 1 | none | 0 | acc | 0.7236 | ± | 0.0104 |
none | 0 | acc_norm | 0.7307 | ± | 0.0104 | ||
winogrande | 1 | none | 0 | acc | 0.6062 | ± | 0.0137 |
Disclaimer
This model may have overfitted to the DPO training data, and may not perform well.
Contributions
Thanks to @aloobun and @Locutusque for their contributions to this model.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 44.85 |
AI2 Reasoning Challenge (25-Shot) | 37.80 |
HellaSwag (10-Shot) | 60.51 |
MMLU (5-Shot) | 45.04 |
TruthfulQA (0-shot) | 37.75 |
Winogrande (5-shot) | 60.93 |
GSM8k (5-shot) | 27.07 |
- Downloads last month
- 197
Inference Providers
NEW
This model is not currently available via any of the supported third-party Inference Providers, and
the model is not deployed on the HF Inference API.
Model tree for M4-ai/NeuralReyna-Mini-1.8B-v0.2
Datasets used to train M4-ai/NeuralReyna-Mini-1.8B-v0.2
Evaluation results
- normalized accuracy on AI2 Reasoning Challenge (25-Shot)test set Open LLM Leaderboard37.800
- normalized accuracy on HellaSwag (10-Shot)validation set Open LLM Leaderboard60.510
- accuracy on MMLU (5-Shot)test set Open LLM Leaderboard45.040
- mc2 on TruthfulQA (0-shot)validation set Open LLM Leaderboard37.750
- accuracy on Winogrande (5-shot)validation set Open LLM Leaderboard60.930
- accuracy on GSM8k (5-shot)test set Open LLM Leaderboard27.070