This model was made possible by the generous support of Cherry Republic.

https://www.cherryrepublic.com/

Model Overview

TinyLlama-R1 is a lightweight transformer model designed to handle instruction-following and reasoning tasks, particularly in STEM domains. It was trained on the Magpie Reasoning V2 250K-CoT dataset with the goal of improving reasoning through high-quality instruction-response pairs. However, early tests show reduced responsiveness to system-level instructions, likely because the dataset contains no system messages.

Model Name: Josephgflowers/Tinyllama-STEM-Cinder-Agent-v1


Key Features

  • Dataset Focus: Built on the Magpie Reasoning V2 250K-CoT dataset, enhancing problem-solving in reasoning-heavy tasks.
  • STEM Application: Tailored for tasks involving scientific, mathematical, and logical reasoning.
  • Instruction Handling: Initial observations indicate reduced adherence to system instructions, a change from previous versions.

Model Details

  • Model Type: Transformer-based (TinyLlama architecture)
  • Parameter Count: 1.1B
  • Context Length: Updated to 8k tokens (see the loading sketch after this list)
  • Training Framework: Unsloth
  • Primary Use Cases:
    • Intended for research into chain-of-thought (CoT) reasoning in small language models
    • Technical problem-solving
    • Instruction-following conversations
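
A minimal loading sketch follows, assuming the standard Hugging Face transformers workflow and the Josephgflowers/Tinyllama-r1 repo id used on this page; swap in a different repo id if you are using another upload of the checkpoint.

```python
# Minimal sketch: load the model with transformers and sanity-check the details
# listed above (~1.1B parameters, 8k context). The repo id is taken from this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Josephgflowers/Tinyllama-r1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

print(f"parameters: {sum(p.numel() for p in model.parameters()) / 1e9:.2f}B")
print(f"max positions: {model.config.max_position_embeddings}")
```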

Training Data

The model was fine-tuned on the Magpie Reasoning V2 250K-CoT dataset. The dataset includes diverse instruction-response pairs, but notably lacks system-level messages, which has impacted the model's ability to consistently follow system directives.

Dataset Characteristics

  • Sources:
    • Instructions were generated using models such as Meta's Llama 3.1 and 3.3.
    • Responses were provided by DeepSeek-R1-Distill-Llama-70B.
  • Structure: Instruction-response pairs with an emphasis on chain-of-thought (CoT) reasoning; a short inspection sketch follows this list.
  • Limitations: No system-level instructions were included, affecting instruction prioritization and response formatting in some contexts.
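
To inspect these pairs yourself, a streaming sketch like the one below can be used with the datasets library. The dataset repo id and field names here are assumptions; confirm them against the dataset card before relying on them.

```python
# Minimal sketch: stream a few records from the Magpie Reasoning V2 250K-CoT dataset
# to see the instruction-response structure. Repo id and field names are assumed.
from datasets import load_dataset

dataset = load_dataset(
    "Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B",  # assumed repo id
    split="train",
    streaming=True,  # avoid downloading all 250K examples
)

for i, example in enumerate(dataset):
    print(example.keys())  # instruction / response style fields (assumed)
    if i >= 2:
        break
```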

Known Issues & Limitations

  • System Instructions: The model currently does not respond well to system messages, in contrast to previous versions; a prompting workaround is sketched below.
  • Performance Unverified: This version has not yet been formally evaluated on benchmarks such as GSM8K.

The model can be accessed and fine-tuned via Josephgflowers on Hugging Face.
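
Given the weak handling of system messages noted above, one practical pattern is to fold any system-style guidance into the user turn. The sketch below assumes the tokenizer ships a chat template and uses the Josephgflowers/Tinyllama-r1 repo id from this page; it is an illustration, not an official usage recipe.

```python
# Minimal sketch: run the model while folding system-style guidance into the user
# message, since the card notes weak adherence to separate system messages.
# Assumes the tokenizer provides a chat template; repo id taken from this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Josephgflowers/Tinyllama-r1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

guidance = "Reason step by step and give the final answer on its own line."
question = "A train travels 120 km in 1.5 hours. What is its average speed in km/h?"

# No system role: prepend the guidance to the user turn instead.
messages = [{"role": "user", "content": f"{guidance}\n\n{question}"}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding is used here only to keep the example deterministic; adjust the sampling settings to taste.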

Training & License Information

License: CC BY-NC 4.0 (non-commercial use only)

This model was trained using datasets released under:

  • Meta Llama 3.1 and 3.3 Community License
  • CC BY-NC 4.0 (Creative Commons Non-Commercial License)

Acknowledgments

Thanks to the Magpie Reasoning V2 dataset creators and the researchers behind models such as DeepSeek-R1 and Meta's Llama.

@article{xu2024magpie,
  title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
  author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
  year={2024},
  eprint={2406.08464},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
