Rauhan committed on
Commit 2ddf57b · verified · 1 Parent(s): e7fe446

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -21,7 +21,7 @@ pipeline_tag: text-generation
 
  # **LLaMA-3.2-3B-GRPO-GSM325**
 
- 🚀 **LLaMA-3.2-3B-GRPO-GSM325** is a fine-tuned version of **LLaMA 3.2B**, trained using **GRPO (Guided Reinforcement Policy Optimization)** and **DeepSeek R1’s open-source recipe**. This model significantly enhances the base **LLaMA-3.2-3B** in **mathematical problem-solving, logical reasoning, and structured response generation**, pushing it towards **GPT-4o-style advanced reasoning**.
+ 🚀 **LLaMA-3.2-3B-GRPO-GSM325** is a fine-tuned version of **LLaMA 3.2B**, trained using **GRPO (Guided Reinforcement Policy Optimization)** and **DeepSeek R1’s open-source recipe**. This model significantly enhances the base **LLaMA-3.2-3B** in **mathematical problem-solving, logical reasoning, and structured response generation**, pushing it towards **GPT-4o1-style advanced reasoning**.
 
  🔥 Trained **entirely on a Free Google Colab Tesla T4 GPU**: [Training Notebook](https://colab.research.google.com/drive/1o95CT5DV2zZXjScDHxKfRJBaNGv3ULpj?usp=sharing)
 