Update README.md
Browse files
README.md
CHANGED
@@ -21,7 +21,7 @@ pipeline_tag: text-generation
|
|
21 |
|
22 |
# **LLaMA-3.2-3B-GRPO-GSM325**
|
23 |
|
24 |
-
🚀 **LLaMA-3.2-3B-GRPO-GSM325** is a fine-tuned version of **LLaMA 3.2B**, trained using **GRPO (Guided Reinforcement Policy Optimization)** and **DeepSeek R1’s open-source recipe**. This model significantly enhances the base **LLaMA-3.2-3B** in **mathematical problem-solving, logical reasoning, and structured response generation**, pushing it towards **GPT-
|
25 |
|
26 |
🔥 Trained **entirely on a Free Google Colab Tesla T4 GPU**: [Training Notebook](https://colab.research.google.com/drive/1o95CT5DV2zZXjScDHxKfRJBaNGv3ULpj?usp=sharing)
|
27 |
|
|
|
21 |
|
22 |
# **LLaMA-3.2-3B-GRPO-GSM325**
|
23 |
|
24 |
+
🚀 **LLaMA-3.2-3B-GRPO-GSM325** is a fine-tuned version of **LLaMA 3.2B**, trained using **GRPO (Guided Reinforcement Policy Optimization)** and **DeepSeek R1’s open-source recipe**. This model significantly enhances the base **LLaMA-3.2-3B** in **mathematical problem-solving, logical reasoning, and structured response generation**, pushing it towards **GPT-4o1-style advanced reasoning**.
|
25 |
|
26 |
🔥 Trained **entirely on a Free Google Colab Tesla T4 GPU**: [Training Notebook](https://colab.research.google.com/drive/1o95CT5DV2zZXjScDHxKfRJBaNGv3ULpj?usp=sharing)
|
27 |
|