s-emanuilov posted an update 2 days ago
Tutorial 💥 Training a non-English reasoning model with GRPO and Unsloth

I wanted to share my experiment with training reasoning models in languages other than English/Chinese.

Using Llama 3.1 8B as the base model, the GRPO trainer from trl, and Unsloth optimizations, I got a working prototype in Bulgarian after ~5 hours on an L40S GPU. The approach should work for any language where the base model has some pre-training coverage.
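For a rough idea of what that setup looks like, here is a minimal sketch assuming Unsloth's `FastLanguageModel` loader and trl's `GRPOTrainer`. The reward function, hyperparameters, and placeholder dataset below are illustrative assumptions, not the tutorial's actual configuration:

```python
# Minimal GRPO sketch: Llama 3.1 8B + Unsloth + trl.
# Reward function, hyperparameters, and dataset are illustrative
# assumptions, not the tutorial's exact setup.
from unsloth import FastLanguageModel
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Load the base model with Unsloth optimizations (4-bit weights + LoRA adapters).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Tiny placeholder dataset; a real run would use Bulgarian reasoning prompts.
dataset = Dataset.from_dict(
    {"prompt": ["Колко е 17 * 23? Обясни стъпка по стъпка."]}
)

# Hypothetical reward: fraction of Cyrillic characters in each completion,
# nudging the model to reason in the target language (Bulgarian here).
def target_language_reward(completions, **kwargs):
    rewards = []
    for completion in completions:
        text = completion if isinstance(completion, str) else completion[0]["content"]
        cyrillic = sum("\u0400" <= ch <= "\u04FF" for ch in text)
        rewards.append(cyrillic / max(len(text), 1))
    return rewards

training_args = GRPOConfig(
    output_dir="llama-3.1-8b-bg-reasoning",
    per_device_train_batch_size=8,
    num_generations=8,  # completions sampled per prompt; GRPO compares them
    max_completion_length=512,
    learning_rate=5e-6,
)

trainer = GRPOTrainer(
    model=model,
    reward_funcs=[target_language_reward],
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

In practice you would combine several rewards (output format, answer correctness, target language), which is what makes GRPO convenient here: each reward is just a Python function that scores the sampled completions.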

Full code and tutorial here: https://unfoldai.com/reasoning-in-a-non-english-language/

The model itself: s-emanuilov/LLMBG-Llama-3.1-8B-BG-Reasoning-v0.1

I hope this helps anyone looking to build reasoning models in their language.

If you like the results you got there, try qwen2.5-coder.

You will not believe the results; I tried it and could hardly believe them myself. I prefer the 7B, but the 14B also does really well. Anything smaller than 7B needs a bit more encouragement, or a more focused reasoning dataset than the one I used.


Thank you.

I’m also a big fan of Qwen models. However, I don’t think they are appropriate in this case, because I’m not entirely confident in their multilingual capabilities. That’s why I chose Llama.

Overall, I agree that the Qwen series is excellent for most tasks.