Simeon Emanuilov PRO

s-emanuilov

AI & ML interests

Software Engineer & Ph.D. candidate | Specializing in ML/DL system development & applying AI to solve real-world business problems.

Recent Activity

upvoted a paper 1 day ago

Fast Video Generation with Sliding Tile Attention

Posts 4

Post

Tutorial 💥 Training a non-English reasoning model with GRPO and Unsloth

I wanted to share my experiment with training reasoning models in languages other than English/Chinese.

Using Llama 3.1 8B as the base model, the GRPO trainer from trl, and Unsloth optimizations, I got a working prototype in Bulgarian after ~5 hours of training on a single L40S GPU. The approach should work for any language where the base model has some pre-training coverage.
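To give a rough idea of what GRPO needs from you, here is a minimal sketch of two reward functions of the kind trl's GRPO trainer can score completions with (callables that take a list of completions and return one float each). This is not the code from the tutorial: the `<reasoning>`/`<answer>` tag format, the function names, and the Cyrillic-ratio heuristic are all assumptions made up for illustration, and plain-string completions are assumed.

```python
import re

# Assumed output format (hypothetical, not taken from the tutorial):
# <reasoning>...</reasoning><answer>...</answer>
FORMAT_RE = re.compile(
    r"<reasoning>(.+?)</reasoning>\s*<answer>(.+?)</answer>", re.DOTALL
)


def cyrillic_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the basic Cyrillic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04FF")
    return cyrillic / len(letters)


def format_reward(completions, **kwargs):
    """1.0 if the completion follows the assumed tag format, else 0.0."""
    return [1.0 if FORMAT_RE.search(c) else 0.0 for c in completions]


def language_reward(completions, **kwargs):
    """Reward reasoning written in Cyrillic script (a crude proxy for Bulgarian)."""
    rewards = []
    for c in completions:
        match = FORMAT_RE.search(c)
        # Score only the reasoning span when the format matches, else the whole text.
        reasoning = match.group(1) if match else c
        rewards.append(cyrillic_ratio(reasoning))
    return rewards
```

A script check like this cannot tell Bulgarian from other Cyrillic-script languages, so in practice it would likely be combined with a correctness reward on the answer.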

Full code and tutorial here: https://unfoldai.com/reasoning-in-a-non-english-language/

The model itself: s-emanuilov/LLMBG-Llama-3.1-8B-BG-Reasoning-v0.1

I hope this helps anyone looking to build reasoning models in their language.

Post

A new benchmark (DPAB-α) has been released that evaluates LLM function calling in both Pythonic and JSON approaches.

It shows that Pythonic function calling often outperforms traditional JSON-based methods, especially for complex multi-step tasks.

Key findings from the benchmark:
- Claude 3.5 Sonnet leads with 87% on Pythonic vs 45% on JSON
- Smaller models show impressive results (Dria-Agent-α-3B: 72% Pythonic)
- Even larger models like DeepSeek V3 (685B) show significant gaps (63% Pythonic vs 33% JSON)

If you're building or using LLM agents, these results suggest that how you implement function calling can affect performance, so it might be worth reconsidering JSON-only approaches.
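To make the difference concrete, here is a small sketch of the same two-step task expressed both ways. Everything in it is hypothetical: the `get_price` and `total` tools, the `"$0"` placeholder convention for referring to an earlier result, and the two runner functions are invented for illustration and are not the benchmark's actual format or harness.

```python
import json


# Hypothetical tools, invented for this example.
def get_price(item: str) -> float:
    return {"apple": 0.5, "pear": 0.75}[item]


def total(price: float, quantity: int) -> float:
    return price * quantity


TOOLS = {"get_price": get_price, "total": total}


def run_json_calls(raw: str) -> list:
    """JSON style: a list of call objects. Chaining needs an extra convention;
    here "$N" (an assumption) means "the result of call N"."""
    results = []
    for call in json.loads(raw):
        args = {}
        for key, value in call["arguments"].items():
            if isinstance(value, str) and value.startswith("$"):
                value = results[int(value[1:])]
            args[key] = value
        results.append(TOOLS[call["name"]](**args))
    return results


def run_pythonic(code: str) -> dict:
    """Pythonic style: the model emits code; variables carry intermediate results.
    exec() is NOT a sandbox; real systems need proper isolation."""
    scope = dict(TOOLS)
    exec(code, {"__builtins__": {}}, scope)
    return scope


json_output = json.dumps([
    {"name": "get_price", "arguments": {"item": "pear"}},
    {"name": "total", "arguments": {"price": "$0", "quantity": 4}},
])
pythonic_output = 'price = get_price("pear")\ncost = total(price, 4)'

print(run_json_calls(json_output)[-1])        # 3.0
print(run_pythonic(pythonic_output)["cost"])  # 3.0
```

In the Pythonic form, passing one call's result to the next is an ordinary variable assignment, which is one plausible reason multi-step tasks are easier for models to express that way.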

The benchmark: https://github.com/firstbatchxyz/function-calling-eval
Blog post: https://huggingface.co/blog/andthattoo/dpab-a
