Mariusz Kurman

mkurman

AI & ML interests

AI Tech Lead | MD

Organizations

MedIT Solutions · BigScience Biomedical Datasets · SOWA Project

Posts (10)

Blurred-Thoughts Supervised-Finetuning 🙈

After hours of working with GitHub Copilot to organize the code, I'm excited to announce the release of Blurred-Thoughts Supervised Fine-Tuning (BT-SFT), a new method for fine-tuning LLMs to produce more diverse and creative responses.

BT-SFT introduces:
✅ A smart tokenization step that randomly masks tokens within <think> ... </think> tags, encouraging the model to generate diverse responses that align better with its own probability distribution instead of memorizing the thought process from distilled data (a minimal sketch follows this list).
✅ A reward function that ensures responses are well-structured.
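For readers who want a concrete picture of the masking step, here is a minimal sketch, assuming Hugging Face-style token ID tensors and the usual -100 ignore index for the loss; the function name blur_think_tokens and the blur_prob value are illustrative assumptions, not the actual BT-SFT implementation (see the repository below for that).

```python
import random

import torch

IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss in most trainers

def blur_think_tokens(input_ids: torch.Tensor,
                      labels: torch.Tensor,
                      think_start_id: int,
                      think_end_id: int,
                      blur_prob: float = 0.5) -> torch.Tensor:
    """Randomly 'blur' label tokens that fall between <think> and </think>.

    Blurred positions are set to IGNORE_INDEX so they no longer contribute to
    the loss, leaving the model free to form its own thoughts there while the
    unblurred part of the CoT and the final answer stay fully supervised.
    """
    labels = labels.clone()
    inside_think = False
    for i, tok in enumerate(input_ids.tolist()):
        if tok == think_start_id:
            inside_think = True
        elif tok == think_end_id:
            inside_think = False
        elif inside_think and random.random() < blur_prob:
            labels[i] = IGNORE_INDEX  # this thought token is blurred out of the loss
    return labels
```

You would apply something like this per example while building the labels for causal-LM fine-tuning, after resolving think_start_id and think_end_id from the tokenizer (e.g., via tokenizer.convert_tokens_to_ids).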

Explore and contribute to the project available in my GitHub repository:
https://github.com/mkurman/blurred-thoughts-SFT

Keep me updated on your experiments with BT-SFT! 🐐

Blurred-Thoughts Supervised Fine-Tuning (BT-SFT) 🤖

Can we teach a model to think completely on its own without reinforcement learning? Actually, yes.

We can do straightforward supervised fine-tuning using a relatively simple trick: blurring part of the chain of thought (CoT). But why is this effective?

We observed that various models differ in their thinking processes, and fine-tuning one model on another model’s thoughts (CoT) can sometimes be inefficient—often resulting in the model simply memorizing reasoning rather than learning how to actually think.

I discovered that this process can still be efficient if we clearly indicate when the model should start and stop thinking, uncover only part of the CoT together with the expected answer, and blur the rest of the CoT. This approach allows the model to learn only a portion of the thought process while still arriving at the expected answer.
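To make the start/stop-thinking structure concrete, here is a minimal sketch of the kind of format check that the reward function mentioned in the previous post could perform; the name check_structure and the exact tag pattern are my assumptions for illustration, not the repository's code.

```python
import re

def check_structure(response: str) -> float:
    """Return 1.0 if the response opens with a <think>...</think> block
    followed by a non-empty answer, else 0.0 (hypothetical structure reward)."""
    match = re.fullmatch(r"\s*<think>(.+?)</think>\s*(.+)", response, flags=re.DOTALL)
    return 1.0 if match and match.group(2).strip() else 0.0
```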

To see this in practice, you can check out my experimental BT-SFT of the meditsolutions/Llama-3.2-SUN-2.5B-chat model, which was fine-tuned on 151 million tokens from the Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B dataset.

Enjoy! 🚀

PS. If you were curious enough to read this far, leave me a comment. It's always nice to chat with open-minded and intelligent people.