Fine-tune DeepSeek-R1 with a Synthetic Reasoning Dataset

Community Article · Published February 10, 2025

DeepSeek-R1's entry into the open-source AI landscape has been groundbreaking. With reasoning capabilities rivaling OpenAI's o1, it has captivated researchers and AI enthusiasts, making global headlines and disrupting the market. As a reasoning model, it simulates step-by-step logical and deductive thinking to solve complex problems. This makes it well-suited for tasks in mathematics, coding, logic, law, and medicine, though its higher inference requirements make it less efficient for simpler tasks.

This blog post will walk you through fine-tuning DeepSeek-R1 using synthetic data. This approach enables you to customize a Large Language Model (LLM) for a specific task, even in data-scarce domains, improving accuracy and reliability. As an example, we will demonstrate how to fine-tune DeepSeek-R1 to solve coding problems in Python.

We will cover the following steps:

- Generating a synthetic reasoning dataset with the Synthetic Data Generator
- Fine-tuning DeepSeek with Unsloth
- Running inference and evaluating the fine-tuned model

Let’s dive in!

Generate the Reasoning Dataset

To generate our high-quality reasoning dataset, we will use the Synthetic Data Generator, a user-friendly, no-code application for creating custom datasets with LLMs. In this guide, we will focus on generating synthetic reasoning data to solve coding problems in Python. However, this approach can be adapted to a wide range of domains.

For more information about the details and usage of the Synthetic Data Generator, check Introducing the Synthetic Data Generator - Build Datasets with Natural Language and the original GitHub Repository.
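
If you prefer to run the generator locally instead of on a Hugging Face Space, the repository also ships a Python package. Here is a minimal sketch, assuming `pip install synthetic-dataset-generator` and that the package exposes a `launch()` entry point as described in its README:

```python
import os

# A Hugging Face token with write access is required to push the generated dataset;
# the environment variable should be set before the package is imported (assumption).
os.environ["HF_TOKEN"] = "hf_..."  # placeholder token

from synthetic_dataset_generator import launch

# Start the Gradio app locally
launch()
```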

By default, the Synthetic Data Generator uses meta-llama/Llama-3.1-8B-Instruct via the Serverless Inference API, which is free but rate-limited. With the latest release, you can configure a different model for completions. Since we need a reasoning dataset, we will use deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, also accessible through the Serverless Inference API. To use this model, you only need to duplicate the Space and set the MODEL_COMPLETION environment variable.
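
If you prefer to duplicate the Space programmatically rather than through the UI, `huggingface_hub` provides a `duplicate_space` helper. A rough sketch, where the source Space id and the `variables` payload format are assumptions based on the library's documentation:

```python
from huggingface_hub import duplicate_space

# Duplicate the Synthetic Data Generator Space under your own account and
# configure the completion model via the MODEL_COMPLETION variable
duplicate_space(
    "argilla/synthetic-data-generator",  # assumed source Space id
    to_id="your-username/synthetic-data-generator",
    variables=[
        {"key": "MODEL_COMPLETION", "value": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"},
    ],
)
```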

Remember that you can also run inference locally with providers like Ollama or LlamaCpp, or via Inference Providers.


Once the application is running, generating our synthetic dataset is straightforward. It takes three steps: selecting the input data, configuring the generator, and generating the full dataset.

Select the Input Data

Since we are generating data for Supervised Fine-Tuning (SFT), we will navigate to the Chat Data tab. Here, there are three options: generating from a system prompt, an existing dataset, or your documents. For our use case, we will focus on the first option. If you want to check how the others work, you can do so here.

We will need to describe the topic and scope of our dataset. For instance, we defined it as: "an assistant that solves python coding problems". Next, we will click on ‘Load’, and the application will begin generating an improved system prompt and some examples.

Configure the Generator

As mentioned in the previous section, the Synthetic Data Generator automatically generates a system prompt based on our dataset description. This system prompt acts as a guide for the model, helping it generate data that is better aligned with the task, so you can refine and iterate on it to improve the results. We defined it as:

You are an expert programmer with advanced knowledge of Python. Your task is to provide concise and easy-to-understand solutions. Please answer the following python question. User questions are clear and concise.

Additionally, the number of turns can be configured. For single-turn conversations, two columns are generated: prompt and completion. For multi-turn conversations, a messages column contains the full conversations, including the system, user, and assistant roles. For our case, we will leave the number of turns at 1, ensuring that each example consists of an input coding problem (prompt) and a corresponding reasoned Python solution (completion).
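
To make the two formats concrete, here is an illustrative (hand-written, not generated) example of what a row looks like in each case:

```python
# Single-turn: one prompt/completion pair per row
single_turn_row = {
    "prompt": "Write a function that reverses a string.",
    "completion": "def reverse_string(s):\n    return s[::-1]",
}

# Multi-turn: a single `messages` column holding the whole conversation
multi_turn_row = {
    "messages": [
        {"role": "system", "content": "You are an expert Python programmer."},
        {"role": "user", "content": "Write a function that reverses a string."},
        {"role": "assistant", "content": "def reverse_string(s):\n    return s[::-1]"},
    ]
}
```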

Generate the Dataset

Once we have completed the previous steps, we are ready to generate the full dataset! The generated dataset will automatically be available on the Hub and in Argilla, ready for review and use.


In our case, we generated 500 rows with a temperature of 0.9 for instruction generation and 0.6 for completion, as recommended by DeepSeek. The process took around 2 hours; however, time may vary depending on the inference models used. In this case, DeepSeek requires more computational resources, leading to a longer processing time.

Amazing! We now have a synthetic reasoning dataset ready for fine-tuning DeepSeek!
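
If you want to take a quick look at the generated data before fine-tuning, you can load it directly from the Hub; the repository below is the one used in the fine-tuning code later in this post:

```python
from datasets import load_dataset

# Load the generated dataset and inspect one row
dataset = load_dataset("sdiazlor/python-reasoning-dataset", split="train")
print(dataset)                   # columns and number of rows
print(dataset[0]["prompt"])      # a generated coding problem
print(dataset[0]["completion"])  # its reasoned solution
```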

The following sections present simplified code snippets for clarity and easy understanding. You can access the full notebook if you’d like to explore the complete implementation.

Fine-Tune DeepSeek

To fine-tune our model, we will use Unsloth, a library that optimizes fine-tuning so that you can run it even on consumer hardware.

Load the Model

The first step is to load the model, tokenizer, and adapters. Specifically, we will fine-tune unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit, which is a 4-bit quantized version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. This model is smaller than the one used for generation, so we expect slightly lower output quality. However, fine-tuning allows us to work with a smaller model while maintaining accuracy and reliability.

from unsloth import FastLanguageModel

MODEL = "unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit"

# Load the 4-bit pre-quantized DeepSeek model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL,
    max_seq_length = 2048,  # maximum context length used during training
    dtype = None,           # auto-detect the appropriate dtype
    load_in_4bit = True,
)

# Add the LoRA adapters to the model
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: higher values add capacity at the cost of memory
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth's memory-efficient checkpointing
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Prepare the Training Dataset

Next, we will prepare the training dataset by defining a prompt template and including the prompts and completions from our generated dataset.

# Prepare the dataset
from datasets import load_dataset

prompt_style = """Below is an instruction that describes a task, paired with a question that provides further context.
Write a response that appropriately answers the question.
Before answering, think carefully but concisely about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are an expert programmer with advanced knowledge of Python. Your task is to provide concise and easy-to-understand solutions. Please answer the following python question.

### Question:
{}

### Response:
{}
"""

# Append the EOS token so the model learns where a completion ends
EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    prompts = examples["prompt"]
    completions = examples["completion"]
    texts = []
    for prompt, completion in zip(prompts, completions):
        text = prompt_style.format(prompt, completion) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }

dataset = load_dataset("sdiazlor/python-reasoning-dataset", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset["text"][0]  # inspect one formatted example

Train the Model

Let’s train our model! Unsloth is fully compatible with the trl library, so we can define the SFTTrainer and start fine-tuning. Remember to configure your training arguments according to your resource requirements to improve performance and accuracy. You can save the fine-tuned model locally or push it to the Hub in float16 (sdiazlor/deepseek-r1-distill-qwen-1.5-unsloth-sft-python) or GGUF (sdiazlor/deepseek-r1-distill-qwen-1.5-unsloth-sft-python_q4_k_m) format.
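
The trainer below expects a `training_arguments` object (defined in the full notebook). Here is a minimal sketch using `transformers.TrainingArguments`; the specific hyperparameter values are illustrative defaults, not necessarily the ones used in the notebook, so adjust them to your hardware:

```python
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Illustrative training configuration; tune for your GPU memory and dataset size
training_arguments = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    logging_steps=10,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,
    report_to="none",
)
```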

from trl import SFTTrainer

# Configure the trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    dataset_num_proc=2,
    packing=False,
    args=training_arguments,
)

# Train the model
trainer_stats = trainer.train()

# Save the fine-tuned model locally and push it to the Hub
# (MODEL_NAME is the local output name; fine_tuned_model is the Hub repo id mentioned above)
model.save_pretrained_merged(MODEL_NAME, tokenizer, save_method="merged_16bit")
model.push_to_hub_merged(fine_tuned_model, tokenizer, save_method="merged_16bit")
model.push_to_hub_gguf(
    f"{fine_tuned_model}_q4_k_m", tokenizer, quantization_method="q4_k_m"
)
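
Once the GGUF export is on the Hub, you can also try it with a llama.cpp-based runtime. A rough sketch with `llama-cpp-python`, assuming the repository contains a single q4_k_m GGUF file matching the glob pattern below (both the pattern and the generation parameters are assumptions):

```python
from llama_cpp import Llama

# Download and load the quantized GGUF directly from the Hub
llm = Llama.from_pretrained(
    repo_id="sdiazlor/deepseek-r1-distill-qwen-1.5-unsloth-sft-python_q4_k_m",
    filename="*q4_k_m.gguf",  # assumed filename pattern
    n_ctx=2048,
)

output = llm("How can I get the prime numbers from 0 to 125?", max_tokens=512)
print(output["choices"][0]["text"])
```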

Run Inference and Evaluate the Model

It's time to evaluate whether our fine-tuned model has improved! To that end, we will run inference and compare the model's results before and after fine-tuning.

# Run inference
question = "How can I get the prime numbers from 0 to 125?"

# Switch the model to Unsloth's optimized inference mode
FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
response
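
To reproduce the "before fine-tuning" response shown below, you can load the original 4-bit checkpoint in the same way and run the identical prompt; a minimal sketch mirroring the code above:

```python
# Load the original (not fine-tuned) 4-bit checkpoint for comparison
base_model, base_tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(base_model)

base_inputs = base_tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
base_outputs = base_model.generate(
    input_ids=base_inputs.input_ids,
    attention_mask=base_inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
print(base_tokenizer.batch_decode(base_outputs)[0])
```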

The comparison below demonstrates that after fine-tuning, the model produces more comprehensive responses, including code examples. In contrast, the pre-trained model provides only a high-level outline without additional details.

# Response before fine-tuning
<think>
To find all prime numbers between 0 and 125, I can follow these steps:

1. **Define the range**: Identify the start and end of the range, which in this case are 0 and 125.

2. **Create a boolean array**: Initialize an array of booleans with the same length as the range. Each index in this array will represent whether the corresponding number is prime (True) or not (False).

3. **Mark non-prime numbers**: Starting from the first prime number (2), iterate through each number. For each prime, mark all its multiples as non-prime in the array.

4. **Identify primes**: After marking the non-primes, the remaining True values in the array correspond to prime numbers within the specified range.

5. **Output the results**: Extract the indices from the array where the value is True and list them to obtain all prime numbers from 0 to 125.

This method ensures that we efficiently identify primes using the Sieve of Eratosthenes algorithm.
</think>

To find all prime numbers between 0 and 125, follow these steps:

[...]

5. **Output the Results**: Extract these indices and list them to obtain all prime numbers from 0 to 125.

This method efficiently finds primes using the Sieve of Eratosthenes algorithm.<|end▁of▁sentence|>

# Response after fine-tuning
Okay, so I need to find all the prime numbers between 0 and 125. Hmm, primes are numbers greater than 1 that have no divisors other than 1 and themselves. So, first, I should probably start by listing all numbers from 0 to 125 and then eliminate the non-primes.

Wait, but 0 and 1 aren't primes. So I can ignore them. The smallest prime is 2. So maybe I should start checking from 2 onwards.

I remember that one method to check for primes is the Sieve of Eratosthenes.

[...]

So, the steps in code:

1. Define a function get_primes(n) that returns primes up to n.
2. Inside the function, create a list of booleans from 0 to n, initializing to True, then setting 0 and 1 to False.
3. For each i from 2 to sqrt(n) + 1, check if it's still True. If so, mark all multiples of i as False.
4. Collect all indices that are still True and return them as a list.

[...]

</think>

To find all prime numbers from 0 to 125, we can use the Sieve of Eratosthenes algorithm, which efficiently identifies primes by marking non-prime numbers. Here's a concise Python solution:

```python
import math

def get_primes_up_to_n(n):
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(math.sqrt(n)) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    primes = [i for i, is_prime in enumerate(sieve) if is_prime]
    return primes

primes = get_primes_up_to_n(125)
print(primes)
```

This code initializes a boolean list for numbers up to 125, marks non-primes, and returns the primes. The result is a list of primes from 2 to 125.
<|end▁of▁sentence|>

Conclusion

In this blog post, we used the Synthetic Data Generator to create a custom, high-quality synthetic reasoning dataset for solving Python coding problems with DeepSeek-R1-Distill-Qwen-32B. We then fine-tuned a smaller model, DeepSeek-R1-Distill-Qwen-1.5B, using this dataset. After fine-tuning, the model produced noticeably more detailed and explanatory responses. This workflow serves as a foundational example that can be scaled for real-world applications.

Next Steps

What are you waiting for? Start synthesizing!
