DeepSeek-OPT-Merged-1.3B

A merged model combining DeepSeek Coder 1.3B and OPT-350M using a linear interpolation merge technique.

🔍 Model Description

This model was created by merging two foundation models:

  • Primary: DeepSeek Coder 1.3B (code generation capabilities)
  • Secondary: OPT-350M (general language understanding)

🛠️ Training/Merging Process

  1. Base Model Selection:

    • DeepSeek Coder 1.3B for code understanding
    • OPT-350M for general language capabilities
  2. Merge Technique:

    • Method: Linear interpolation
    • Weight ratio: α=0.5 (50% each model)
    • No additional training, pure weight merging (a sketch of this step follows the list)
  3. Technical Process:

    • Used PyTorch for model handling
    • Applied float16 precision
    • Implemented memory-efficient merging
    • Used device map auto-detection
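
For illustration only, here is a minimal sketch of the interpolation step described above, not the exact script used to produce this checkpoint: tensors the two parent models share by name and shape are averaged with α=0.5, and every other tensor is carried over from the primary model unchanged.

```python
import torch
from transformers import AutoModelForCausalLM

ALPHA = 0.5  # interpolation weight for the primary model

# Load both parents in float16 to keep memory usage manageable
primary = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-1.3b-base", torch_dtype=torch.float16
)
secondary = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", torch_dtype=torch.float16
)

merged_state = primary.state_dict()
secondary_state = secondary.state_dict()

for name, tensor in merged_state.items():
    # Interpolate only tensors the two models share by name and shape;
    # everything else is kept from the primary (DeepSeek Coder) weights.
    if name in secondary_state and secondary_state[name].shape == tensor.shape:
        merged_state[name] = ALPHA * tensor + (1.0 - ALPHA) * secondary_state[name]

primary.load_state_dict(merged_state)
primary.save_pretrained("deepseek-opt-merged-1.3b")
```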

🧩 Configuration

```yaml
models:
  - model: deepseek-ai/deepseek-coder-1.3b-base  # Base model
  - model: facebook/opt-350m                     # Target model
merge_method: linear
parameters:
  alpha: 0.5
dtype: float16
```

💻 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "grozmart1/deepseek-opt-merged-1.3b",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("grozmart1/deepseek-opt-merged-1.3b")

# Example usage
text = "Write a Python function to sort a list:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
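
As an alternative to the manual generate() call above, the same checkpoint can be driven through the transformers text-generation pipeline. This is a minimal sketch; the generation settings mirror the example above and are not tuned for this model.

```python
import torch
from transformers import pipeline

# Text-generation pipeline backed by the merged checkpoint
generator = pipeline(
    "text-generation",
    model="grozmart1/deepseek-opt-merged-1.3b",
    torch_dtype=torch.float16,
    device_map="auto",
)

result = generator(
    "Write a Python function to sort a list:",
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(result[0]["generated_text"])
```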

🔧 Technical Details

  • Architecture: Transformer-based language model
  • Parameters: ~1.3B
  • Precision: float16
  • Merge Method: Linear interpolation (α=0.5)
  • Device Support: CPU/GPU (Auto device mapping)
  • Memory Requirements: ~4 GB of GPU VRAM or ~8 GB of system RAM (a quick check follows this list)
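
To sanity-check the footprint figures above on your own hardware, you can inspect the loaded model directly. This is a quick illustrative check using standard transformers helpers, not a guarantee of the numbers listed.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "grozmart1/deepseek-opt-merged-1.3b",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Parameter count, weight dtype, and approximate in-memory size of the weights
print(f"parameters: {model.num_parameters() / 1e9:.2f}B")
print(f"dtype:      {model.dtype}")
print(f"footprint:  {model.get_memory_footprint() / 1e9:.2f} GB")
```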

📊 Model Evaluation

  • Dataset: HumanEval (Code Generation Benchmark)
  • Metric: pass@1 (Functional Correctness)
  • Status: Pending evaluation (a HumanEval generation sketch follows this list)
  • Expected Capabilities:
    • Code completion
    • Function generation
    • Technical documentation
    • General text generation
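
Since the HumanEval numbers are still pending, the snippet below is only a hedged sketch of how pass@1 could be produced: generate one greedy completion per prompt and score the resulting samples.jsonl with OpenAI's human-eval harness. The openai_humaneval dataset name and the evaluate_functional_correctness command are assumptions about the evaluation setup, not something this card ships.

```python
import json
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "grozmart1/deepseek-opt-merged-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

problems = load_dataset("openai_humaneval", split="test")

with open("samples.jsonl", "w") as f:
    for problem in problems:
        inputs = tokenizer(problem["prompt"], return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
        # Keep only the newly generated tokens (strip the prompt)
        completion = tokenizer.decode(
            outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        f.write(json.dumps({"task_id": problem["task_id"], "completion": completion}) + "\n")

# Score with OpenAI's human-eval harness (pip install human-eval):
#   evaluate_functional_correctness samples.jsonl
```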

📝 License

Apache 2.0

🚀 Intended Use

  • Code generation and completion
  • Technical documentation
  • Programming assistance
  • General text generation tasks

⚠️ Limitations

  • Inherits limitations from both parent models
  • May show inconsistencies in code generation
  • Limited by the context window of the base models (see the check below)
  • Performance varies by task type
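
One way to check the context window the merged checkpoint actually advertises is to read its config. The max_position_embeddings attribute is an assumption that holds for most decoder-only models in transformers; some architectures expose the limit under a different name.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("grozmart1/deepseek-opt-merged-1.3b")

# Most decoder-only configs expose the context window as max_position_embeddings
print(getattr(config, "max_position_embeddings", "not exposed by this config"))
```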