# DeepSeek-OPT-Merged-1.3B

A merged model combining DeepSeek Coder 1.3B and OPT-350M using a linear interpolation merge.
## 🔍 Model Description
This model was created by merging two foundation models:

- **Primary:** DeepSeek Coder 1.3B (code generation capabilities)
- **Secondary:** OPT-350M (general language understanding)
## 🛠️ Training/Merging Process
**Base Model Selection:**

- DeepSeek Coder 1.3B for code understanding
- OPT-350M for general language capabilities

**Merge Technique:**

- Method: linear interpolation
- Weight ratio: α = 0.5 (50% from each model)
- No additional training; pure weight merging

**Technical Process:**

- Used PyTorch for model handling
- Applied float16 precision
- Implemented memory-efficient merging
- Used automatic device mapping (`device_map="auto"`)
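The interpolation step above can be sketched as follows. This is a minimal illustration, not the exact merging script: the two parent models have different architectures, so in practice only tensors with matching names and shapes can be averaged directly, and the rest fall back to the primary model's weights. The function name and fallback policy are assumptions for this sketch.

```python
import torch

def linear_merge(state_a: dict, state_b: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate parameters present in both state dicts with
    identical shapes; keep the primary model's tensor otherwise."""
    merged = {}
    for name, tensor_a in state_a.items():
        tensor_b = state_b.get(name)
        if tensor_b is not None and tensor_b.shape == tensor_a.shape:
            # Interpolate in float32 for stability, store back in float16
            blended = (1 - alpha) * tensor_a.float() + alpha * tensor_b.float()
            merged[name] = blended.to(torch.float16)
        else:
            # Shape or name mismatch: fall back to the primary model
            merged[name] = tensor_a.to(torch.float16)
    return merged
```

With α = 0.5 this is a plain average of every compatible weight tensor, which is why no gradient updates or training data are needed.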
## 🧩 Configuration
```yaml
models:
  - model: deepseek-ai/deepseek-coder-1.3b-base  # Base model
  - model: facebook/opt-350m                     # Target model
merge_method: linear
parameters:
  alpha: 0.5
dtype: float16
```
## 💻 Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "grozmart1/deepseek-opt-merged-1.3b",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("grozmart1/deepseek-opt-merged-1.3b")

# Example usage
text = "Write a Python function to sort a list:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,  # unpack input_ids and attention_mask
    max_length=200,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## 🔧 Technical Details
- **Architecture:** Transformer-based language model
- **Parameters:** ~1.3B
- **Precision:** float16
- **Merge Method:** Linear interpolation (α = 0.5)
- **Device Support:** CPU/GPU (automatic device mapping)
- **Memory Requirements:** ~4GB GPU RAM or 8GB CPU RAM
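The ~4GB figure can be sanity-checked from the parameter count: float16 stores 2 bytes per parameter, so the weights alone need roughly 2.4 GiB, with the remainder going to activations, the KV cache, and framework overhead. A rough back-of-the-envelope sketch, not a measurement:

```python
# Estimate float16 weight memory: 2 bytes per parameter.
n_params = 1.3e9                      # ~1.3B parameters
weights_gib = n_params * 2 / 1024**3  # weights alone, in GiB
print(f"fp16 weights: {weights_gib:.1f} GiB")  # ≈ 2.4 GiB
```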
## 📊 Model Evaluation
- **Dataset:** HumanEval (code generation benchmark)
- **Metric:** pass@1 (functional correctness)
- **Status:** Pending evaluation
- **Expected Capabilities:**
  - Code completion
  - Function generation
  - Technical documentation
  - General text generation
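Once the evaluation runs, pass@1 can be computed with the standard unbiased pass@k estimator (a sketch, assuming `n` generated samples per problem of which `c` pass the unit tests; this is the usual formula, not results for this model):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples is correct, given c of n generated samples passed."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# pass@1 reduces to the fraction of correct samples per problem
print(pass_at_k(10, 3, 1))  # ≈ 0.3
```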
## 📝 License
Apache 2.0
## 🚀 Intended Use
- Code generation and completion
- Technical documentation
- Programming assistance
- General text generation tasks
## ⚠️ Limitations
- Inherits limitations from both parent models
- May show inconsistencies in code generation
- Limited by context window of base models
- Performance varies by task type