DeepSeek-OPT-Merged-1.3B

A merged model combining DeepSeek Coder 1.3B and OPT-350M using a linear interpolation merge technique.

🔍 Model Description

This model was created by merging two foundation models:

  • Primary: DeepSeek Coder 1.3B (code generation capabilities)
  • Secondary: OPT-350M (general language understanding)

🛠️ Training/Merging Process

  1. Base Model Selection:

    • DeepSeek Coder 1.3B for code understanding
    • OPT-350M for general language capabilities
  2. Merge Technique:

    • Method: Linear interpolation
    • Weight ratio: α=0.5 (50% each model)
    • No additional training, pure weight merging (a sketch of this step follows the list)
  3. Technical Process:

    • Used PyTorch for model handling
    • Applied float16 precision
    • Implemented memory-efficient merging
    • Used device map auto-detection
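
For illustration only, here is a minimal sketch of the interpolation step described above, not the exact script used to produce this checkpoint: tensors the two parent models share by name and shape are averaged with α=0.5, and every other tensor is carried over from the primary model unchanged.

```python
import torch
from transformers import AutoModelForCausalLM

ALPHA = 0.5  # interpolation weight for the primary model

# Load both parents in float16 to keep memory usage manageable
primary = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-1.3b-base", torch_dtype=torch.float16
)
secondary = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", torch_dtype=torch.float16
)

merged_state = primary.state_dict()
secondary_state = secondary.state_dict()

for name, tensor in merged_state.items():
    # Interpolate only tensors the two models share by name and shape;
    # everything else is kept from the primary (DeepSeek Coder) weights.
    if name in secondary_state and secondary_state[name].shape == tensor.shape:
        merged_state[name] = ALPHA * tensor + (1.0 - ALPHA) * secondary_state[name]

primary.load_state_dict(merged_state)
primary.save_pretrained("deepseek-opt-merged-1.3b")
```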

🧩 Configuration

```yaml
models:
  - model: deepseek-ai/deepseek-coder-1.3b-base  # Base model
  - model: facebook/opt-350m                     # Target model
merge_method: linear
parameters:
  alpha: 0.5
dtype: float16
```

💻 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "grozmart1/deepseek-opt-merged-1.3b",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("grozmart1/deepseek-opt-merged-1.3b")

# Example usage
text = "Write a Python function to sort a list:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
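
As an alternative to the manual generate() call above, the same checkpoint can be driven through the transformers text-generation pipeline. This is a minimal sketch; the generation settings mirror the example above and are not tuned for this model.

```python
import torch
from transformers import pipeline

# Text-generation pipeline backed by the merged checkpoint
generator = pipeline(
    "text-generation",
    model="grozmart1/deepseek-opt-merged-1.3b",
    torch_dtype=torch.float16,
    device_map="auto",
)

result = generator(
    "Write a Python function to sort a list:",
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(result[0]["generated_text"])
```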

🔧 Technical Details

  • Architecture: Transformer-based language model
  • Parameters: ~1.3B
  • Precision: float16
  • Merge Method: Linear interpolation (α=0.5)
  • Device Support: CPU/GPU (Auto device mapping)
  • Memory Requirements: ~4 GB of GPU VRAM or ~8 GB of system RAM (a quick check follows this list)
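
To sanity-check the footprint figures above on your own hardware, you can inspect the loaded model directly. This is a quick illustrative check using standard transformers helpers, not a guarantee of the numbers listed.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "grozmart1/deepseek-opt-merged-1.3b",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Parameter count, weight dtype, and approximate in-memory size of the weights
print(f"parameters: {model.num_parameters() / 1e9:.2f}B")
print(f"dtype:      {model.dtype}")
print(f"footprint:  {model.get_memory_footprint() / 1e9:.2f} GB")
```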

📊 Model Evaluation

  • Dataset: HumanEval (Code Generation Benchmark)
  • Metric: pass@1 (Functional Correctness)
  • Status: Pending evaluation (a HumanEval generation sketch follows this list)
  • Expected Capabilities:
    • Code completion
    • Function generation
    • Technical documentation
    • General text generation
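
Since the HumanEval numbers are still pending, the snippet below is only a hedged sketch of how pass@1 could be produced: generate one greedy completion per prompt and score the resulting samples.jsonl with OpenAI's human-eval harness. The openai_humaneval dataset name and the evaluate_functional_correctness command are assumptions about the evaluation setup, not something this card ships.

```python
import json
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "grozmart1/deepseek-opt-merged-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

problems = load_dataset("openai_humaneval", split="test")

with open("samples.jsonl", "w") as f:
    for problem in problems:
        inputs = tokenizer(problem["prompt"], return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
        # Keep only the newly generated tokens (strip the prompt)
        completion = tokenizer.decode(
            outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        f.write(json.dumps({"task_id": problem["task_id"], "completion": completion}) + "\n")

# Score with OpenAI's human-eval harness (pip install human-eval):
#   evaluate_functional_correctness samples.jsonl
```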

📝 License

Apache 2.0

🚀 Intended Use

  • Code generation and completion
  • Technical documentation
  • Programming assistance
  • General text generation tasks

⚠️ Limitations

  • Inherits limitations from both parent models
  • May show inconsistencies in code generation
  • Limited by the context window of the base models (see the check below)
  • Performance varies by task type
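
One way to check the context window the merged checkpoint actually advertises is to read its config. The max_position_embeddings attribute is an assumption that holds for most decoder-only models in transformers; some architectures expose the limit under a different name.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("grozmart1/deepseek-opt-merged-1.3b")

# Most decoder-only configs expose the context window as max_position_embeddings
print(getattr(config, "max_position_embeddings", "not exposed by this config"))
```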