Test network using differential attention instead of classical attention (with NoPE, i.e. no positional embeddings). Aside from the changes to attention, the configuration is identical to https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct
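This card doesn't include the attention code itself, but differential attention (as introduced in the Differential Transformer paper) computes two softmax attention maps from paired query/key projections and subtracts them, scaled by a learnable lambda, to cancel common-mode attention noise. Below is a minimal single-head PyTorch sketch of that idea; the class name, the plain scalar lambda (the paper re-parameterizes it), and the bare unpositioned attention (reflecting NoPE) are illustrative assumptions, not this repo's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAttentionHead(nn.Module):
    """Single differential-attention head (simplified sketch).

    Two causal softmax attention maps are computed from paired Q/K
    projections and subtracted, scaled by a learnable lambda. No
    positional encoding is applied (NoPE).
    """
    def __init__(self, d_model: int, d_head: int, lambda_init: float = 0.8):
        super().__init__()
        # Paired projections: each yields two sub-heads of size d_head.
        self.q_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.k_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.v_proj = nn.Linear(d_model, d_head, bias=False)
        # The paper re-parameterizes lambda; a plain scalar is used here.
        self.lmbda = nn.Parameter(torch.tensor(lambda_init))
        self.scale = d_head ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        # Causal mask for autoregressive language modeling.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )

        def attn(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
            scores = (q @ k.transpose(-2, -1)) * self.scale
            scores = scores.masked_fill(mask, float("-inf"))
            return F.softmax(scores, dim=-1)

        # Differential map: second softmax subtracted from the first.
        a = attn(q1, k1) - self.lmbda * attn(q2, k2)
        return a @ v
```

The paper additionally applies GroupNorm over head outputs and a (1 - lambda_init) rescaling before the output projection; both are omitted here for brevity.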
Scripts:
- inference.py: runs the model with some test prompts.
- test_train.py: the reproduction script; runs with the exact configuration used to train this model. Training data is assumed to be in JSONL format, one object with a "text" field per line (see the sample below).
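For illustration, a training data file would look like this (contents are made up):

```
{"text": "example text"}
{"text": "another training document"}
```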
Notes:
Compared to the SmolLM2 control model, this model's output borders on incoherent. Possibly this model size is too small to leverage differential attention effectively. It has clearly picked up on some patterns in language, but its output reads noticeably worse to a human than that of the GQA-based control model.
Training Metrics
Dataset Information
- Training data per epoch: 1 GB
- Total tokens trained: 48,261,120
- No synthetic data
Training Results
- Final Train Loss: 2.6883
- Final Train Perplexity: 14.71
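Perplexity here is simply the exponential of the cross-entropy training loss, so the two reported numbers are consistent:

```python
import math

# Perplexity is exp(cross-entropy loss): exp(2.6883) rounds to 14.71
print(round(math.exp(2.6883), 2))  # -> 14.71
```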