Upload README.md with huggingface_hub
README.md CHANGED
@@ -23,7 +23,7 @@ This is a small multilingual language model based on a Transformer architecture
 
 ### Architecture
 
-- Transformer-based language model (Decoder-only).
+- Transformer-based language model (Decoder-only), now using Hugging Face Transformers' GPT2 architecture.
 - Reduced model dimensions (`n_embd=768`, `n_head=12`, `n_layer=12`) for faster training and smaller model size, making it suitable for resource-constrained environments.
 - Uses Byte-Pair Encoding (BPE) tokenizer trained on the same Wikipedia data.
 
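The updated bullet points the README at the stock GPT-2 implementation in Hugging Face Transformers. As a rough illustration only (not part of this commit), a model with the listed dimensions could be instantiated as below; the `vocab_size` is a placeholder assumption, since the size of the BPE tokenizer trained on the Wikipedia data is not stated in the diff:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Dimensions taken from the README bullet above.
config = GPT2Config(
    vocab_size=32000,  # assumption: actual value depends on the trained BPE tokenizer
    n_embd=768,        # hidden size
    n_head=12,         # attention heads
    n_layer=12,        # decoder layers
)

model = GPT2LMHeadModel(config)
print(f"Parameters: {model.num_parameters():,}")
```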