---
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
license: apache-2.0
language:
- en
---

![Header](https://raw.githubusercontent.com/Aayan-Mishra/Images/refs/heads/main/Athena.png)

# Athena-1 1.5B

Athena-1 1.5B is an instruction-following large language model fine-tuned from [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct). Its compact size suits real-world applications where compute and memory are limited, such as lightweight chatbots, conversational AI, and structured data tasks.

---

## Key Features

### ⚡ Lightweight and Efficient

*   **Compact Size:** At **1.5 billion parameters**, Athena-1 1.5B needs less compute and memory than larger models.
*   **Instruction Following:** Fine-tuned for precise and reliable adherence to user prompts.
*   **Coding and Mathematics:** Proficient in solving coding challenges and handling mathematical tasks.

### 📖 Long-Context Understanding

*   **Context Length:** Supports up to **32,768 tokens**, enabling the processing of moderately lengthy documents or conversations.
*   **Token Generation:** Can generate up to **8K tokens** of output.
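The two limits above interact: a long prompt leaves less room for generated text. The helper below is a minimal sketch of that arithmetic, assuming the 32,768-token window is shared between prompt and output and reading "8K" as 8,192 tokens. The function name `output_budget` is hypothetical and not part of any library.

```python
MAX_CONTEXT_TOKENS = 32_768  # context length stated in this model card
MAX_OUTPUT_TOKENS = 8_192    # assumption: "8K tokens" read as 8,192


def output_budget(prompt_tokens: int, requested: int = MAX_OUTPUT_TOKENS) -> int:
    """Return how many new tokens can be requested for a prompt of this length.

    Assumes prompt and output share one context window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = MAX_CONTEXT_TOKENS - prompt_tokens
    # Clamp to the smallest of: what was asked for, what still fits, the output cap.
    return max(0, min(requested, remaining, MAX_OUTPUT_TOKENS))
```

The result can be passed as `max_new_tokens` when generating, with `prompt_tokens` taken from the length of the tokenized prompt.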

### 🌍 Multilingual Support

*   Supports **29+ languages**, including:
    *   English, Chinese, French, Spanish, Portuguese, German, Italian, Russian
    *   Japanese, Korean, Vietnamese, Thai, Arabic, and more.

### 📊 Structured Data & Outputs

*   **Structured Data Interpretation:** Processes structured formats like tables and JSON.
*   **Structured Output Generation:** Generates well-formatted outputs, including JSON and other structured formats.
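When asking for JSON output, the reply may still wrap the JSON in surrounding prose. The sketch below shows one way to recover the first JSON value from a reply using only the Python standard library. The function name `extract_json` and the sample reply are hypothetical illustrations, not output captured from the model.

```python
import json


def extract_json(reply: str):
    """Return the first JSON object or array embedded in a text reply."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(reply):
        if char in "{[":
            try:
                # raw_decode parses one JSON value starting at `index`
                # and ignores any text that follows it.
                value, _end = decoder.raw_decode(reply, index)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON here; try the next candidate
    raise ValueError("no JSON value found in reply")


# Hypothetical reply text, used only to illustrate the parsing step.
sample_reply = 'Here is the record: {"name": "Athena", "size_b": 1.5} Anything else?'
record = extract_json(sample_reply)
print(record["name"])
```

Validating the parsed value against the expected keys before using it is a sensible next step, since a small model can produce well-formed JSON with missing or extra fields.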

---

## Model Details

*   **Base Model:** [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)
*   **Architecture:** Transformer with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings.
*   **Parameters:** 1.5B total.
*   **Layers and Attention Heads:** Not listed here; see the [base model card](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) for these values.
*   **Context Length:** Up to **32,768 tokens**.

---


## Applications

Athena-1 1.5B is designed for a variety of real-world applications:

*   **Conversational AI:** Build fast, responsive, and lightweight chatbots.
*   **Code Generation:** Generate, debug, or explain code snippets.
*   **Mathematical Problem Solving:** Assist with calculations and reasoning.
*   **Document Processing:** Summarize and analyze moderately large documents.
*   **Multilingual Applications:** Support for global use cases with diverse language requirements.
*   **Structured Data:** Process and generate structured data, such as tables and JSON.

---

## Quickstart

Here’s how you can use Athena-1 1.5B for quick text generation:

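```python
# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="Spestly/Athena-1-1.5B")
print(pipe(messages))

# Load the model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-1.5B")
model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-1.5B")

# Format the chat with the tokenizer's chat template and generate a reply
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
)
# max_new_tokens=256 is an illustrative value; adjust as needed
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```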