---
license: mit
language:
- en
base_model:
- unsloth/Llama-3.2-3B-bnb-4bit
pipeline_tag: text-generation
tags:
- art
- music
---
|
[Odd Eyed Black Cat](https://flic.kr/p/9SWAXj) by [fourbyfourblazer](https://www.flickr.com/photos/chrisyarzab/), on Flickr
|
|
|
|
|
## Table of Contents

- [Model Description](#model-description)
- [Model Architecture](#model-architecture)
- [Training Data](#training-data)
- [Training Procedure](#training-procedure)
- [Usage](#usage)
- [Limitations](#limitations)
- [Ethical Considerations](#ethical-considerations)
- [Acknowledgements](#acknowledgements)
- [Citations](#citations)
- [License](#license)
|
|
|
## Model Description

**cat0.1** is a **3-billion-parameter** conversational AI model, quantized to **4-bit precision** for efficiency. Designed for dynamic and uncensored dialogue, it has been trained over the past eight months through an iterative cycle of training and interactive chatting, and it embodies a diverse range of characters for versatile, engaging interactions. **cat0.1** is adapted from [unsloth/Llama-3.2-3B-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-3B-bnb-4bit), building on that base model's architecture.
|
|
|
## Model Architecture

- **Parameters:** 3 billion
- **Precision:** 4-bit (bitsandbytes quantization, inherited from the base model)
- **Training Configuration** (LoRA-style rank/alpha; see the sketch below):
  - **Rank:** 32
  - **Alpha:** 64
- **Hardware:** Trained on a single RTX 4090 laptop GPU
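
For concreteness, here is a minimal sketch of what a LoRA configuration with these hyperparameters could look like using the `peft` library. The target modules, dropout, and the use of `peft` itself are assumptions; the actual training setup is not published in this card.

```python
# Hypothetical sketch only: a peft LoraConfig matching the rank/alpha
# listed above. Target modules and dropout are assumed, not confirmed.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,             # rank listed above
    lora_alpha=64,    # alpha listed above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.0, # assumed
    bias="none",
    task_type="CAUSAL_LM",
)
```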
|
|
|
## Training Data

The model was trained on a diverse set of conversational data collected over eight months. The data includes interactions with various characters, covering a wide range of conversational styles and topics. Training data is continuously updated with new chunks, allowing the model to evolve and adapt over time.
|
|
|
## Training Procedure

cat0.1 employs a **progressive training** approach:

1. **Initial Training:** The model is first trained on a base set of conversational data.
2. **Interactive Training:** The trained model is engaged in chats, generating new data from its interactions.
3. **Data Update Cycle** (sketched after this list):
   - **Data Collection:** New conversational data chunks are gathered from interactions.
   - **Training Update:** The model is retrained on the new data. Occasionally, older data is removed to focus on recent interactions, while the previously trained model weights are retained.
4. **Iteration:** This cycle of training and data updating is repeated frequently so the model remains current and responsive.
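
The cycle above can be pictured as a small loop over data chunks. The sketch below is purely illustrative: the chunk naming, the retention window, and the `fine_tune` stub are hypothetical stand-ins, since the actual pipeline is not published.

```python
# Hypothetical sketch of the data-update cycle described above.
# Chunk naming, MAX_CHUNKS, and fine_tune() are illustrative stand-ins.
from pathlib import Path

MAX_CHUNKS = 8  # assumed retention window for recent data


def fine_tune(base_checkpoint: str, data_files: list) -> str:
    """Placeholder for one fine-tuning pass; returns a new checkpoint path."""
    raise NotImplementedError


def update_cycle(data_dir: str, checkpoint: str) -> str:
    # 1. Collect conversational data chunks gathered from chats.
    chunks = sorted(Path(data_dir).glob("chunk_*.jsonl"))
    # 2. Drop the oldest chunks to focus on recent interactions;
    #    the model keeps the weights learned in earlier cycles.
    recent = chunks[-MAX_CHUNKS:]
    # 3. Retrain from the current checkpoint on the retained data.
    return fine_tune(checkpoint, recent)
```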
|
|
|
## Usage

cat0.1 is designed for applications requiring dynamic and unrestricted conversational capabilities. Suitable use cases include:

- **Chatbots:** For platforms needing engaging and versatile conversational agents.
- **Creative Writing Assistance:** Helping writers generate dialogue and character interactions.
- **Entertainment:** Providing interactive experiences in games and virtual environments.
|
|
|
### Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("rwitz/cat0.1")
model = AutoModelForCausalLM.from_pretrained("rwitz/cat0.1", torch_dtype=torch.float16)

# Encode input
input_ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")

# Generate a response; max_new_tokens bounds the reply length
# without counting the prompt tokens (unlike max_length)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=50)

# Decode and print the generated text
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
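
Since the base model is distributed in 4-bit precision, loading with bitsandbytes quantization may be closer to the intended setup. A hedged sketch, assuming `bitsandbytes` and `accelerate` are installed:

```python
# Alternative load path (sketch): 4-bit quantization via bitsandbytes.
# Whether this matches the intended deployment is an assumption.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "rwitz/cat0.1",
    quantization_config=bnb_config,
    device_map="auto",  # requires accelerate
)
```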