---
license: mit
language:
- en
base_model:
- unsloth/Llama-3.2-3B-bnb-4bit
pipeline_tag: text-generation
tags:
- art
- music
---
[Odd Eyed Black Cat](https://flic.kr/p/9SWAXj) by [fourbyfourblazer](https://www.flickr.com/photos/chrisyarzab/), on Flickr
## Table of Contents
- [Model Description](#model-description)
- [Model Architecture](#model-architecture)
- [Training Data](#training-data)
- [Training Procedure](#training-procedure)
- [Usage](#usage)
- [Limitations](#limitations)
- [Ethical Considerations](#ethical-considerations)
- [Acknowledgements](#acknowledgements)
- [Citations](#citations)
- [License](#license)
## Model Description
**cat0.1** is a conversational AI model with **3 billion parameters**, optimized for efficiency using **4-bit precision**. Designed to engage in dynamic and uncensored dialogues, cat0.1 has been developed over the past eight months through an iterative cycle of training and interactive chatting. The model embodies a diverse range of characters, enabling versatile and engaging interactions. **cat0.1** is adapted from [unsloth/Llama-3.2-3B-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-3B-bnb-4bit), building on that base for its conversational capabilities.
## Model Architecture
- **Parameters:** 3 billion
- **Precision:** 4-bit
- **Training Configuration (LoRA):**
  - **LoRA Rank:** 32
  - **LoRA Alpha:** 64
- **Hardware:** Trained on an RTX 4090 laptop GPU
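The rank/alpha pair above corresponds to a LoRA adapter configuration. Below is a minimal sketch of what that setup might look like with the `peft` library; the target modules are an assumption (the standard Llama attention and MLP projections), since the card does not specify them:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the 4-bit base model (the quantization config ships with the repo)
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B-bnb-4bit", device_map="auto"
)

# LoRA adapter matching the card's hyperparameters (rank 32, alpha 64);
# target_modules is an assumption, not stated in the card
config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```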
## Training Data
The model was trained on a diverse set of conversational data collected over eight months. The data includes interactions with various characters, ensuring a wide range of conversational styles and topics. Training data is continuously updated with new chunks, allowing the model to evolve and adapt over time.
## Training Procedure
cat0.1 employs a **progressive training** approach:
1. **Initial Training:** The model is initially trained on a base set of conversational data.
2. **Interactive Training:** The trained model is engaged in chats, generating new data based on its interactions.
3. **Data Update Cycle:**
- **Data Collection:** New conversational data chunks are gathered from interactions.
   - **Training Update:** The model is retrained with the new data, keeping its previous parameters as the starting point. Occasionally, older data is removed to focus training on recent interactions.
4. **Iteration:** This cycle of training and data updating is repeated frequently to keep the model current and responsive (a schematic sketch follows below).
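As a rough illustration of this cycle, here is a schematic Python sketch. Every name in it (`collect_new_chats`, `finetune`, the window size) is a hypothetical placeholder for the steps described above, not a real API:

```python
from collections import deque

def collect_new_chats(model):
    """Steps 2-3a (hypothetical): chat with the model, return a new data chunk."""
    return [{"prompt": "Hello", "response": "Hi there!"}]

def finetune(model, chunks):
    """Step 3b (hypothetical): retrain on the current window of chunks,
    starting from the model's existing parameters."""
    return model

MAX_CHUNKS = 8                     # assumed window size; older chunks drop off
window = deque(maxlen=MAX_CHUNKS)  # "occasionally, older data is removed"

model = "initially trained model"  # placeholder for step 1
for _ in range(3):                 # step 4: repeat the cycle (3 rounds here)
    window.append(collect_new_chats(model))  # step 3a: gather new data
    model = finetune(model, list(window))    # step 3b: retrain on the window
```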
## Usage
cat0.1 is designed for applications requiring dynamic and unrestricted conversational capabilities. Suitable use cases include:
- **Chatbots:** For platforms needing engaging and versatile conversational agents.
- **Creative Writing Assistance:** Helping writers generate dialogue and character interactions.
- **Entertainment:** Providing interactive experiences in games and virtual environments.
### Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("rwitz/cat0.1")
model = AutoModelForCausalLM.from_pretrained("rwitz/cat0.1", torch_dtype=torch.float16)

# Encode the input prompt
inputs = tokenizer("Hello, how are you?", return_tensors="pt")

# Generate a response (cap new tokens rather than total length)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)

# Decode and print
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
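For more varied, chat-like responses, sampling parameters such as `do_sample=True`, `temperature`, and `top_p` can also be passed to `generate`.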