RombUltima-32B

FINGU-AI/RombUltima-32B is a merged model combining rombodawg/Rombos-LLM-V2.5-Qwen-32b and Sakalti/ultiima-32B. It retains the individual strengths of both parent models while benefiting from an optimized fusion for improved reasoning, multilingual comprehension, and multi-turn conversation.


Training & Fine-Tuning

RombUltima-32B is a linear merge of its parent models with equal weighting (0.5 each), a balanced fusion that combines the structured knowledge of Rombos with the broader generalization of Ultiima. For intuition, a minimal sketch of the operation follows the list below.

  • Tokenization Approach: Uses a union-based tokenizer to maximize vocabulary coverage.
  • Precision: Merged weights are stored in float16 for efficient inference.
  • Long-Context Support: Handles contexts up to 32K tokens (inherited from Qwen-32B), with stable generation up to roughly 8K tokens depending on hardware.
  • Multilingual Strength: Strong performance in English, French, Chinese, and other widely spoken languages.
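
A linear merge with equal weights is, conceptually, an element-wise average of the two parents' weight tensors. The sketch below is illustrative only; the actual merge was produced with dedicated merging tooling, and linear_merge is a hypothetical helper, not part of this repository.

import torch

def linear_merge(state_a, state_b, weight=0.5):
    """Element-wise weighted average of two matching state dicts (illustrative)."""
    merged = {}
    for name, tensor_a in state_a.items():
        tensor_b = state_b[name]
        # 0.5 * Rombos + 0.5 * ultiima, accumulated in fp32, stored in fp16
        merged[name] = (weight * tensor_a.float()
                        + (1.0 - weight) * tensor_b.float()).to(torch.float16)
    return merged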

Performance & Benchmarks

Open LLM Leaderboard

📌 Coming Soon – Evaluation against leading LLM benchmarks.

MT-Bench

📌 Coming Soon – Multi-turn conversational performance analysis.


Usage

You can run this model using the following code:

import torch
import transformers
from transformers import AutoTokenizer

model_id = "FINGU-AI/RombUltima-32B"

# Format the prompt with the model's chat template
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Create pipeline (fp16 weights, spread across available devices)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate text; max_new_tokens bounds the completion rather than the full sequence
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_new_tokens=200,
)
print(sequences[0]["generated_text"])
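
Loading the full model in float16 requires roughly 64 GB of accelerator memory. On smaller hardware, a 4-bit quantized load via bitsandbytes is a common workaround; the snippet below is a sketch under that assumption, not an officially validated configuration for this model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical 4-bit load for memory-constrained hardware
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "FINGU-AI/RombUltima-32B",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("FINGU-AI/RombUltima-32B")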

Merging Details

  • Parent Models:
    • 🟢 rombodawg/Rombos-LLM-V2.5-Qwen-32b (weight: 0.5)
    • 🟢 Sakalti/ultiima-32B (weight: 0.5)
  • Merge Method: Linear
  • Tokenizer Source: Union (a coverage check follows this list)
  • Precision: float16
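
A union tokenizer keeps every token that appears in either parent's vocabulary. A quick, hypothetical way to verify that coverage:

from transformers import AutoTokenizer

merged_vocab = set(AutoTokenizer.from_pretrained("FINGU-AI/RombUltima-32B").get_vocab())
for repo in ["rombodawg/Rombos-LLM-V2.5-Qwen-32b", "Sakalti/ultiima-32B"]:
    parent_vocab = set(AutoTokenizer.from_pretrained(repo).get_vocab())
    # With a union tokenizer, every parent token should survive the merge
    print(repo, parent_vocab <= merged_vocab)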

Licensing & Intended Use

  • License: Subject to original licenses of the merged models.
  • Intended Use: Research, content generation, multilingual applications, and general-purpose AI assistance.
  • Limitations: While the model excels in structured reasoning and multilingual understanding, hallucinations and biases may still exist.

📌 For feedback and contributions, visit FINGU-AI on Hugging Face.
