RombUltima-32B

FINGU-AI/RombUltima-32B is a merged model combining rombodawg/Rombos-LLM-V2.5-Qwen-32b and Sakalti/ultiima-32B. It retains the individual strengths of both parent models while benefiting from an optimized fusion for improved reasoning, multilingual comprehension, and multi-turn conversation.


Training & Fine-Tuning

RombUltima-32B is a linear merge of its parent models with equal weighting (0.5 each), a balanced fusion that combines the structured knowledge of Rombos with the broader generalization of Ultiima. For intuition, a minimal sketch of the operation follows the list below.

  • Tokenization Approach: Uses a union-based tokenizer to maximize vocabulary coverage.
  • Precision: Merged weights are stored in float16 for efficient inference.
  • Long-Context Support: Handles contexts up to 32K tokens (inherited from Qwen-32B), with stable generation up to roughly 8K tokens depending on hardware.
  • Multilingual Strength: Strong performance in English, French, Chinese, and other widely spoken languages.
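
A linear merge with equal weights is, conceptually, an element-wise average of the two parents' weight tensors. The sketch below is illustrative only; the actual merge was produced with dedicated merging tooling, and linear_merge is a hypothetical helper, not part of this repository.

import torch

def linear_merge(state_a, state_b, weight=0.5):
    """Element-wise weighted average of two matching state dicts (illustrative)."""
    merged = {}
    for name, tensor_a in state_a.items():
        tensor_b = state_b[name]
        # 0.5 * Rombos + 0.5 * ultiima, accumulated in fp32, stored in fp16
        merged[name] = (weight * tensor_a.float()
                        + (1.0 - weight) * tensor_b.float()).to(torch.float16)
    return merged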

Performance & Benchmarks

Open LLM Leaderboard

📌 Coming Soon – Evaluation against leading LLM benchmarks.

MT-Bench

📌 Coming Soon – Multi-turn conversational performance analysis.


Usage

You can run this model using the following code:

import torch
import transformers
from transformers import AutoTokenizer

model_id = "FINGU-AI/RombUltima-32B"

# Format the prompt with the model's chat template
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Create pipeline (fp16 weights, spread across available devices)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate text; max_new_tokens bounds the completion rather than the full sequence
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_new_tokens=200,
)
print(sequences[0]["generated_text"])
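
Loading the full model in float16 requires roughly 64 GB of accelerator memory. On smaller hardware, a 4-bit quantized load via bitsandbytes is a common workaround; the snippet below is a sketch under that assumption, not an officially validated configuration for this model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical 4-bit load for memory-constrained hardware
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "FINGU-AI/RombUltima-32B",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("FINGU-AI/RombUltima-32B")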

Merging Details

  • Parent Models:
    • 🟢 rombodawg/Rombos-LLM-V2.5-Qwen-32b (weight: 0.5)
    • 🟢 Sakalti/ultiima-32B (weight: 0.5)
  • Merge Method: Linear
  • Tokenizer Source: Union (a coverage check follows this list)
  • Precision: float16
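
A union tokenizer keeps every token that appears in either parent's vocabulary. A quick, hypothetical way to verify that coverage:

from transformers import AutoTokenizer

merged_vocab = set(AutoTokenizer.from_pretrained("FINGU-AI/RombUltima-32B").get_vocab())
for repo in ["rombodawg/Rombos-LLM-V2.5-Qwen-32b", "Sakalti/ultiima-32B"]:
    parent_vocab = set(AutoTokenizer.from_pretrained(repo).get_vocab())
    # With a union tokenizer, every parent token should survive the merge
    print(repo, parent_vocab <= merged_vocab)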

Licensing & Intended Use

  • License: Subject to original licenses of the merged models.
  • Intended Use: Research, content generation, multilingual applications, and general-purpose AI assistance.
  • Limitations: While the model excels in structured reasoning and multilingual understanding, hallucinations and biases may still exist.

📌 For feedback and contributions, visit FINGU-AI on Hugging Face.
