Llava Hugging Face

AI & ML interests

None defined yet.

Recent Activity

RaushanTurganbay new activity 1 day ago

llava-hf/llava-onevision-qwen2-0.5b-ov-hf:Error when attempting to run either model... ValueError: embed_dim must be divisible by num_heads (got `embed_dim`: 1152 and `num_heads`: 14).

RaushanTurganbay new activity 1 day ago

llava-hf/llava-v1.6-mistral-7b-hf:Fix typo in chat template (assistant EOS token)

RaushanTurganbay new activity 5 days ago

llava-hf/llava-v1.6-mistral-7b-hf:<\s> token in the chat template instead of the </s> EOS token


llava-hf's activity

RaushanTurganbay

in llava-hf/llava-onevision-qwen2-0.5b-ov-hf 1 day ago

Error when attempting to run either model... ValueError: embed_dim must be divisible by num_heads (got `embed_dim`: 1152 and `num_heads`: 14).

#4 opened 5 months ago by

jdc4429
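The error in this thread comes from a divisibility check in multi-head attention. The embedding is split evenly across heads, so `embed_dim / num_heads` must be an integer. Below is a minimal Python sketch of that check; the helper name `head_dim` is hypothetical and this is not the transformers source. This is an observation rather than a confirmed diagnosis: 1152 is the hidden size of the SigLIP vision tower used by LLaVA-OneVision, which has 16 heads, while 14 is the head count of the Qwen2-0.5B language model. The two values therefore look like they came from different sub-configs.

```python
def head_dim(embed_dim: int, num_heads: int) -> int:
    """Return the per-head width, raising the same kind of error as in the thread."""
    # Each head gets an equal slice of the embedding, so the split must be exact.
    if embed_dim % num_heads != 0:
        raise ValueError(
            "embed_dim must be divisible by num_heads "
            f"(got `embed_dim`: {embed_dim} and `num_heads`: {num_heads})."
        )
    return embed_dim // num_heads


print(head_dim(1152, 16))  # 72: a valid split
try:
    head_dim(1152, 14)  # 1152 = 14 * 82 + 4, so this raises
except ValueError as err:
    print(err)
```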

RaushanTurganbay

in llava-hf/llava-v1.6-mistral-7b-hf 1 day ago

Fix typo in chat template (assistant EOS token)

#43 opened 4 days ago by

fabric
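Discussion #41 reports a chat template that closes assistant turns with `<\s>` (backslash) instead of the `</s>` EOS token, and #43 is titled as a fix for that typo. `<\s>` is not a special token, so a template containing it emits ordinary text where the end-of-sequence marker belongs. The minimal Python sketch below treats the fix as a string substitution. The template fragment is hypothetical and not copied from the repository.

```python
EOS_TOKEN = "</s>"    # EOS string used by Mistral-style tokenizers
TYPO_TOKEN = "<\\s>"  # backslash variant reported in discussion #41


def fix_eos_typo(template: str) -> str:
    """Replace every backslash variant with the real EOS string."""
    return template.replace(TYPO_TOKEN, EOS_TOKEN)


# Hypothetical fragment of a Jinja chat template (not the repo's actual template)
broken = "{% if message['role'] == 'assistant' %}{{ message['content'] }}<\\s>{% endif %}"
fixed = fix_eos_typo(broken)
print(fixed)
```

After editing a real template, you can re-render a conversation with `tokenizer.apply_chat_template(..., tokenize=False)` to quickly confirm that the assistant turn ends in `</s>`.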

merve

posted an update 4 days ago

Interesting releases in open AI this week, let's recap 🤠 merve/feb-7-releases-67a5f7d7f172d8bfe0dd66f4

🤖 Robotics
> Pi0, the first open-source foundation vision-language-action model, was released in LeRobot (Apache 2.0)

💬 LLMs
> Groundbreaking: s1 is a simpler approach to test-time scaling. The release comes with s1K, a small dataset of 1k question-reasoning trace pairs (from Gemini-Thinking Exp); they fine-tune Qwen2.5-32B-Instruct on it to get s1-32B, which outperforms o1-preview on math 🤯 s1-32B and s1K are out!
> Adyen released DABstep, a new benchmark for agents doing data analysis, along with its leaderboard demo
> Krutrim released Krutrim-2 instruct, a new 12B model based on NeMo12B, trained and aligned on Indic languages; they also released a new multilingual sentence embedding model (based on STSB-XLM-R) and a translation model for Indic languages

👀 Multimodal
> PKU released Align-DS-V, a model aligned across all modalities (image-text-audio) using their new technique called LLF, along with the Align Anything dataset
> OLA-7B is a new any-to-any model by Tencent that takes text, image, video, and audio input with a context window of 32k tokens and outputs text and speech in English and Chinese
> Krutrim released Chitrarth, a new vision language model for Indic languages and English

🖼️ Vision
> BiRefNet_HR is a new higher resolution BiRefNet for background removal

🗣️ Audio
> kyutai released Hibiki, a real-time speech-to-speech translation model 🤯 it's available for French-to-English translation
> Krutrim released Dhwani, a new STT model for Indic languages
> They also released a new dataset for STT-TTS

🖼️ Image Generation
> Lumina released Lumina-Image-2.0, a 2B-parameter flow-based DiT for text-to-image generation
> Tencent released Hunyuan3D-2, a 3D asset generation model based on DiT and Hunyuan3D-Paint
> boreal-hl-v1 is a new boring photorealistic image generation LoRA based on Hunyuan

Xenova

posted an update 4 days ago

We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.
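The speed claim can be read as a real-time factor (RTF): compute time divided by audio duration, where a value below 1 means faster than real time. The small sketch below works through that arithmetic. RTF is a standard speech metric that the post itself does not quote, and the 1-second figure is approximate ("~1 second").

```python
def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Compute time divided by audio duration; below 1.0 is faster than real time."""
    return compute_seconds / audio_seconds


rtf = real_time_factor(audio_seconds=10.0, compute_seconds=1.0)
print(rtf)      # 0.1
print(1 / rtf)  # 10.0, i.e. roughly 10x faster than real time
```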

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses
🌍 Multilingual support (only phonemization left)

Who wants to help?
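Sentence splitting enables streaming because synthesis can start on the first complete sentence while later text is still arriving. The minimal Python sketch below shows the buffering logic, assuming a naive rule that a sentence ends at `.`, `!`, or `?` followed by whitespace. kokoro-webgpu runs in the browser, so this illustrates the idea rather than reproducing its code.

```python
import re
from typing import Iterable, Iterator

# Naive sentence boundary (assumption): ., !, or ? followed by whitespace
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it buffered
        buffer = parts[-1]
    # Flush whatever remains once the stream ends
    if buffer.strip():
        yield buffer.strip()


for s in stream_sentences(["Hello wor", "ld. How are", " you? Fine"]):
    print(s)  # each sentence could be sent to the TTS model here
```

This regex will also split after abbreviations such as "e.g.", which a production splitter would need to handle.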

merve

posted an update 5 days ago

IBM released ibm-granite/granite-vision-3.1-2b-preview, a small vision LM with impressive performance on different tasks 😮🔥

it comes with transformers and vLLM support from the get-go 💗
you can run it on a Colab T4, so I built a notebook to put it to the test; find it here: https://github.com/merveenoyan/smol-vision/blob/main/inference_gists/IBM_Granite_Vision.ipynb

RaushanTurganbay

in llava-hf/llava-v1.6-mistral-7b-hf 5 days ago

<\s> token in the chat template instead of the </s> EOS token

#41 opened about 1 month ago by

fabric

nielsr

in llava-hf/llava-v1.6-mistral-7b-hf 5 days ago

<\s> token in the chat template instead of the </s> EOS token

#41 opened about 1 month ago by

fabric

Fix typo in README

#42 opened 5 days ago by

0x62656e

Xenova

authored a paper 5 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 7 days ago • 153

RaushanTurganbay

in llava-hf/llava-1.5-7b-hf 11 days ago

AttributeError: 'CLIPImageProcessor' object has no attribute 'patch_size' when loading fine-tuned LLaVA model from Google Drive

#47 opened 11 days ago by

mdnasif
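For context on why `patch_size` matters: in recent `transformers` versions, `patch_size` is an attribute of the LLaVA processor rather than the wrapped `CLIPImageProcessor`, and the processor uses it to work out how many image placeholder tokens to insert. llava-1.5-7b uses a CLIP ViT-L/14 vision tower at 336×336 pixels, so a 14-pixel patch gives a 24×24 grid. The sketch below shows only that arithmetic. The helper `num_patches` is hypothetical and ignores any extra CLS-token handling the processor may apply.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size  # patches along one edge
    return per_side * per_side


print(num_patches(336, 14))  # 576 = 24 * 24
```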

merve

posted an update 11 days ago

This week in open AI was 🔥 Let's recap! 🤗 merve/january-31-releases-679a10669bd4030090c5de4d
LLMs 💬
> Huge: AllenAI released new Tülu models that outperform DeepSeek R1 using Reinforcement Learning with Verifiable Reward (RLVR) based on Llama 3.1 405B 🔥
> Mistral AI is back to open-source with their "small" 24B models (base & SFT) under the Apache 2.0 license 😱
> Alibaba Qwen released their 1M-context-length models, Qwen2.5-Instruct-1M, great for agentic use, under the Apache 2.0 license 🔥
> Arcee AI released Virtuoso-medium, a 32.8B LLM distilled from DeepSeek V3 using a dataset of 5B+ tokens
> Velvet-14B is a new family of 14B Italian LLMs trained on 10T tokens in six languages
> OpenThinker-7B is a fine-tuned version of Qwen2.5-7B-Instruct trained on the OpenThoughts dataset

VLMs & vision 👀
> Alibaba Qwen is back with Qwen2.5VL, amazing new capabilities ranging from agentic computer use to zero-shot localization 🔥
> NVIDIA released a new series of Eagle2 models in 1B and 9B sizes
> DeepSeek released Janus-Pro, a new any-to-any model (image-text generation from image-text input) under the MIT license
> BEN2 is a new background removal model under the MIT license!

Audio 🗣️
> YuE is a new open-source music generation foundation model for lyrics-to-song generation

Codebase 👩🏻‍💻
> We are open-sourcing our SmolVLM training and eval codebase! https://github.com/huggingface/smollm/tree/main/vision
> Open-R1 is an open-source reproduction of R1 by the @huggingface science team https://huggingface.co/blog/open-r1

RaushanTurganbay

in llava-hf/LLaVA-NeXT-Video-7B-hf 11 days ago

requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/llava-hf/LLaVA-NeXT-Video-7B-hf/resolve/main/chat_template.jinja

#14 opened 12 days ago by

BavariaForest

RaushanTurganbay

updated 8 models 15 days ago

Team members 7