Kornia AI
non-profit
AI & ML interests
computer-vision, image-processing, machine-learning, deep-learning
Recent Activity
prithivMLmods posted an update 3 days ago
Post
QwQ Edge Gets a Small Update..! 💬
try now: prithivMLmods/QwQ-Edge
🚀 Now, you can use the following commands for different tasks (a dispatch sketch follows the diagram below):
🖼️ @image 'prompt...' → Generates an image
🔉 @tts1 'prompt...' → Generates speech in a female voice
🔉 @tts2 'prompt...' → Generates speech in a male voice
🅰️ @text 'prompt...' → Enables textual conversation (if not specified, text-to-text generation is the default mode)
💬 Multimodality support: prithivMLmods/Qwen2-VL-OCR-2B-Instruct
💬 For text generation, the FastThink-0.5B model ensures quick and efficient responses: prithivMLmods/FastThink-0.5B-Tiny
💬 Image generation: SDXL Lightning model, SG161222/RealVisXL_V4.0_Lightning
GitHub: https://github.com/PRITHIVSAKTHIUR/QwQ-Edge
```mermaid
graph TD
    A[User Interface] --> B[Chat Logic]
    B --> C{Command Type}
    C -->|Text| D[FastThink-0.5B]
    C -->|Image| E[Qwen2-VL-OCR-2B]
    C -->|@image| F[Stable Diffusion XL]
    C -->|@tts| G[Edge TTS]
    D --> H[Response]
    E --> H
    F --> H
    G --> H
```
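The routing in the diagram is easy to picture in code. Here is a minimal, hypothetical sketch of the @command dispatch; the handler functions are placeholders standing in for the actual model calls (RealVisXL Lightning for @image, Edge TTS voices for @tts1/@tts2, FastThink-0.5B for text), and the real implementation lives in the GitHub repo linked above.

```python
def generate_image(prompt: str) -> str:
    return f"[image for: {prompt}]"           # placeholder for the RealVisXL Lightning call

def tts(prompt: str, voice: str) -> str:
    return f"[{voice} speech for: {prompt}]"  # placeholder for the Edge TTS call

def chat(prompt: str) -> str:
    return f"[reply to: {prompt}]"            # placeholder for the FastThink-0.5B call

def route(message: str) -> str:
    """Dispatch a chat message to a backend based on its @command prefix."""
    if message.startswith("@image"):
        return generate_image(message.removeprefix("@image").strip())
    if message.startswith("@tts1"):
        return tts(message.removeprefix("@tts1").strip(), voice="female")
    if message.startswith("@tts2"):
        return tts(message.removeprefix("@tts2").strip(), voice="male")
    # @text, or no prefix at all: text-to-text generation is the default mode
    return chat(message.removeprefix("@text").strip())

print(route("@image a watercolor fox"))
```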
Post
Interesting releases in open AI this week, let's recap 🤠
merve/feb-7-releases-67a5f7d7f172d8bfe0dd66f4
🤖 Robotics
> Pi0, first open-source foundation vision-language action model was released in Le Robot (Apache 2.0)
💬 LLMs
> Groundbreaking: s1 is a simpler approach to test-time scaling; the release comes with the small s1K dataset of 1k question-reasoning-trace pairs (from Gemini-Thinking Exp), which they use to fine-tune Qwen2.5-32B-Instruct into s1-32B, outperforming o1-preview on math 🤯 s1-32B and s1K are out!
> Adyen released DABstep, a new benchmark along with its leaderboard demo for agents doing data analysis
> Krutrim released Krutrim-2 Instruct, a new 12B model based on NeMo12B trained and aligned on Indic languages, plus a new multilingual sentence embedding model (based on STSB-XLM-R) and a translation model for Indic languages
👀 Multimodal
> PKU released Align-DS-V, a model aligned using their new technique called LLF for all modalities (image-text-audio), along with the dataset Align Anything
> OLA-7B is a new any-to-any model by Tencent that can take text, image, video, audio data with context window of 32k tokens and output text and speech in English and Chinese
> Krutrim released Chitrarth, a new vision language model for Indic languages and English
🖼️ Vision
> BiRefNet_HR is a new higher resolution BiRefNet for background removal
🗣️ Audio
> kyutai released Hibiki, a real-time speech-to-speech translation model 🤯 it's available for French-English translation
> Krutrim released Dhwani, a new STT model for Indic languages
> They also released a new dataset for STT-TTS
🖼️ Image Generation
> Lumina released Lumina-Image-2.0, a 2B-parameter flow-based DiT for text-to-image generation
> Tencent released Hunyuan3D-2, a 3D asset generation model based on DiT and Hunyuan3D-Paint
> boreal-hl-v1 is a new boring photorealistic image generation LoRA based on Hunyuan
Update README.md (#2) opened 5 days ago by MateoSP
Post
IBM released ibm-granite/granite-vision-3.1-2b-preview, a small vision LM with impressive performance on different tasks 😮🔥
it comes with transformers and vLLM support from the get-go 💗
you can run it in Colab T4, so I built a notebook to put it to the test, find it here: https://github.com/merveenoyan/smol-vision/blob/main/inference_gists/IBM_Granite_Vision.ipynb
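As a rough idea of what the notebook does, here is a hedged inference sketch using the generic transformers image-text-to-text pipeline; the message format and output structure may differ slightly from the notebook, the image URL is a placeholder, and float16 is chosen because a T4 has no bfloat16 support.

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="ibm-granite/granite-vision-3.1-2b-preview",
    device=0,                      # the 2B model fits on a Colab T4
    torch_dtype=torch.float16,
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/receipt.png"},  # placeholder image URL
        {"type": "text", "text": "Describe this image."},
    ],
}]
print(pipe(text=messages, max_new_tokens=128))
```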
prithivMLmods posted an update 9 days ago
Post
o3-Mini and DeepSeek R1
Worked out with some famous and weird examples.
🔥 Blog: https://huggingface.co/blog/prithivMLmods/o3-mini-vs-deepseek-r1
Prompt: Using HTML, CSS, and JavaScript in a single HTML file to create a simulation of the solar system. Pay extreme attention to the UI to make it as intuitive as possible. Ensure that every planet appears as a sphere and is labeled with its corresponding name.
Example 1: o3-Mini; example 2: DeepSeek R1
Q2: https://huggingface.co/blog/prithivMLmods/o3-mini-vs-deepseek-r1#q2--web-solar-system-explorer
Post
This week in open AI was 🔥 Let's recap! 🤗
merve/january-31-releases-679a10669bd4030090c5de4d
LLMs 💬
> Huge: AllenAI released new Tülu models that outperform DeepSeek R1 using Reinforcement Learning with Verifiable Reward (RLVR) based on Llama 3.1 405B 🔥
> Mistral AI is back to open-source with their "small" 24B models (base & SFT), with Apache 2.0 license 😱
> Alibaba Qwen released their 1M context length models Qwen2.5-Instruct-1M, great for agentic use with Apache 2.0 license 🔥
> Arcee AI released Virtuoso-medium, a 32.8B LLM distilled from DeepSeek V3 with a dataset of 5B+ tokens
> Velvet-14B is a new family of 14B Italian LLMs trained on 10T tokens in six languages
> OpenThinker-7B is a fine-tuned version of Qwen2.5-7B-Instruct on the OpenThoughts dataset
VLMs & vision 👀
> Alibaba Qwen is back with Qwen2.5VL, amazing new capabilities ranging from agentic computer use to zero-shot localization 🔥
> NVIDIA released new series of Eagle2 models with 1B and 9B sizes
> DeepSeek released Janus-Pro, a new any-to-any model (image-text generation from image-text input) with MIT license
> BEN2 is a new background removal model with MIT license!
Audio 🗣️
> YuE is a new open-source music generation foundation model, lyrics-to-song generation
Codebase 👩🏻‍💻
> We are open-sourcing our SmolVLM training and eval codebase! https://github.com/huggingface/smollm/tree/main/vision
> Open-R1 is an open-source reproduction of R1 by the @huggingface science team https://huggingface.co/blog/open-r1
ZennyKenny posted an update 12 days ago
Post
GradientBoostingClassifier is an algorithm supported by the Python scikit-learn library, and now you can quickly train an ML model using this powerful technique on any (viable) dataset on the Hugging Face Hub without a line of code.
Love finishing a project right when the late night starts to turn into the early morning: sklearn-docs/GradientBoostingClassifier
Long time listener, first time caller, but always pleased to contribute, even if only adjacently, to the power of scikit-learn.
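For anyone who does want the line of code, the underlying scikit-learn API that the Space wraps is compact; a self-contained example on a built-in toy dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = GradientBoostingClassifier(
    n_estimators=100,   # number of boosting stages
    learning_rate=0.1,  # shrinkage applied to each tree's contribution
    max_depth=3,        # depth of the individual regression trees
    random_state=42,
).fit(X_train, y_train)

print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```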
Post
I have just released a new blog post about KV caching and its role in inference speedup 🚀
🔗 https://huggingface.co/blog/not-lain/kv-caching/
some takeaways:
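To make the idea concrete before reading the post, here is a minimal sketch of mine (not taken from the blog) of greedy decoding with an explicit KV cache in transformers: after the first forward pass over the prompt, each step feeds only the newest token plus `past_key_values`, so the keys/values of earlier tokens are never recomputed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("KV caching speeds up decoding because", return_tensors="pt")
generated = inputs.input_ids        # full sequence, kept for decoding at the end
ids, past = inputs.input_ids, None  # what we actually feed the model each step

with torch.no_grad():
    for _ in range(20):
        out = model(input_ids=ids, past_key_values=past, use_cache=True)
        past = out.past_key_values                        # cached K/V for every layer
        ids = out.logits[:, -1].argmax(-1, keepdim=True)  # greedy next token, fed alone next step
        generated = torch.cat([generated, ids], dim=-1)

print(tok.decode(generated[0]))
```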
prithivMLmods posted an update 13 days ago
Post
Oof, what a week! 🥵 So many things have happened, let's recap!
merve/jan-24-releases-6793d610774073328eac67a9
Multimodal 💬
- We have released SmolVLM -- the tiniest VLMs, coming in 256M and 500M, with their retrieval models ColSmol for multimodal RAG 💗
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released MiniMax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE) a new challenging MM benchmark
LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)
Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- Tencent released Hunyuan3D-2, new 3D asset generation from images
Post
smolagents can see 🔥
we just shipped vision support to smolagents 🤗 agentic computers FTW
you can now:
💻 let the agent get images dynamically (e.g. agentic web browser)
📑 pass images at the init of the agent (e.g. chatting with documents, filling forms automatically, etc.)
with a few LoC changed! 🤯 (a usage sketch follows below)
you can use transformers models locally (like Qwen2VL) OR plug in your favorite multimodal inference provider (gpt-4o, Anthropic & co) 🤠
read our blog http://hf.co/blog/smolagents-can-see
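A quick sketch of the second pattern (handing the agent images when it runs), assuming the `images=` argument described in the blog post; the model id and file name here are illustrative, so check the post for the exact signatures.

```python
from PIL import Image
from smolagents import CodeAgent, HfApiModel

# a VLM-backed agent; swap in your preferred multimodal provider
agent = CodeAgent(
    tools=[],                                      # add browsing tools for the dynamic-image case
    model=HfApiModel("Qwen/Qwen2-VL-72B-Instruct"),
)

form = Image.open("form.png")                      # placeholder document image
print(agent.run("Fill out what you can read from this form.", images=[form]))
```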
prithivMLmods posted an update 22 days ago
Post
Q'n' Sketches ❤️🔥
🖼️ Adapters (loading sketch below):
- Qs : strangerzonehf/Qs-Sketch
- Qd : strangerzonehf/Qd-Sketch
- Qx : strangerzonehf/Qx-Art
- Qc : strangerzonehf/Qc-Sketch
- Bb : strangerzonehf/Bg-Bag
🐍 Collection : strangerzonehf/q-series-sketch-678e3503bf3a661758429717
🔗 Page: https://huggingface.co/strangerzonehf
.
.
.
@prithivMLmods 🤗
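These are LoRA adapters, so loading one with diffusers looks roughly like the sketch below. The base model and trigger phrase are assumptions on my part; check each adapter's model card for the intended base and trigger words.

```python
import torch
from diffusers import DiffusionPipeline

# assumption: the Q-series adapters target a Flux-style base; verify on the model card
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("strangerzonehf/Qs-Sketch")

# "Qs Sketch" as a trigger phrase is hypothetical -- see the adapter card
image = pipe("Qs Sketch, a lighthouse at dusk", num_inference_steps=28).images[0]
image.save("sketch.png")
```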
Post
R1 is out! And with a lot of other R1-related models...
ZennyKenny posted an update 24 days ago
Post
Really pleased with the Bring Your Own Model (BYOM) feature in Brave Browser: https://brave.com/blog/byom-nightly/
Takes about 5 minutes to configure your own locally running LLM as an in-browser assistant. Totally local, totally private, totally yours.
Post
Everything that happened this week in open AI, a recap 🤠
merve/jan-17-releases-678a673a9de4a4675f215bf5
👀 Multimodal
- MiniCPM-o 2.6 is a new sota any-to-any model by OpenBMB (vision, speech, and text!)
- VideoChat-Flash-Qwen2.5-2B is part of a new family of video multimodal models by OpenGVLab that come in sizes 2B & 7B and resolutions 224 & 448
- ByteDance released a larger SA2VA that comes in at 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance
💬 LLMs
- MiniMax-Text-01 is a new huge language model (456B total, 45.9B active params) by MiniMaxAI with a context length of 4M tokens 🤯
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B, a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D 🧙🏻‍♂️
- ReaderLM-v2 is a new HTML parsing model by Jina AI
- Dria released Dria-Agent-a-3B, a new agentic coding model (Pythonic function calling) based on Qwen2.5 Coder
- Unsloth released Phi-4, plus faster and more memory-efficient Llama 3.3
🖼️ Vision
- MatchAnything is a new foundation model for matching
- FitDiT is a high-fidelity VTON model based on the DiT architecture
🗣️ Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities
📖 Retrieval
- lightblue released LB-reranker-0.5B-v1.0, a new reranker based on Qwen2.5 that can handle 95+ languages
- cde-small-v2 is a new sota small retrieval model by @jxm
Post
we now have more than 2000 public AI models using ModelHubMixin 🤗
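The pattern behind that number is small: subclass the mixin and your model picks up `save_pretrained`, `from_pretrained`, and `push_to_hub` for free. A minimal sketch with the PyTorch flavor of the mixin (the class and repo names are illustrative):

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class TinyClassifier(nn.Module, PyTorchModelHubMixin):
    """A toy model; the init kwargs are saved to config.json automatically."""
    def __init__(self, in_features: int = 16, hidden: int = 32, classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(), nn.Linear(hidden, classes)
        )

    def forward(self, x):
        return self.net(x)

model = TinyClassifier()
model.save_pretrained("tiny-classifier")               # writes config.json + model.safetensors
# model.push_to_hub("your-username/tiny-classifier")   # placeholder repo id
reloaded = TinyClassifier.from_pretrained("tiny-classifier")
```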