Merve Noyan

merve

AI & ML interests

VLMs, vision & co

Recent Activity

posted an update 4 days ago
Interesting releases in open AI this week, let's recap šŸ¤  https://huggingface.co/collections/merve/feb-7-releases-67a5f7d7f172d8bfe0dd66f4 šŸ¤– Robotics > Pi0, first open-source foundation vision-language action model was released in Le Robot (Apache 2.0) šŸ’¬ LLMs > Groundbreaking: s1 is simpler approach to test-time scaling, the release comes with small s1K dataset of 1k question-reasoning trace pairs (from Gemini-Thinking Exp) they fine-tune Qwen2.5-32B-Instruct to get s1-32B, outperforming o1-preview on math šŸ¤Æ s1-32B and s1K is out! > Adyen released DABstep, a new benchmark along with it's leaderboard demo for agents doing data analysis > Krutrim released Krutrim-2 instruct, new 12B model based on NeMo12B trained and aligned on Indic languages, a new multilingual sentence embedding model (based on STSB-XLM-R), and a translation model for Indic languages šŸ‘€ Multimodal > PKU released Align-DS-V, a model aligned using their new technique called LLF for all modalities (image-text-audio), along with the dataset Align Anything > OLA-7B is a new any-to-any model by Tencent that can take text, image, video, audio data with context window of 32k tokens and output text and speech in English and Chinese > Krutrim released Chitrarth, a new vision language model for Indic languages and English šŸ–¼ļø Vision > BiRefNet_HR is a new higher resolution BiRefNet for background removal šŸ—£ļø Audio > kyutai released Hibiki, it's a real-time speech-to-speech translation model šŸ¤Æ it's available for French-English translation > Krutrim released Dhwani, a new STT model for Indic languages > They also release a new dataset for STT-TTS šŸ–¼ļø Image Generation > Lumina released Lumina-Image-2.0, a 2B parameter-flow based DiT for text to image generation > Tencent released Hunyuan3D-2, a 3D asset generation model based on DiT and Hunyuan3D-Paint > boreal-hl-v1 is a new boring photorealistic image generation LoRA based on Hunyuan
updated a collection 4 days ago
Feb 7 Releases šŸ§£
View all activity

Organizations

Hugging Face's profile picture Google's profile picture SODA's profile picture Notebooks-explorers's profile picture Deprem Yapay Zeka's profile picture Deprem Private's profile picture PyTorch Image Models's profile picture Turkish NLP Dataset Creators's profile picture Templates's profile picture Demo Crafters šŸ¤— 's profile picture Keras's profile picture tensorflow's profile picture Mukayese's profile picture HugGAN Community's profile picture EPFL VILAB's profile picture Hugging Face Fellows's profile picture Huggingface.js's profile picture Tools's profile picture HuggingFaceM4's profile picture scikit-learn's profile picture JAX ā™„ļø Diffusers šŸ§Ø's profile picture 2023 Jan Offsite hackathon's profile picture HF Canonical Model Maintainers's profile picture scikit-learn's profile picture fastai X Hugging Face Group 2022's profile picture Huggingface Projects's profile picture boun-tabi-LMG's profile picture Kornia AI's profile picture skops-tests's profile picture Hugging Face H4's profile picture Keras Dreambooth Event's profile picture Turkish T5 - BERT - GPT-2's profile picture Blog-explorers's profile picture Hugging Face for Computer Vision's profile picture Hacktoberfest 2023's profile picture Hugging Face TB Research's profile picture adept-hf-collab's profile picture ZeroGPU Explorers's profile picture kotol's profile picture Magic Leap Community's profile picture Llava Hugging Face's profile picture MLX Community's profile picture Social Post Explorers's profile picture Top Contributors: Profile Followers's profile picture Dev Mode Explorers's profile picture Paris AI Running Club's profile picture yorg's profile picture CVPR2024's profile picture Les papiers de Merve's profile picture nltpt's profile picture s0409's profile picture Hugging Face FineVideo's profile picture mv's profile picture Cookbook Authors's profile picture open/ acc's profile picture Agents's profile picture University of Sydney's profile picture s0225's profile picture

Posts 96

view post
Post
2349
Interesting releases in open AI this week, let's recap šŸ¤  merve/feb-7-releases-67a5f7d7f172d8bfe0dd66f4

šŸ¤– Robotics
> Pi0, first open-source foundation vision-language action model was released in Le Robot (Apache 2.0)

šŸ’¬ LLMs
> Groundbreaking: s1 is simpler approach to test-time scaling, the release comes with small s1K dataset of 1k question-reasoning trace pairs (from Gemini-Thinking Exp) they fine-tune Qwen2.5-32B-Instruct to get s1-32B, outperforming o1-preview on math šŸ¤Æ s1-32B and s1K is out!
> Adyen released DABstep, a new benchmark along with it's leaderboard demo for agents doing data analysis
> Krutrim released Krutrim-2 instruct, new 12B model based on NeMo12B trained and aligned on Indic languages, a new multilingual sentence embedding model (based on STSB-XLM-R), and a translation model for Indic languages

šŸ‘€ Multimodal
> PKU released Align-DS-V, a model aligned using their new technique called LLF for all modalities (image-text-audio), along with the dataset Align Anything
> OLA-7B is a new any-to-any model by Tencent that can take text, image, video, audio data with context window of 32k tokens and output text and speech in English and Chinese
> Krutrim released Chitrarth, a new vision language model for Indic languages and English

šŸ–¼ļø Vision
> BiRefNet_HR is a new higher resolution BiRefNet for background removal

šŸ—£ļø Audio
> kyutai released Hibiki, it's a real-time speech-to-speech translation model šŸ¤Æ it's available for French-English translation
> Krutrim released Dhwani, a new STT model for Indic languages
> They also release a new dataset for STT-TTS

šŸ–¼ļø Image Generation
> Lumina released Lumina-Image-2.0, a 2B parameter-flow based DiT for text to image generation
> Tencent released Hunyuan3D-2, a 3D asset generation model based on DiT and Hunyuan3D-Paint
> boreal-hl-v1 is a new boring photorealistic image generation LoRA based on Hunyuan

Articles 20

Article
897

Open-source DeepResearch ā€“ Freeing our search agents