Aritra Roy Gosthipaty PRO

ariG23498

https://arig23498.github.io/

AI & ML interests

Deep Representation Learning

ariG23498's activity

upvoted an article about 4 hours ago

Article

From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages

about 10 hours ago

• 15

upvoted a collection about 6 hours ago

SigLIP

Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 10 items • Updated Dec 13, 2024 • 51

upvoted an article about 9 hours ago

Article

The Open Arabic LLM Leaderboard 2

2 days ago

• 17

upvoted an article 2 days ago

Article

Open-source DeepResearch – Freeing our search agents

8 days ago

• 911

upvoted an article 5 days ago

Article

Convert Transformers to ONNX with Hugging Face Optimum

Jun 22, 2022

• 4

upvoted 2 articles 13 days ago

Article

Mixture of Experts Explained

Dec 11, 2023

• 335

Article

KV Caching Explained: Optimizing Transformer Inference Efficiency

13 days ago

• 24

upvoted a collection 13 days ago

DeepSeek R1 (All Versions)

DeepSeek R1 - the most powerful reasoning open-source model - available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 29 items • Updated 4 days ago • 167

upvoted an article 14 days ago

Article

Welcome to Inference Providers on the Hub 🔥

15 days ago

• 321

upvoted a collection 15 days ago

Qwen2.5

The Qwen 2.5 models are a series of AI models trained on 18 trillion tokens, supporting 29 languages and offering advanced features such as instructio • 33 items • Updated Oct 12, 2024 • 7

upvoted an article 15 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

15 days ago

• 706

upvoted a collection 15 days ago

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 3 items • Updated 16 days ago • 337

upvoted an article 19 days ago

Article

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

20 days ago

• 124

upvoted a paper 19 days ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 73

upvoted an article 19 days ago

Article

Mastering Long Contexts in LLMs with KVPress

19 days ago

• 62

upvoted a collection 20 days ago

InternVL2.5-MPO

Enhancing the Reasoning Ability of MLLMs via Mixed Preference Optimization • 16 items • Updated 13 days ago • 26

upvoted 2 articles 21 days ago

Article

Unlocking Longer Generation with Key-Value Cache Quantization

May 16, 2024

• 41

Article

Yay! Organizations can now publish blog Articles

22 days ago

• 33

upvoted a paper 22 days ago

DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 50

upvoted a collection 22 days ago

DeepSeek-V3

3 items • Updated Jan 6 • 179