OliP's Collections
Vision-Language
EVLM: An Efficient Vision-Language Model for Visual Understanding
Paper • 2407.14177 • Published • 43
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
Paper • 2407.04172 • Published • 23
facebook/chameleon-7b
Image-Text-to-Text • Updated • 22k • 173
vidore/colpali
Updated • 47k • 418
E5-V: Universal Embeddings with Multimodal Large Language Models
Paper • 2407.12580 • Published • 40
Wolf: Captioning Everything with a World Summarization Framework
Paper • 2407.18908 • Published • 32
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Paper • 2408.02718 • Published • 61
LLaVA-OneVision: Easy Visual Task Transfer
Paper • 2408.03326 • Published • 60
Document Retrieval
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Paper • 2408.05211 • Published • 47
nvidia/NVLM-D-72B
Image-Text-to-Text • Updated • 47.8k • 767
mistralai/Pixtral-12B-2409
Image-Text-to-Text • Updated • 599
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper • 2409.01704 • Published • 83
stepfun-ai/GOT-OCR2_0
Image-Text-to-Text • Updated • 373k • 1.37k
deepseek-ai/Janus-1.3B
Any-to-Any • Updated • 176k • 569
h2oai/h2ovl-mississippi-2b
Text Generation • Updated • 37.3k • 27
HuggingFaceM4/Idefics3-8B-Llama3
Image-Text-to-Text • Updated • 49.4k • 264
wyu1/Leopard-Idefics2
Updated • 11 • 4
HuggingFaceTB/SmolVLM-Instruct
Image-Text-to-Text • Updated • 111k • 375
alibaba-damo/mgp-str-base
Image-to-Text • Updated • 3.78k • 63
google/paligemma2-3b-pt-224
Image-Text-to-Text • Updated • 50.2k • 137