CCMat
's Collections
PaliGemma: A versatile 3B VLM for transfer
Paper
•
2407.07726
•
Published
•
68
Vision language models are blind
Paper
•
2407.06581
•
Published
•
83
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video
Dense Captioning
Paper
•
2404.16994
•
Published
•
36
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper
•
2403.05525
•
Published
•
43
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts
Language Model
Paper
•
2405.04434
•
Published
•
17
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Paper
•
2404.19752
•
Published
•
24
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model
Handling Resolutions from 336 Pixels to 4K HD
Paper
•
2404.06512
•
Published
•
30
Sigmoid Loss for Language Image Pre-Training
Paper
•
2303.15343
•
Published
•
6
CogVLM: Visual Expert for Pretrained Language Models
Paper
•
2311.03079
•
Published
•
23
InternLM-XComposer2: Mastering Free-form Text-Image Composition and
Comprehension in Vision-Language Large Model
Paper
•
2401.16420
•
Published
•
55
What matters when building vision-language models?
Paper
•
2405.02246
•
Published
•
102
Multimodal Autoregressive Pre-training of Large Vision Encoders
Paper
•
2411.14402
•
Published
•
43