Submitted by HugoLaurencon 125 Building and better understanding vision-language models: insights and future directions · 4 authors 5
Submitted by yifanzhang114 26 MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? · 13 authors 4
Submitted by akhaliq 25 LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation · 8 authors 2
Submitted by JamesSand 24 Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time · 5 authors 4
Submitted by akhaliq 12 CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities · 8 authors 2
Submitted by hasanar1f 11 HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments · 6 authors 2
Submitted by IAMJB 10 A Web-Based Solution for Federated Learning with LLM-Based Automation · 3 authors 1
Submitted by akhaliq 6 FLoD: Integrating Flexible Level of Detail into 3D Gaussian Splatting for Customizable Rendering · 4 authors 2
Submitted by amanchadha 5 RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering · 4 authors 1