LLaVA-Critic Collection as a general evaluator for assessing model performance • 6 items • Updated 1 day ago
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Paper • 2410.02740 • Published 2 days ago
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning Paper • 2410.01044 • Published 4 days ago
The Perfect Blend: Redefining RLHF with Mixture of Judges Paper • 2409.20370 • Published 5 days ago
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation Paper • 2410.01731 • Published 3 days ago
LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks Paper • 2410.01744 • Published 3 days ago
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging Paper • 2410.01215 • Published 4 days ago
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published 6 days ago
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models Paper • 2409.18943 • Published 8 days ago
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published 5 days ago
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published 5 days ago
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published 8 days ago
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 10 days ago
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published 9 days ago
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published 9 days ago
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published 10 days ago
Game4Loc: A UAV Geo-Localization Benchmark from Game Data Paper • 2409.16925 • Published 10 days ago
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 11 items • Updated 10 days ago
OmniBench: Towards The Future of Universal Omni-Language Models Paper • 2409.15272 • Published 12 days ago
Phantom of Latent for Large Language and Vision Models Paper • 2409.14713 • Published 13 days ago
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning Paper • 2409.14674 • Published 13 days ago
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published 12 days ago
Imagine yourself: Tuning-Free Personalized Image Generation Paper • 2409.13346 • Published 16 days ago
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published 16 days ago
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published 16 days ago
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published 17 days ago
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published 17 days ago
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 17 days ago
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types Paper • 2409.09269 • Published 22 days ago
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection Paper • 2409.08513 • Published 23 days ago
InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published 22 days ago
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning Paper • 2406.12050 • Published Jun 17
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published 24 days ago
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Paper • 2409.08240 • Published 23 days ago
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published 30 days ago
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published 23 days ago
MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis Paper • 2409.07129 • Published 25 days ago
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications Paper • 2409.07314 • Published 24 days ago
Gated Slot Attention for Efficient Linear-Time Sequence Modeling Paper • 2409.07146 • Published 25 days ago
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published 25 days ago
POINTS: Improving Your Vision-language Model with Affordable Strategies Paper • 2409.04828 • Published 28 days ago
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance Paper • 2409.04593 • Published 29 days ago