Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models • Paper • 2409.17146 • Published 10 days ago • 92
Seeing Faces in Things: A Model and Dataset for Pareidolia • Paper • 2409.16143 • Published 11 days ago • 15
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction • Paper • 2409.18124 • Published 9 days ago • 23
Colorful Diffuse Intrinsic Image Decomposition in the Wild • Paper • 2409.13690 • Published 15 days ago • 12
gsplat: An Open-Source Library for Gaussian Splatting • Paper • 2409.06765 • Published 25 days ago • 11
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos • Paper • 2409.02095 • Published Sep 3 • 33
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning • Paper • 2406.04520 • Published Jun 6 • 10
Memory Consolidation Enables Long-Context Video Understanding • Paper • 2402.05861 • Published Feb 8 • 8
Boximator: Generating Rich and Controllable Motions for Video Synthesis • Paper • 2402.01566 • Published Feb 2 • 26