Game4Loc: A UAV Geo-Localization Benchmark from Game Data Paper • 2409.16925 • Published 10 days ago • 6
NoTeeline: Supporting Real-Time Notetaking from Keypoints with Large Language Models Paper • 2409.16493 • Published 11 days ago • 7
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale Paper • 2409.16299 • Published 26 days ago • 9
Synchronize Dual Hands for Physics-Based Dexterous Guitar Playing Paper • 2409.16629 • Published 11 days ago • 9
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors Paper • 2409.17058 • Published 10 days ago • 9
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion Paper • 2409.17145 • Published 10 days ago • 11
AIM 2024 Sparse Neural Rendering Challenge: Dataset and Benchmark Paper • 2409.15041 • Published 12 days ago • 12
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published 10 days ago • 58
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published 10 days ago • 92
The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends Paper • 2409.14195 • Published 14 days ago • 10
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Paper • 2409.17422 • Published 10 days ago • 22
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction Paper • 2409.18124 • Published 9 days ago • 23
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published 9 days ago • 34
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published 9 days ago • 32
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published 10 days ago • 43
LML: Language Model Learning a Dataset for Data-Augmented Prediction Paper • 2409.18957 • Published 8 days ago • 8
MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making Paper • 2409.16686 • Published 11 days ago • 7
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows Paper • 2409.17433 • Published 10 days ago • 8
MinerU: An Open-Source Solution for Precise Document Content Extraction Paper • 2409.18839 • Published 8 days ago • 22
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published 8 days ago • 20
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult Paper • 2409.17545 • Published 10 days ago • 16
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models Paper • 2409.17066 • Published 10 days ago • 22
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models Paper • 2409.12139 • Published 17 days ago • 11
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published 17 days ago • 35
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey Paper • 2409.11564 • Published 18 days ago • 18
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 17 days ago • 69
Can OOD Object Detectors Learn from Foundation Models? Paper • 2409.05162 • Published 27 days ago • 6
PiTe: Pixel-Temporal Alignment for Large Video-Language Model Paper • 2409.07239 • Published 25 days ago • 11
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally Paper • 2409.08270 • Published 23 days ago • 9
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors Paper • 2409.08278 • Published 23 days ago • 10
TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder Paper • 2409.08248 • Published 23 days ago • 12
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources Paper • 2409.08239 • Published 23 days ago • 15
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Paper • 2409.08240 • Published 23 days ago • 15
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published 30 days ago • 41
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published 23 days ago • 42
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published 24 days ago • 63
LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation Paper • 2409.06703 • Published 25 days ago • 2
INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding Paper • 2409.06210 • Published 26 days ago • 24
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published 25 days ago • 54
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering Paper • 2409.06595 • Published 25 days ago • 37
Evaluating Multiview Object Consistency in Humans and Image Models Paper • 2409.05862 • Published 26 days ago • 8
Insights from Benchmarking Frontier Language Models on Web App Code Generation Paper • 2409.05177 • Published 27 days ago • 5
Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak Paper • 2409.04269 • Published 29 days ago • 9
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments Paper • 2409.05865 • Published 26 days ago • 14
POINTS: Improving Your Vision-language Model with Affordable Strategies Paper • 2409.04828 • Published 28 days ago • 22
Benchmarking Chinese Knowledge Rectification in Large Language Models Paper • 2409.05806 • Published 26 days ago • 14
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance Paper • 2409.04593 • Published 29 days ago • 20
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery Paper • 2409.05591 • Published 27 days ago • 26
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs Paper • 2409.05152 • Published 27 days ago • 29
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published 26 days ago • 45