RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots Paper • 2406.02523 • Published Jun 4, 2024 • 9 • 1
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation Paper • 2406.02511 • Published Jun 4, 2024 • 8 • 2
I4VGen: Image as Stepping Stone for Text-to-Video Generation Paper • 2406.02230 • Published Jun 4, 2024 • 15 • 3
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published Jun 4, 2024 • 29 • 2
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs Paper • 2406.02886 • Published Jun 5, 2024 • 7 • 1
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM Paper • 2406.02884 • Published Jun 5, 2024 • 14 • 2
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning Paper • 2406.03344 • Published Jun 5, 2024 • 17 • 1
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration Paper • 2406.01014 • Published Jun 3, 2024 • 30 • 2
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published Jun 4, 2024 • 36 • 1
Open-Endedness is Essential for Artificial Superhuman Intelligence Paper • 2406.04268 • Published Jun 6, 2024 • 11 • 1
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments Paper • 2406.04151 • Published Jun 6, 2024 • 17 • 1
VideoTetris: Towards Compositional Text-to-Video Generation Paper • 2406.04277 • Published Jun 6, 2024 • 22 • 1
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models Paper • 2406.04271 • Published Jun 6, 2024 • 27 • 1
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published Jun 6, 2024 • 26 • 2
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Paper • 2406.04333 • Published Jun 6, 2024 • 36 • 3
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6, 2024 • 71 • 4
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold Paper • 2305.10973 • Published May 18, 2023 • 31 • 74
Tree of Thoughts: Deliberate Problem Solving with Large Language Models Paper • 2305.10601 • Published May 17, 2023 • 10 • 1
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection Paper • 2310.11511 • Published Oct 17, 2023 • 74 • 6
Do Large Language Models Latently Perform Multi-Hop Reasoning? Paper • 2402.16837 • Published Feb 26, 2024 • 24 • 1
CroissantLLM: A Truly Bilingual French-English Language Model Paper • 2402.00786 • Published Feb 1, 2024 • 25 • 3
Octopus: Embodied Vision-Language Programmer from Environmental Feedback Paper • 2310.08588 • Published Oct 12, 2023 • 34 • 4
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization Paper • 2405.15071 • Published May 23, 2024 • 35 • 1
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs Paper • 2402.15627 • Published Feb 23, 2024 • 33 • 2
DiffusionGPT: LLM-Driven Text-to-Image Generation System Paper • 2401.10061 • Published Jan 18, 2024 • 27 • 4
RULER: What's the Real Context Size of Your Long-Context Language Models? Paper • 2404.06654 • Published Apr 9, 2024 • 33 • 3
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 35 • 3
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models Paper • 2312.06585 • Published Dec 11, 2023 • 28 • 3
LMDX: Language Model-based Document Information Extraction and Localization Paper • 2309.10952 • Published Sep 19, 2023 • 65 • 21
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27, 2024 • 88 • 5
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction Paper • 2403.18795 • Published Mar 27, 2024 • 17 • 2
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15, 2024 • 29 • 2
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale Paper • 2306.15687 • Published Jun 23, 2023 • 1
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29, 2024 • 26 • 4
MagiCapture: High-Resolution Multi-Concept Portrait Customization Paper • 2309.06895 • Published Sep 13, 2023 • 27 • 3
Exploration into Translation-Equivariant Image Quantization Paper • 2112.00384 • Published Dec 1, 2021 • 1
Muse: Text-To-Image Generation via Masked Generative Transformers Paper • 2301.00704 • Published Jan 2, 2023 • 1
Improving language models by retrieving from trillions of tokens Paper • 2112.04426 • Published Dec 8, 2021 • 1 • 1
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper • 2405.10300 • Published May 16, 2024 • 26 • 2
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control Paper • 2403.09055 • Published Mar 14, 2024 • 24 • 3
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training Paper • 2401.00849 • Published Jan 1, 2024 • 14 • 2
EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation Paper • 2205.14756 • Published May 29, 2022 • 1
Lost in the Middle: How Language Models Use Long Contexts Paper • 2307.03172 • Published Jul 6, 2023 • 35 • 3
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models Paper • 2307.02421 • Published Jul 5, 2023 • 34 • 5
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19, 2024 • 29 • 2
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29, 2024 • 49 • 4