MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Paper • 2409.16160 • Published 11 days ago • 30
LVCD: Reference-based Lineart Video Colorization with Diffusion Models Paper • 2409.12960 • Published 16 days ago • 20
DrawingSpinUp: 3D Animation from Single Character Drawings Paper • 2409.08615 • Published 23 days ago • 14
InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published 23 days ago • 30
Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation Paper • 2409.01055 • Published Sep 2 • 6
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12 • 52
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 115
Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches Paper • 2408.04567 • Published Aug 8 • 23
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 154
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks Paper • 2408.03615 • Published Aug 7 • 30
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion Paper • 2408.03178 • Published Aug 6 • 36
ProCreate, Dont Reproduce! Propulsive Energy Diffusion for Creative Generation Paper • 2408.02226 • Published Aug 5 • 10
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization Paper • 2408.02555 • Published Aug 5 • 28
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 Paper • 2408.00874 • Published Aug 1 • 41
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement Paper • 2408.00653 • Published Aug 1 • 27
ShieldGemma: Generative AI Content Moderation Based on Gemma Paper • 2407.21772 • Published Jul 31 • 13
Improving 2D Feature Representations by 3D-Aware Fine-Tuning Paper • 2407.20229 • Published Jul 29 • 7
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation Paper • 2407.17438 • Published Jul 24 • 23
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23 • 67
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence Paper • 2407.16655 • Published Jul 23 • 27
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models Paper • 2407.12327 • Published Jul 17 • 76
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation Paper • 2407.11394 • Published Jul 16 • 11
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12 • 125
Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models Paper • 2407.08701 • Published Jul 11 • 10
SEED-Story: Multimodal Long Story Generation with Large Language Model Paper • 2407.08683 • Published Jul 11 • 22
Still-Moving: Customized Video Generation without Customized Video Data Paper • 2407.08674 • Published Jul 11 • 12
RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models Paper • 2407.06938 • Published Jul 9 • 21
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations Paper • 2407.03471 • Published Jul 3 • 27
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Paper • 2407.05282 • Published Jul 7 • 12
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents Paper • 2407.04363 • Published Jul 5 • 26
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS Paper • 2406.18009 • Published Jun 26 • 18
GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality Paper • 2406.18462 • Published Jun 26 • 11
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language Paper • 2406.20085 • Published Jun 28 • 9
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 94
MatchTime: Towards Automatic Soccer Game Commentary Generation Paper • 2406.18530 • Published Jun 26 • 12
MotionBooth: Motion-Aware Customized Text-to-Video Generation Paper • 2406.17758 • Published Jun 25 • 18
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 85
YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals Paper • 2406.16273 • Published Jun 24 • 40
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models Paper • 2406.15718 • Published Jun 22 • 14
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution Paper • 2406.13457 • Published Jun 19 • 16
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319 • Published Jun 21 • 60
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Paper • 2406.10601 • Published Jun 15 • 65
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 17 • 56
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers Paper • 2406.10163 • Published Jun 14 • 32