Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4 • 85
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 154
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper • 2408.02657 • Published Aug 5 • 32
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion Paper • 2408.00458 • Published Aug 1 • 10
Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names Paper • 2408.00298 • Published Aug 1 • 9
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23 • 67
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3 • 92
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24 • 55
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning Paper • 2406.14283 • Published Jun 20 • 2
TroL: Traversal of Layers for Large Language and Vision Models Paper • 2406.12246 • Published Jun 18 • 34
With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation Paper • 2401.11504 • Published Jan 21 • 1
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering Paper • 2406.10208 • Published Jun 14 • 21
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published Jun 13 • 50
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11 • 55
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10 • 64
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19 • 53
Layer-Condensed KV Cache for Efficient Inference of Large Language Models Paper • 2405.10637 • Published May 17 • 19
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data Paper • 2404.15653 • Published Apr 24 • 26
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 103
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation Paper • 2403.12015 • Published Mar 18 • 63
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29 • 49
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling Paper • 2312.15166 • Published Dec 23, 2023 • 56
Beyond Surface: Probing LLaMA Across Scales and Layers Paper • 2312.04333 • Published Dec 7, 2023 • 18
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning Paper • 2312.01552 • Published Dec 4, 2023 • 30
FLM-101B: An Open LLM and How to Train It with $100K Budget Paper • 2309.03852 • Published Sep 7, 2023 • 43
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 240
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 170