Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning Paper • 2408.14158 • Published Aug 26 • 2
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published 26 days ago • 54
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20 • 40
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 154
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 52
LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism Paper • 2406.18485 • Published Jun 26 • 2
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression Paper • 2406.14909 • Published Jun 21 • 13
LLM Compiler Collection Meta LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning. • 4 items • Updated Jun 27 • 147
A Closer Look into Mixture-of-Experts in Large Language Models Paper • 2406.18219 • Published Jun 26 • 15
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models Paper • 2406.06563 • Published Jun 3 • 17
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11 • 55
Training-Free Long-Context Scaling of Large Language Models Paper • 2402.17463 • Published Feb 27 • 19
Representation Engineering: A Top-Down Approach to AI Transparency Paper • 2310.01405 • Published Oct 2, 2023 • 5
YaRN: Efficient Context Window Extension of Large Language Models Paper • 2309.00071 • Published Aug 31, 2023 • 65
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21 • 28
A Unified Sequence Parallelism Approach for Long Context Generative AI Paper • 2405.07719 • Published May 13 • 2
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published May 14 • 27
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention Paper • 2405.04437 • Published May 7 • 3
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing Paper • 2306.12929 • Published Jun 22, 2023 • 12
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22 • 124
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 11 days ago • 676
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 62
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 103
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies Paper • 2404.06395 • Published Apr 9 • 21
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens Paper • 2404.03413 • Published Apr 4 • 25
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 104
Rethinking Memory and Communication Cost for Efficient Large Language Model Training Paper • 2310.06003 • Published Oct 9, 2023 • 2
Optimized Network Architectures for Large Language Model Training with Billions of Parameters Paper • 2307.12169 • Published Jul 22, 2023 • 9
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference Paper • 2403.14520 • Published Mar 21 • 32
Unicron: Economizing Self-Healing LLM Training at Scale Paper • 2401.00134 • Published Dec 30, 2023 • 9
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21 • 111
Recurrent Drafter for Fast Speculative Decoding in Large Language Models Paper • 2403.09919 • Published Mar 14 • 20
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning Paper • 2401.01325 • Published Jan 2 • 26
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey Paper • 2311.12351 • Published Nov 21, 2023 • 3
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache Paper • 2401.02669 • Published Jan 5 • 14
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models Paper • 2402.02244 • Published Feb 3 • 1
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences Paper • 2403.09347 • Published Mar 14 • 20