Gmc2 (Maozhou Ge)

upvoted a paper 18 days ago

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

Paper • 2408.14158 • Published Aug 26 • 2

upvoted a paper 24 days ago

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published 26 days ago • 54

upvoted 2 papers about 2 months ago

To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20 • 40

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8 • 154

upvoted a paper 2 months ago

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 103

upvoted 5 papers 3 months ago

LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism

Paper • 2406.18485 • Published Jun 26 • 2

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

Paper • 2406.14909 • Published Jun 21 • 13

upvoted a collection 3 months ago

LLM Compiler

Collection

Meta LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning. • 4 items • Updated Jun 27 • 147

upvoted 5 papers 3 months ago

Adam-mini: Use Fewer Learning Rates To Gain More

Paper • 2406.16793 • Published Jun 24 • 67

A Closer Look into Mixture-of-Experts in Large Language Models

Paper • 2406.18219 • Published Jun 26 • 15

Unlocking Continual Learning Abilities in Language Models

Paper • 2406.17245 • Published Jun 25 • 28

Long Context Transfer from Language to Vision

Paper • 2406.16852 • Published Jun 24 • 32

Scaling Laws for Linear Complexity Language Models

Paper • 2406.16690 • Published Jun 24 • 22

upvoted 10 papers 4 months ago

Needle In A Multimodal Haystack

Paper • 2406.07230 • Published Jun 11 • 52

Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 82

Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

Paper • 2406.06563 • Published Jun 3 • 17

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11 • 55

Training-Free Long-Context Scaling of Large Language Models

Paper • 2402.17463 • Published Feb 27 • 19

Zero Bubble Pipeline Parallelism

Paper • 2401.10241 • Published Nov 30, 2023 • 22

2BP: 2-Stage Backpropagation

Paper • 2405.18047 • Published May 28 • 23

Representation Engineering: A Top-Down Approach to AI Transparency

Paper • 2310.01405 • Published Oct 2, 2023 • 5

Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published Apr 23 • 59

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 85

upvoted 4 papers 5 months ago

YaRN: Efficient Context Window Extension of Large Language Models

Paper • 2309.00071 • Published Aug 31, 2023 • 65

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21 • 28

A Unified Sequence Parallelism Approach for Long Context Generative AI

Paper • 2405.07719 • Published May 13 • 2

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Paper • 2405.08707 • Published May 14 • 27

upvoted 2 articles 5 months ago

Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face

Article • Published Dec 11, 2023 • 9

Mixture of Experts Explained

Article • Published Dec 11, 2023 • 162

upvoted 6 papers 5 months ago

Generating Long Sequences with Sparse Transformers

Paper • 1904.10509 • Published Apr 23, 2019 • 1

Longformer: The Long-Document Transformer

Paper • 2004.05150 • Published Apr 10, 2020 • 3

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention

Paper • 2405.04437 • Published May 7 • 3

Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing

Paper • 2306.12929 • Published Jun 22, 2023 • 12

Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published Apr 25 • 52

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22 • 124

upvoted a collection 6 months ago

Meta Llama 3

Collection

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 11 days ago • 676

upvoted 2 papers 6 months ago

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12 • 62

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 103

upvoted an article 6 months ago

Llama 2 is here - get it on Hugging Face

Article • Published Jul 18, 2023 • 20

upvoted a paper 6 months ago

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Paper • 2404.06395 • Published Apr 9 • 21

upvoted an article 6 months ago

The Technology Behind BLOOM Training

Article • Published Jul 14, 2022 • 16

upvoted 7 papers 6 months ago

Octopus v2: On-device language model for super agent

Paper • 2404.01744 • Published Apr 2 • 56

MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

Paper • 2404.03413 • Published Apr 4 • 25

Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2 • 34

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 104

Rethinking Memory and Communication Cost for Efficient Large Language Model Training

Paper • 2310.06003 • Published Oct 9, 2023 • 2

Optimized Network Architectures for Large Language Model Training with Billions of Parameters

Paper • 2307.12169 • Published Jul 22, 2023 • 9

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 157

upvoted 9 papers 7 months ago

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Paper • 2403.14520 • Published Mar 21 • 32

Unicron: Economizing Self-Healing LLM Training at Scale

Paper • 2401.00134 • Published Dec 30, 2023 • 9

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 111

Recurrent Drafter for Fast Speculative Decoding in Large Language Models

Paper • 2403.09919 • Published Mar 14 • 20

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Paper • 2401.01325 • Published Jan 2 • 26

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

Paper • 2311.12351 • Published Nov 21, 2023 • 3

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache

Paper • 2401.02669 • Published Jan 5 • 14

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

Paper • 2402.02244 • Published Feb 3 • 1

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Paper • 2403.09347 • Published Mar 14 • 20
