FP8 LLMs for vLLM - a neuralmagic Collection

updated 8 days ago

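Accurate FP8-quantized models by Neural Magic, ready for use with vLLM! Each entry below gives the task (where tagged), the last update, the download count, and the like count (where any).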
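This page does not describe the quantization scheme itself. As a rough illustration of what "FP8-quantized" usually means, the sketch below simulates symmetric per-tensor scaling into the FP8 E4M3 format (largest finite value 448) in plain Python. The choice of E4M3, the per-tensor granularity, and the helper names `round_to_e4m3`, `quantize_per_tensor` and `dequantize` are all assumptions for illustration; the actual checkpoints may use different scale granularity or calibration.

```python
import math
from typing import List, Tuple

E4M3_MAX = 448.0          # assumed format: largest finite FP8 E4M3 value
E4M3_MIN_NORMAL_EXP = -6  # below 2**-6, E4M3 values are subnormal
MANTISSA_BITS = 3         # E4M3 stores 3 explicit mantissa bits


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (round-half-to-even, saturating at +/-448)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)          # saturate instead of overflowing
    _, e = math.frexp(mag)               # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_NORMAL_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing of representable values here
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_per_tensor(values: List[float]) -> Tuple[List[float], float]:
    """Scale so the largest magnitude maps to 448, then round each value to E4M3."""
    amax = max(abs(v) for v in values)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(q: List[float], scale: float) -> List[float]:
    """Recover approximate original values by multiplying back by the scale."""
    return [v * scale for v in q]
```

With 3 mantissa bits, each normal value is recovered to within about 6.25% relative error (half a step), which is why a single well-chosen scale per tensor can keep accuracy loss small.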

neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8

Text Generation • Updated Aug 22 • 1.45k • 28
neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8

Text Generation • Updated Aug 23 • 12.9k • 27
neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8

Text Generation • Updated Aug 23 • 66.4k • 28
neuralmagic/Phi-3-medium-128k-instruct-FP8

Text Generation • Updated Aug 12 • 34.8k • 5
neuralmagic/Mistral-Nemo-Instruct-2407-FP8

Text Generation • Updated Jul 19 • 2.24k • 13
neuralmagic/Meta-Llama-3-8B-Instruct-FP8

Text Generation • Updated Jul 18 • 12.9k • 18
neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic

Text Generation • Updated Aug 22 • 304 • 13
neuralmagic/Meta-Llama-3-70B-Instruct-FP8

Text Generation • Updated Jul 18 • 5.64k • 10
neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8

Text Generation • Updated Jul 18 • 1.34k • 2
neuralmagic/Meta-Llama-3-8B-Instruct-FP8-KV

Text Generation • Updated Jun 19 • 20.4k • 6
neuralmagic/Meta-Llama-3-70B-Instruct-FP8-KV

Text Generation • Updated Jun 26 • 251 • 2
neuralmagic/Mixtral-8x22B-Instruct-v0.1-FP8

Text Generation • Updated Aug 12 • 497 • 1
neuralmagic/Qwen2-72B-Instruct-FP8

Text Generation • Updated Jul 18 • 1.07k • 9
neuralmagic/Qwen2-7B-Instruct-FP8

Text Generation • Updated Jul 18 • 644 • 1
neuralmagic/Qwen2-1.5B-Instruct-FP8

Text Generation • Updated Jul 18 • 94
neuralmagic/Qwen2-0.5B-Instruct-FP8

Text Generation • Updated Jul 18 • 372 • 2
neuralmagic/Mistral-7B-Instruct-v0.3-FP8

Text Generation • Updated Jul 18 • 651 • 2
neuralmagic/Llama-2-7b-chat-hf-FP8

Text Generation • Updated Jul 18 • 309
neuralmagic/Phi-3-mini-128k-instruct-FP8

Text Generation • Updated Aug 12 • 293
neuralmagic/gemma-2-9b-it-FP8

Text Generation • Updated Jul 18 • 1.37k • 5
neuralmagic/Qwen2-57B-A14B-Instruct-FP8

Text Generation • Updated Jul 18 • 360 • 1
neuralmagic/DeepSeek-Coder-V2-Lite-Instruct-FP8

Text Generation • Updated Jul 18 • 2.9k • 4
neuralmagic/DeepSeek-Coder-V2-Lite-Base-FP8

Text Generation • Updated Jul 18 • 113
neuralmagic/DeepSeek-Coder-V2-Base-FP8

Text Generation • Updated Jul 22 • 11
neuralmagic/DeepSeek-Coder-V2-Instruct-FP8

Text Generation • Updated Jul 22 • 3.17k • 6
neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8-dynamic

Text Generation • Updated Aug 23 • 8.01k • 5
neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8-dynamic

Text Generation • Updated Aug 23 • 1.97k • 2
neuralmagic/Meta-Llama-3.1-8B-FP8

Text Generation • Updated Aug 13 • 1.63k • 5
neuralmagic/Meta-Llama-3.1-70B-FP8

Text Generation • Updated Aug 13 • 309
neuralmagic/starcoder2-15b-FP8

Text Generation • Updated Aug 1 • 52
neuralmagic/starcoder2-3b-FP8

Text Generation • Updated Aug 1 • 31
neuralmagic/starcoder2-7b-FP8

Text Generation • Updated Aug 1 • 8
neuralmagic/Meta-Llama-3.1-405B-FP8

Text Generation • Updated Aug 13 • 33
neuralmagic/gemma-2-2b-it-FP8

Updated Aug 13 • 386 • 1
neuralmagic/Llama-3.2-1B-Instruct-FP8-dynamic

Text Generation • Updated 10 days ago • 436 • 1
neuralmagic/Llama-3.2-3B-Instruct-FP8-dynamic

Text Generation • Updated 10 days ago • 211 • 1
neuralmagic/Llama-3.2-3B-Instruct-FP8

Text Generation • Updated 9 days ago • 3.31k
neuralmagic/Llama-3.2-1B-Instruct-FP8

Text Generation • Updated 8 days ago • 213
neuralmagic/Llama-3.2-1B-FP8

Updated 9 days ago • 156
neuralmagic/Phi-3.5-mini-instruct-FP8-KV

Text Generation • Updated 4 days ago • 157 • 1
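The collection describes these checkpoints as ready for use with vLLM. A minimal sketch of loading one through vLLM's offline `LLM` / `SamplingParams` API follows, assuming vLLM is installed on a GPU with FP8 support (hardware requirements are not stated on this page) and that vLLM reads the FP8 weight format from the checkpoint's own config, so no explicit `quantization` argument is passed. The helpers `engine_kwargs` and `generate` are hypothetical, and mapping the `-FP8-KV` suffix to `kv_cache_dtype="fp8"` is an assumption about those checkpoints, not something this page confirms.

```python
from typing import Dict, List


def engine_kwargs(repo_id: str) -> Dict[str, str]:
    """Build vLLM engine arguments for a checkpoint from this collection.

    Assumption: repos ending in "-FP8-KV" carry FP8 KV-cache scales, so an FP8
    KV cache is requested; every other repo keeps vLLM's default ("auto").
    """
    kv_dtype = "fp8" if repo_id.endswith("-FP8-KV") else "auto"
    return {"model": repo_id, "kv_cache_dtype": kv_dtype}


def generate(repo_id: str, prompts: List[str], max_tokens: int = 128) -> List[str]:
    """Run greedy generation with vLLM and return one completion per prompt."""
    # Imported lazily so engine_kwargs() works even where vLLM is not installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs(repo_id))
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]
```

For example, `generate("neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8", ["What is FP8 quantization?"])` would fetch that repository from the Hugging Face Hub and return a list holding one completion.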