---
language:
- en
license: llama2
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
model_name: Llama 2 7B Chat
arxiv: 2307.09288
base_model: meta-llama/Llama-2-7b-chat-hf
inference: false
model_creator: Meta Llama 2
model_type: llama
pipeline_tag: text-generation
quantized_by: Second State Inc.
---

<!-- header start -->
<!-- 200823 -->
<div style="width: auto; margin-left: auto; margin-right: auto">
<img src="https://github.com/LlamaEdge/LlamaEdge/raw/dev/assets/logo.svg" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>
<hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
<!-- header end -->

# Llama-2-7B-Chat-GGUF

## Original Model

[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)

## Run with LlamaEdge

- LlamaEdge version: [v0.2.8](https://github.com/LlamaEdge/LlamaEdge/releases/tag/0.2.8) and above

- Prompt template

  - Prompt type: `llama-2-chat`

  - Prompt string

    ```text
    <s>[INST] <<SYS>>
    {{ system_prompt }}
    <</SYS>>

    {{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST]
    ```

- Context size: `4096`

- Run as LlamaEdge service

  ```bash
  wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-2-7b-chat-hf-Q5_K_M.gguf \
    llama-api-server.wasm \
    --prompt-template llama-2-chat \
    --ctx-size 4096 \
    --model-name llama-2-7b-chat
  ```

- Run as LlamaEdge command app

  ```bash
  wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-2-7b-chat-hf-Q5_K_M.gguf \
    llama-chat.wasm \
    --prompt-template llama-2-chat \
    --ctx-size 4096
  ```
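Once the API server above is running, it serves an OpenAI-compatible chat API. Below is a minimal sketch of a request, assuming the server listens on LlamaEdge's default port `8080` and exposes the `/v1/chat/completions` endpoint; the model name matches the `--model-name` flag set above:

```bash
# Hedged example: assumes the default listen address (localhost:8080) and
# the OpenAI-compatible /v1/chat/completions endpoint of llama-api-server.
curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama-2-7b-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

The server applies the `llama-2-chat` prompt template to the `messages` array, so the client does not need to build the `[INST]`/`<<SYS>>` string shown above by hand.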

## Quantized GGUF Models

| Name | Quant method | Bits | Size | Use case |
| ---- | ---- | ---- | ---- | ----- |
| [Llama-2-7b-chat-hf-Q2_K.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-Q2_K.gguf)     | Q2_K   | 2 | 2.83 GB| smallest, significant quality loss - not recommended for most purposes |
| [Llama-2-7b-chat-hf-Q3_K_L.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-Q3_K_L.gguf) | Q3_K_L | 3 | 3.6 GB| small, substantial quality loss |
| [Llama-2-7b-chat-hf-Q3_K_M.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-Q3_K_M.gguf) | Q3_K_M | 3 | 3.3 GB| very small, high quality loss |
| [Llama-2-7b-chat-hf-Q3_K_S.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-Q3_K_S.gguf) | Q3_K_S | 3 | 2.95 GB| very small, high quality loss |
| [Llama-2-7b-chat-hf-Q4_0.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-Q4_0.gguf)     | Q4_0   | 4 | 3.83 GB| legacy; small, very high quality loss - prefer using Q3_K_M |
| [Llama-2-7b-chat-hf-Q4_K_M.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-Q4_K_M.gguf) | Q4_K_M | 4 | 4.08 GB| medium, balanced quality - recommended |
| [Llama-2-7b-chat-hf-Q4_K_S.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-Q4_K_S.gguf) | Q4_K_S | 4 | 3.86 GB| small, greater quality loss |
| [Llama-2-7b-chat-hf-Q5_0.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-Q5_0.gguf)     | Q5_0   | 5 | 4.65 GB| legacy; medium, balanced quality - prefer using Q4_K_M |
| [Llama-2-7b-chat-hf-Q5_K_M.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-Q5_K_M.gguf) | Q5_K_M | 5 | 4.78 GB| large, very low quality loss - recommended |
| [Llama-2-7b-chat-hf-Q5_K_S.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-Q5_K_S.gguf) | Q5_K_S | 5 | 4.65 GB| large, low quality loss - recommended |
| [Llama-2-7b-chat-hf-Q6_K.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-Q6_K.gguf)     | Q6_K   | 6 | 5.53 GB| very large, extremely low quality loss |
| [Llama-2-7b-chat-hf-Q8_0.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-Q8_0.gguf)     | Q8_0   | 8 | 7.16 GB| very large, extremely low quality loss - not recommended |
| [Llama-2-7b-chat-hf-f16.gguf](https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/blob/main/Llama-2-7b-chat-hf-f16.gguf)     | f16   | 16 | 13.5 GB|  |
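The links in the table point at file pages (`blob/main`); replacing `blob` with `resolve` in the URL gives a direct download, per the usual Hugging Face convention. For example, to fetch the Q5_K_M file used in the run commands above:

```bash
# Direct download of the Q5_K_M quantization (~4.78 GB).
# Note the `resolve/main` path segment instead of `blob/main`.
curl -LO https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/resolve/main/Llama-2-7b-chat-hf-Q5_K_M.gguf
```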