ikawrakow committed
Commit 5ed7160
1 Parent(s): 4acc359

Update README.md

Files changed (1)
  1. README.md +26 -0
README.md CHANGED
---
license: apache-2.0
---

This repository contains alternative [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) quantized models in GGUF format for use with `llama.cpp`. The models are fully compatible with the official `llama.cpp` release and can be used out of the box.
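
For example, a minimal sketch of running one of these models with the `main` example program from `llama.cpp` (the model filename is taken from the table below; the prompt and generation length are illustrative):

```bash
# Sketch: run a quantized model from this repo with llama.cpp's `main` example.
# Assumes llama.cpp has been built in the current directory and the GGUF file
# has been downloaded from this repo.
./main -m oh-2.5-m7b-q4k-medium.gguf \
  -c 512 -n 256 \
  -p "Explain what GGUF quantization does."
```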

I'm careful to say "alternative" rather than "better" or "improved", as I have not put any effort into evaluating performance differences in actual usage. Perplexity is lower than for the "official" `llama.cpp` quantization (e.g., as provided by https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF), but perplexity is not necessarily a good measure of real-world performance. Nevertheless, perplexity does measure quantization error, so the table below compares the perplexities of these quantized models with those of the current `llama.cpp` quantization approach on Wikitext at a context length of 512 tokens. The "Quantization Error" columns are defined as `(PPL(quantized model) - PPL(fp16))/PPL(fp16)`. The fp16 perplexity itself is not listed, but the tabulated values imply `PPL(fp16) ≈ 6.425`; for example, the Q3_K_S row gives `(6.8943 - 6.425)/6.425 ≈ 7.3%`.
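
These perplexities can be reproduced with the `perplexity` example program that ships with `llama.cpp`; the sketch below assumes a local copy of the Wikitext-2 test set (the file path is an assumption about your setup):

```bash
# Sketch: measure Wikitext perplexity at a context length of 512 tokens,
# matching the setup used for the table below.
./perplexity -m oh-2.5-m7b-q4k-small.gguf \
  -f wikitext-2-raw/wiki.test.raw \
  -c 512
```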

| Quantization | Model file | PPL (llama.cpp) | Quantization Error (llama.cpp) | PPL (new quants) | Quantization Error (new quants) |
|--:|--:|--:|--:|--:|--:|
| Q3_K_S | oh-2.5-m7b-q3k-small.gguf  | 6.8943 | 7.30% | 6.7228 | 4.63% |
| Q3_K_M | oh-2.5-m7b-q3k-medium.gguf | 6.7366 | 4.84% | 6.5899 | 2.56% |
| Q4_K_S | oh-2.5-m7b-q4k-small.gguf  | 6.5720 | 2.28% | 6.4778 | 0.82% |
| Q4_K_M | oh-2.5-m7b-q4k-medium.gguf | 6.5322 | 1.66% | 6.4740 | 0.76% |
| Q5_K_S | oh-2.5-m7b-q5k-small.gguf  | 6.4668 | 0.64% | 6.4428 | 0.27% |
| Q5_K_M | oh-2.5-m7b-q5k-medium.gguf | 6.4536 | 0.44% | 6.4422 | 0.26% |
| Q4_0   | oh-2.5-m7b-q40.gguf        | 6.5443 | 1.85% | 6.5454 | 1.87% |
| Q4_1   | oh-2.5-m7b-q41.gguf        | 6.6246 | 3.10% | 6.4810 | 0.87% |
| Q5_0   | oh-2.5-m7b-q50.gguf        | 6.4731 | 0.74% | 6.4554 | 0.47% |
| Q5_1   | oh-2.5-m7b-q51.gguf        | 6.4818 | 0.88% | 6.4390 | 0.21% |

The figure below plots the data from the table above, with the quantized model size in GiB on the x-axis.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6570493c84632c95659d342a/8k55U0ySDmOclF3QJjv3X.png)