TheBloke committed
Commit c27b183
1 Parent(s): 7ca8a6f

Update README.md

Files changed (1): README.md (+42 -1)

README.md CHANGED

---
license: other
inference: false
---

# Quantised GGMLs of alpaca-lora-65B

Quantised 4bit and 2bit GGMLs of [chansung's alpaca-lora-65B](https://huggingface.co/chansung/alpaca-lora-65b)

## Provided files

This repository contains two model files:
* 4bit - 39GB - `alpaca-lora-65B.GGML.q4_0.bin`
* 2bit - 23GB - `alpaca-lora-65B.GGML.q2_0.bin`
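
If you want to fetch one of these files from the command line, here is a minimal sketch using the Hub's `resolve` URL pattern; the repository id is a placeholder, not something stated in this README.

```
# Hypothetical download sketch: fetch the 4bit file via the Hub's resolve URL.
# Replace <repo-id> with this repository's id on huggingface.co.
wget https://huggingface.co/<repo-id>/resolve/main/alpaca-lora-65B.GGML.q4_0.bin
```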

## Creation method and requirements

### 4bit q4_0

This file was created using the new q4_0 quantisation method being trialled in [llama.cpp PR 896](https://github.com/ggerganov/llama.cpp/pull/896).

At the time of writing, this code has not yet been merged into the main [llama.cpp repo](https://github.com/ggerganov/llama.cpp), but it is likely to be merged soon.

This quantisation method is still 4bit, but it uses a new scheme to achieve higher quality (lower perplexity) than the previous q4_0 format.
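
For reference, producing a file like this from an f16 GGML uses llama.cpp's `quantize` tool. The sketch below is illustrative only: the paths are hypothetical, and it assumes the command-line convention of builds from this era, in which the trailing `2` selects the q4_0 type.

```
# Hypothetical sketch: quantise an f16 GGML to q4_0 with a PR 896 build of llama.cpp.
# The final argument "2" selects the q4_0 quantisation type in builds of this era.
./quantize ./models/65B/ggml-model-f16.bin ./alpaca-lora-65B.GGML.q4_0.bin 2
```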

You can run inference on this model using any recent version of `llama.cpp` - you do not need to use the code from PR 896 specifically.
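
As an illustration, a typical invocation looks something like the following; the prompt, thread count and token count are placeholder values, not recommendations from this README.

```
# Hypothetical sketch: run the 4bit file with llama.cpp's main example binary.
# -t sets CPU threads, -n the number of tokens to generate, -p the prompt.
./main -m alpaca-lora-65B.GGML.q4_0.bin -t 8 -n 128 -p "Write a story about llamas."
```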

### 2bit q2_0

This file was created using an even newer and more experimental 2bit method being trialled in [llama.cpp PR 1004](https://github.com/ggerganov/llama.cpp/pull/1004).

Again, this code is not yet merged into the main `llama.cpp` repo.

Unlike the 4bit file, to run this file you DO need to compile and run the same `llama.cpp` code that was used to create it.

To check out and compile that code, do the following:
```
git clone https://github.com/sw/llama.cpp llama-q2q3
cd llama-q2q3
git checkout q2q3
make
```
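
Once built, running the 2bit file looks much like the 4bit case. A hypothetical invocation, run from inside the `llama-q2q3` directory (the model path and generation settings are placeholders):

```
# Hypothetical sketch: run the 2bit file with the binary built from the q2q3 branch.
./main -m /path/to/alpaca-lora-65B.GGML.q2_0.bin -t 8 -n 128 -p "Write a story about llamas."
```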

# Original model card not provided

No model card was provided in [chansung's original repository](https://huggingface.co/chansung/alpaca-lora-65b).

Based on the name, I assume this is the result of fine-tuning using the original GPT 3.5 Alpaca dataset. It is unknown whether the original Stanford data was used, or the [cleaned tloen/alpaca-lora variant](https://github.com/tloen/alpaca-lora).