fimbulvntr committed e557c95 (1 parent: df8511d): Create README.md

Original model: https://huggingface.co/brucethemoose/Yi-34B-200K-RPMerge

Steps:
1. Convert to GGUF using llama.cpp (clone from source, install requirements, then run this)
> `python convert.py /mnt/d/LLM_Models/Yi-34B-200K-RPMerge/ --vocab-type hfft --outtype f32 --outfile Yi-34B-200K-RPMerge.gguf`
2. Create an imatrix (offload as many layers as you can to the GPU)
> `./imatrix -m /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.gguf -f /mnt/d/LLM_Models/8k_random_data.txt -o /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.imatrix.dat -ngl 20`
3. Quantize using the imatrix
> `./quantize --imatrix /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.imatrix.dat /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.gguf /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.IQ2_XXS.gguf IQ2_XXS`

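The three steps above can be collected into one script. This is a sketch using the exact paths and flags from the commands above; it assumes a built llama.cpp checkout as the working directory, and the `-ngl 20` layer count is specific to my GPU — adjust both for your setup.

```shell
#!/usr/bin/env bash
# Sketch of the convert -> imatrix -> quantize pipeline described above.
# Assumes: built llama.cpp checkout as CWD, model downloaded to MODEL_DIR.
set -euo pipefail

MODEL_DIR=/mnt/d/LLM_Models/Yi-34B-200K-RPMerge/
GGUF=/mnt/d/LLM_Models/Yi-34B-200K-RPMerge.gguf
CALIB=/mnt/d/LLM_Models/8k_random_data.txt
IMATRIX=/mnt/d/LLM_Models/Yi-34B-200K-RPMerge.imatrix.dat
OUT=/mnt/d/LLM_Models/Yi-34B-200K-RPMerge.IQ2_XXS.gguf

# 1. Convert the HF model to an f32 GGUF
python convert.py "$MODEL_DIR" --vocab-type hfft --outtype f32 --outfile "$GGUF"

# 2. Build the importance matrix (-ngl offloads layers to the GPU)
./imatrix -m "$GGUF" -f "$CALIB" -o "$IMATRIX" -ngl 20

# 3. Quantize to IQ2_XXS using the imatrix
./quantize --imatrix "$IMATRIX" "$GGUF" "$OUT" IQ2_XXS
```

Quantizing from the f32 GGUF keeps full precision until the final step; the same script works for other imatrix-aware quant types by changing `IQ2_XXS` in the last command and the output filename.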
I have also uploaded [8k_random_data.txt from this GitHub discussion](https://github.com/ggerganov/llama.cpp/discussions/5006), along with the importance matrix I made (`Yi-34B-200K-RPMerge.imatrix.dat`).