Update README.md
README.md
@@ -24,7 +24,7 @@ Pretraining took 10 hours. Fine-tuning took ~41 hours on 1x RTX 6000 Ada.
 
 The easiest way is to use the GPTQ weights (linked above) with [oobabooga text-generation-webui](https://github.com/oobabooga/text-generation-webui) and ExLlama. You'll need to set max_seq_len to 16384 and compress_pos_emb to 8.
 
-**IMPORTANT: To use these weights with
+**IMPORTANT: To use these weights with AutoGPTQ or GPTQ-for-LLaMa, you'll need to patch in the appropriate RoPE scaling module. See: [replace_llama_rope_with_scaled_rope](https://github.com/bhenrym14/qlora-airoboros-longcontext/blob/main/scaledllama/llama_rope_scaled_monkey_patch-16k.py)**
 
 I have had issues going beyond 8192 tokens with ExLlama. I have not tested that with this model. YMMV.
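For context, a minimal sketch of the patching order the added line describes: the RoPE scaling monkey patch has to be applied before the quantized model is built. The local module name, model path, and loader arguments below are assumptions for illustration, not part of the README; the linked `llama_rope_scaled_monkey_patch-16k.py` would need to be saved locally under an importable name.

```python
# Assumption: the linked patch file was saved locally as
# llama_rope_scaled_monkey_patch_16k.py (renamed so it can be imported).
from llama_rope_scaled_monkey_patch_16k import replace_llama_rope_with_scaled_rope

# Apply the patch BEFORE any LLaMA modules are instantiated, so the stock
# rotary embedding is replaced by the scaled one (16384 positions, 8x
# position compression, matching max_seq_len=16384 / compress_pos_emb=8).
replace_llama_rope_with_scaled_rope()

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "path/to/gptq-weights"  # hypothetical local path to the GPTQ weights
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_dir, device="cuda:0", use_safetensors=True)
```

The key design point is the ordering: once a LLaMA model has been constructed with the default rotary embedding, patching the class afterwards has no effect on the already-built modules.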