- Built with Meta Llama 3
- Quantized by [Astronomer](https://astronomer.io)

# Important Note About Serving with vLLM & oobabooga/text-generation-webui
- For loading this model onto vLLM, make sure all requests include `"stop_token_ids": [128001, 128009]` to temporarily work around the non-stop generation issue (see the request sketch after this list).
  - vLLM does not yet respect `generation_config.json`.
  - The vLLM team is working on a fix for this: https://github.com/vllm-project/vllm/issues/4180
- For oobabooga/text-generation-webui
  - Load the model via AutoGPTQ with `no_inject_fused_attention` enabled; this works around a bug in the AutoGPTQ library (see the loader sketch below).
  - Under `Parameters` -> `Generation` -> `Skip special tokens`: turn this off (deselect).
  - Under `Parameters` -> `Generation` -> `Custom stopping strings`: add `"<|end_of_text|>","<|eot_id|>"` to the field.
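
As a concrete illustration, here is a minimal sketch of such a request against a vLLM OpenAI-compatible server. The server address, model id, prompt, and `max_tokens` value are placeholder assumptions, not part of this repo:

```python
# Minimal sketch: completion request to a vLLM OpenAI-compatible server
# with the stop token ids this model currently needs.
import requests

response = requests.post(
    "http://localhost:8000/v1/completions",  # assumed server address
    json={
        "model": "your-org/your-llama-3-gptq-model",  # placeholder model id
        "prompt": "Why is the sky blue?",
        "max_tokens": 256,
        # Stop on both <|end_of_text|> (128001) and <|eot_id|> (128009)
        # until vLLM honors generation_config.json.
        "stop_token_ids": [128001, 128009],
    },
)
print(response.json()["choices"][0]["text"])
```

For offline use, the same ids can be passed via `SamplingParams(stop_token_ids=[128001, 128009])`.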
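
For reference, the webui flag corresponds to AutoGPTQ's `inject_fused_attention` argument. Below is a hedged loading sketch assuming a local copy of the quantized weights; the model path and device are placeholders:

```python
# Sketch of loading the GPTQ weights directly with AutoGPTQ, mirroring
# webui's `no_inject_fused_attention` setting.
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    "path/to/this-gptq-model",  # placeholder local path or hub id
    device="cuda:0",
    use_safetensors=True,
    # Equivalent of webui's `no_inject_fused_attention`: skip the fused
    # attention injection that currently triggers the AutoGPTQ bug.
    inject_fused_attention=False,
)
```
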
  <!-- description start -->
66
  ## Description
67