davidxmle committed
Commit
2f40088
1 Parent(s): e467c88

Update README.md

Files changed (1)
  1. README.md +10 -19
README.md CHANGED
@@ -5,17 +5,7 @@ model_creator: astronomer-io
  model_name: Meta-Llama-3-8B-Instruct
  model_type: llama
  pipeline_tag: text-generation
- prompt_template: >-
- {% set loop_messages = messages %}{% for message in loop_messages %}{% set
- content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>
-
-
- '+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set
- content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if
- add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>
-
-
- ' }}{% endif %}
+ prompt_template: "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"
  quantized_by: davidxmle
  license: other
  license_name: llama-3-community-license
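
The `prompt_template` in this hunk mirrors Llama 3's chat template from `tokenizer_config.json`. As a minimal sketch of how it is normally rendered (assuming the `transformers` stack and the repo id `astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit`, which is not stated in this diff):

```python
# Sketch only: render the chat template described by prompt_template above.
# The repo id is an assumption; substitute whichever copy of the model you serve.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Apache Airflow?"},
]

# add_generation_prompt=True appends '<|start_header_id|>assistant<|end_header_id|>\n\n'
# so the model begins an assistant turn rather than continuing the user's text.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```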
@@ -46,14 +36,6 @@ datasets:
  <div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">Astronomer is the de facto company for <a href="https://airflow.apache.org/">Apache Airflow</a>, the most trusted open-source framework for data orchestration and MLOps.</p></div>
  <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
  <!-- header end -->
-
- # Important Note Regarding a Known Bug in Llama 3
- - Two files are modified to address a current issue regarding Llama 3 models keep on generating additional tokens non-stop until hitting max token limit.
- - `generation_config.json`'s `eos_token_id` have been modified to add the other EOS token that Llama-3 uses.
- - `tokenizer_config.json`'s `chat_template` has been modified to only add start generation token at the end of a prompt if `add_generation_prompt` is selected.
- - For loading this model onto vLLM, make sure all requests have `"stop_token_ids":[128001, 128009]` to temporarily address the non-stop generation issue.
-   - vLLM does not yet respect `generation_config.json`.
-   - vLLM team is working on a a fix for this https://github.com/vllm-project/vllm/issues/4180
 
  # Llama-3-8B-Instruct-GPTQ-8-Bit
  - Original Model creator: [Meta Llama from Meta](https://huggingface.co/meta-llama)
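
The note removed in this hunk refers to stopping on both of Llama 3's terminators, `<|end_of_text|>` (128001) and `<|eot_id|>` (128009). A hedged illustration of the same idea at generation time with `transformers`, independent of any patched `generation_config.json` (repo id assumed as before):

```python
# Illustration only: stop on either Llama 3 terminator explicitly.
# 128001 = <|end_of_text|>, 128009 = <|eot_id|>; the patched generation_config.json lists both.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The key idea behind data orchestration is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, eos_token_id=[128001, 128009])
print(tokenizer.decode(output[0], skip_special_tokens=True))
```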
@@ -61,6 +43,15 @@ datasets:
  - Built with Meta Llama 3
  - Quantized by [Astronomer](https://astronomer.io)
 
+ # Important Note About Serving with vLLM & oobabooga/text-generation-webui
+ - For loading this model onto vLLM, make sure all requests have `"stop_token_ids":[128001, 128009]` to temporarily address the non-stop generation issue.
+   - vLLM does not yet respect `generation_config.json`.
+   - The vLLM team is working on a fix for this: https://github.com/vllm-project/vllm/issues/4180
+ - For oobabooga/text-generation-webui:
+   - Load the model via AutoGPTQ, with `no_inject_fused_attention` enabled. This works around a bug in the AutoGPTQ library.
+   - Under `Parameters` -> `Generation` -> `Skip special tokens`: turn this off (deselect).
+   - Under `Parameters` -> `Generation` -> `Custom stopping strings`: add `"<|end_of_text|>","<|eot_id|>"` to the field.
+
  <!-- description start -->
  ## Description
 
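
For the vLLM workaround added above, a minimal sketch with vLLM's offline Python API; the repo id and sampling settings are assumptions, and the only essential piece is `stop_token_ids`:

```python
# Sketch of the vLLM workaround: pass both Llama 3 stop token ids per request,
# since vLLM does not yet pick them up from generation_config.json.
from vllm import LLM, SamplingParams

llm = LLM(model="astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit", quantization="gptq")

params = SamplingParams(
    temperature=0.7,
    max_tokens=256,
    stop_token_ids=[128001, 128009],  # <|end_of_text|> and <|eot_id|>
)

prompt = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

When serving through vLLM's OpenAI-compatible endpoint, the same ids go into each request body as `"stop_token_ids": [128001, 128009]`, as the note above instructs.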
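
The `no_inject_fused_attention` option mentioned for oobabooga/text-generation-webui corresponds, as far as I can tell, to `inject_fused_attention=False` when loading with the auto-gptq library directly. A rough sketch under that assumption (repo id also assumed):

```python
# Rough equivalent of the webui's no_inject_fused_attention option when loading
# straight through auto-gptq: skip the fused-attention injection that triggers the bug.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,
    inject_fused_attention=False,  # webui equivalent: enable no_inject_fused_attention
)
```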