davidxmle committed
Commit
2f40088
1 Parent(s): e467c88

Update README.md

Files changed (1)
  1. README.md +10 -19
README.md CHANGED
@@ -5,17 +5,7 @@ model_creator: astronomer-io
  model_name: Meta-Llama-3-8B-Instruct
  model_type: llama
  pipeline_tag: text-generation
- prompt_template: >-
- {% set loop_messages = messages %}{% for message in loop_messages %}{% set
- content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>
-
-
- '+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set
- content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if
- add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>
-
-
- ' }}{% endif %}
+ prompt_template: "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"
  quantized_by: davidxmle
  license: other
  license_name: llama-3-community-license
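
The `prompt_template` in this hunk mirrors Llama 3's chat template from `tokenizer_config.json`. As a minimal sketch of how it is normally rendered (assuming the `transformers` stack and the repo id `astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit`, which is not stated in this diff):

```python
# Sketch only: render the chat template described by prompt_template above.
# The repo id is an assumption; substitute whichever copy of the model you serve.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Apache Airflow?"},
]

# add_generation_prompt=True appends '<|start_header_id|>assistant<|end_header_id|>\n\n'
# so the model begins an assistant turn rather than continuing the user's text.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```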
@@ -46,14 +36,6 @@ datasets:
  <div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">Astronomer is the de facto company for <a href="https://airflow.apache.org/">Apache Airflow</a>, the most trusted open-source framework for data orchestration and MLOps.</p></div>
  <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
  <!-- header end -->
-
- # Important Note Regarding a Known Bug in Llama 3
- - Two files are modified to address a current issue regarding Llama 3 models keep on generating additional tokens non-stop until hitting max token limit.
- - `generation_config.json`'s `eos_token_id` have been modified to add the other EOS token that Llama-3 uses.
- - `tokenizer_config.json`'s `chat_template` has been modified to only add start generation token at the end of a prompt if `add_generation_prompt` is selected.
- - For loading this model onto vLLM, make sure all requests have `"stop_token_ids":[128001, 128009]` to temporarily address the non-stop generation issue.
-   - vLLM does not yet respect `generation_config.json`.
-   - vLLM team is working on a a fix for this https://github.com/vllm-project/vllm/issues/4180
 
  # Llama-3-8B-Instruct-GPTQ-8-Bit
  - Original Model creator: [Meta Llama from Meta](https://huggingface.co/meta-llama)
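
The note removed in this hunk refers to stopping on both of Llama 3's terminators, `<|end_of_text|>` (128001) and `<|eot_id|>` (128009). A hedged illustration of the same idea at generation time with `transformers`, independent of any patched `generation_config.json` (repo id assumed as before):

```python
# Illustration only: stop on either Llama 3 terminator explicitly.
# 128001 = <|end_of_text|>, 128009 = <|eot_id|>; the patched generation_config.json lists both.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The key idea behind data orchestration is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, eos_token_id=[128001, 128009])
print(tokenizer.decode(output[0], skip_special_tokens=True))
```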
@@ -61,6 +43,15 @@ datasets:
  - Built with Meta Llama 3
  - Quantized by [Astronomer](https://astronomer.io)
 
+ # Important Note About Serving with vLLM & oobabooga/text-generation-webui
+ - For loading this model onto vLLM, make sure all requests have `"stop_token_ids":[128001, 128009]` to temporarily address the non-stop generation issue.
+   - vLLM does not yet respect `generation_config.json`.
+   - The vLLM team is working on a fix for this: https://github.com/vllm-project/vllm/issues/4180
+ - For oobabooga/text-generation-webui:
+   - Load the model via AutoGPTQ, with `no_inject_fused_attention` enabled. This works around a bug in the AutoGPTQ library.
+   - Under `Parameters` -> `Generation` -> `Skip special tokens`: turn this off (deselect).
+   - Under `Parameters` -> `Generation` -> `Custom stopping strings`: add `"<|end_of_text|>","<|eot_id|>"` to the field.
+
  <!-- description start -->
  ## Description
 
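
For the vLLM workaround added above, a minimal sketch with vLLM's offline Python API; the repo id and sampling settings are assumptions, and the only essential piece is `stop_token_ids`:

```python
# Sketch of the vLLM workaround: pass both Llama 3 stop token ids per request,
# since vLLM does not yet pick them up from generation_config.json.
from vllm import LLM, SamplingParams

llm = LLM(model="astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit", quantization="gptq")

params = SamplingParams(
    temperature=0.7,
    max_tokens=256,
    stop_token_ids=[128001, 128009],  # <|end_of_text|> and <|eot_id|>
)

prompt = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

When serving through vLLM's OpenAI-compatible endpoint, the same ids go into each request body as `"stop_token_ids": [128001, 128009]`, as the note above instructs.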
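
The `no_inject_fused_attention` option mentioned for oobabooga/text-generation-webui corresponds, as far as I can tell, to `inject_fused_attention=False` when loading with the auto-gptq library directly. A rough sketch under that assumption (repo id also assumed):

```python
# Rough equivalent of the webui's no_inject_fused_attention option when loading
# straight through auto-gptq: skip the fused-attention injection that triggers the bug.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,
    inject_fused_attention=False,  # webui equivalent: enable no_inject_fused_attention
)
```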