Xenova (HF staff) committed
Commit c15b810
1 parent: c4f9132

Update README.md

Files changed (1): README.md (+1, −2)

README.md CHANGED
@@ -79,7 +79,7 @@ print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 In order to run the inference with Llama 3.1 405B Instruct AWQ in INT4, both `torch` and `autoawq` need to be installed as:
 
 ```bash
-pip install "torch>=2.2.0,<2.3.0" autoawq --upgrade
+pip install "torch>=2.2.0,<2.3.0" torchvision autoawq --upgrade
 ```
 
 Then, the latest version of `transformers` need to be installed, being 4.43.0 or higher, as:
@@ -107,7 +107,6 @@ model = AutoAWQForCausalLM.from_pretrained(
     torch_dtype=torch.float16,
     low_cpu_mem_usage=True,
     device_map="auto",
-    fuse_layers=True,
 )
 
 inputs = tokenizer.apply_chat_template(prompt, tokenize=True, add_generation_prompt=True, return_tensors="pt", return_dict=True).to('cuda')
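The install command keeps `torch` pinned to the `>=2.2.0,<2.3.0` range. As a minimal sketch (not part of the commit), the same specifier can be checked programmatically with the `packaging` library, which is assumed to be available in the environment:

```python
# Sketch: verify candidate torch versions against the README's pinned range.
# The version list below is illustrative, not from the commit.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Range taken from the install command in the diff above.
spec = SpecifierSet(">=2.2.0,<2.3.0")

for candidate in ["2.1.2", "2.2.0", "2.2.2", "2.3.0"]:
    # `Version(...) in spec` is True only when the version satisfies the range.
    print(candidate, Version(candidate) in spec)
```

Only the 2.2.x releases fall inside the range; 2.1.2 is below the lower bound and 2.3.0 is excluded by the strict upper bound.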