matichon committed on
Commit
e2dde48
1 Parent(s): 5d3f32f

Update README.md

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -70,6 +70,9 @@ Give me a detailed list of the attractions I should visit, and time it takes in
  | quantized-llama-3-70b-pp1-tp8-awq-w4a16-kvint8-gs64 | 64 | int8 | 1 | 8 | int4 | fp16 |
  
  
+ ## TRT-LLM and AMMO
+ - TRT-LLM rel 0.9 a9356d4b7610330e89c1010f342a9ac644215c52
+
  ## Quantize script
  ```
  python ../quantization/quantize.py --model_dir /root/.cache/huggingface/hub/models--casperhansen--llama-3-70b-fp16/snapshots/c8647dcc2296eb8d763645645ebda784da16141a --dtype float16 --qformat int4_awq --awq_block_size 64 --output_dir ./quantized-llama-3-70b-pp1-tp8-awq-w4a16-kvfp16-gs64 --batch_size 8 --tp_size 8 --pp_size 1 --calib_size 512