apepkuss79 committed
Commit 1e57684
1 Parent(s): b33aeae

Update models

Files changed (3)
  1. README.md +38 -87
  2. main.py +247 -0
  3. requirements.txt +26 -0
README.md CHANGED
@@ -1,87 +1,38 @@
- ---
- base_model: Qwen/Qwen2.5-3B-Instruct
- license: other
- license_name: qwen-research
- license_link: https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE
- model_creator: Qwen
- model_name: Qwen2.5-3B-Instruct
- quantized_by: Second State Inc.
- language:
- - en
- pipeline_tag: text-generation
- tags:
- - chat
- ---
-
- <!-- header start -->
- <!-- 200823 -->
- <div style="width: auto; margin-left: auto; margin-right: auto">
- <img src="https://github.com/LlamaEdge/LlamaEdge/raw/dev/assets/logo.svg" style="width: 100%; min-width: 400px; display: block; margin: auto;">
- </div>
- <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
- <!-- header end -->
-
- # Qwen2.5-3B-Instruct-GGUF
-
- ## Original Model
-
- [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
-
- ## Run with LlamaEdge
-
- - LlamaEdge version: [v0.14.3](https://github.com/LlamaEdge/LlamaEdge/releases/tag/0.14.3) and above
-
- - Prompt template
-
-   - Prompt type: `chatml`
-
-   - Prompt string
-
-     ```text
-     <|im_start|>system
-     {system_message}<|im_end|>
-     <|im_start|>user
-     {prompt}<|im_end|>
-     <|im_start|>assistant
-     ```
-
- - Context size: `32000`
-
- - Run as LlamaEdge service
-
-   ```bash
-   wasmedge --dir .:. --nn-preload default:GGML:AUTO:Qwen2.5-3B-Instruct-Q5_K_M.gguf \
-     llama-api-server.wasm \
-     --model-name Qwen2.5-3B-Instruct \
-     --prompt-template chatml \
-     --ctx-size 32000
-   ```
-
- - Run as LlamaEdge command app
-
-   ```bash
-   wasmedge --dir .:. --nn-preload default:GGML:AUTO:Qwen2.5-3B-Instruct-Q5_K_M.gguf \
-     llama-chat.wasm \
-     --prompt-template chatml \
-     --ctx-size 32000
-   ```
-
- ## Quantized GGUF Models
-
- | Name | Quant method | Bits | Size | Use case |
- | ---- | ---- | ---- | ---- | ----- |
- | [Qwen2.5-3B-Instruct-Q2_K.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-Q2_K.gguf) | Q2_K | 2 | 676 MB | smallest, significant quality loss - not recommended for most purposes |
- | [Qwen2.5-3B-Instruct-Q3_K_L.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-Q3_K_L.gguf) | Q3_K_L | 3 | 880 MB | small, substantial quality loss |
- | [Qwen2.5-3B-Instruct-Q3_K_M.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-Q3_K_M.gguf) | Q3_K_M | 3 | 824 MB | very small, high quality loss |
- | [Qwen2.5-3B-Instruct-Q3_K_S.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-Q3_K_S.gguf) | Q3_K_S | 3 | 761 MB | very small, high quality loss |
- | [Qwen2.5-3B-Instruct-Q4_0.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-Q4_0.gguf) | Q4_0 | 4 | 935 MB | legacy; small, very high quality loss - prefer using Q3_K_M |
- | [Qwen2.5-3B-Instruct-Q4_K_M.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-Q4_K_M.gguf) | Q4_K_M | 4 | 986 MB | medium, balanced quality - recommended |
- | [Qwen2.5-3B-Instruct-Q4_K_S.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-Q4_K_S.gguf) | Q4_K_S | 4 | 940 MB | small, greater quality loss |
- | [Qwen2.5-3B-Instruct-Q5_0.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-Q5_0.gguf) | Q5_0 | 5 | 1.1 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
- | [Qwen2.5-3B-Instruct-Q5_K_M.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-Q5_K_M.gguf) | Q5_K_M | 5 | 1.13 GB | large, very low quality loss - recommended |
- | [Qwen2.5-3B-Instruct-Q5_K_S.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-Q5_K_S.gguf) | Q5_K_S | 5 | 1.1 GB | large, low quality loss - recommended |
- | [Qwen2.5-3B-Instruct-Q6_K.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-Q6_K.gguf) | Q6_K | 6 | 1.27 GB | very large, extremely low quality loss |
- | [Qwen2.5-3B-Instruct-Q8_0.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-Q8_0.gguf) | Q8_0 | 8 | 1.65 GB | very large, extremely low quality loss - not recommended |
- | [Qwen2.5-3B-Instruct-f16.gguf](https://huggingface.co/second-state/Qwen2.5-3B-Instruct-GGUF/blob/main/Qwen2.5-3B-Instruct-f16.gguf) | f16 | 16 | 3.09 GB | |
-
- *Quantized with llama.cpp b3751*
 
+ # FACET
+
+ ## How to run
+
+ ```bash
+ python main.py --full-name <hf-account-name/hf-model-name> -s <target-directory-to-save-git-clone> --enable-converter -c <path-to-python-convert-script> --enable-quantizer -q <path-to-llamacpp-quantizer> -t q4_0
+ ```
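+
+ The flags map directly onto `main.py`'s arguments:
+
+ - `--full-name`: Hugging Face model full name, e.g. `username/model_name`
+ - `-s, --saved-dir`: the directory in which the model repo is saved (default: `models`)
+ - `--enable-converter` plus `-c, --converter`: run HF-to-GGUF conversion with the given `convert.py` script
+ - `--pad-vocab`: add pad tokens when the model vocab expects more entries than the tokenizer metadata provides
+ - `--enable-quantizer` plus `-q, --quantizer`: run quantization with the given llama.cpp quantizer
+ - `-t, --quant-type`: the quantization type; when omitted, a default set is produced (see the example after the type list below)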
+
+ For example,
+
+ ```bash
+ python main.py --full-name baichuan-inc/Baichuan2-13B-Chat -s /home/ubuntu/workspace/models/ --enable-converter -c /home/ubuntu/workspace/llama.cpp/convert.py --enable-quantizer -q llama-cpp-quantizer -t Q5_K_M
+ ```
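+
+ Based on the paths `main.py` constructs, outputs are written to a `<model-name>-gguf` directory created next to the cloned repo; for the example above:
+
+ ```bash
+ ls /home/ubuntu/workspace/models/Baichuan2-13B-Chat-gguf/
+ # Baichuan2-13B-Chat-f16.gguf     <- written by convert.py
+ # Baichuan2-13B-Chat-Q5_K_M.gguf  <- written by the quantizer
+ ```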
+
+ ## Allowed quantization types
+
+ ```text
+ 2 or Q4_0 : 3.56G, +0.2166 ppl @ LLaMA-v1-7B
+ 3 or Q4_1 : 3.90G, +0.1585 ppl @ LLaMA-v1-7B
+ 8 or Q5_0 : 4.33G, +0.0683 ppl @ LLaMA-v1-7B
+ 9 or Q5_1 : 4.70G, +0.0349 ppl @ LLaMA-v1-7B
+ 10 or Q2_K : 2.63G, +0.6717 ppl @ LLaMA-v1-7B
+ 12 or Q3_K : alias for Q3_K_M
+ 11 or Q3_K_S : 2.75G, +0.5551 ppl @ LLaMA-v1-7B
+ 12 or Q3_K_M : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
+ 13 or Q3_K_L : 3.35G, +0.1764 ppl @ LLaMA-v1-7B
+ 15 or Q4_K : alias for Q4_K_M
+ 14 or Q4_K_S : 3.59G, +0.0992 ppl @ LLaMA-v1-7B
+ 15 or Q4_K_M : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
+ 17 or Q5_K : alias for Q5_K_M
+ 16 or Q5_K_S : 4.33G, +0.0400 ppl @ LLaMA-v1-7B
+ 17 or Q5_K_M : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
+ 18 or Q6_K : 5.15G, -0.0008 ppl @ LLaMA-v1-7B
+ 7 or Q8_0 : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
+ 1 or F16 : 13.00G @ 7B
+ 0 or F32 : 26.00G @ 7B
+ COPY : only copy tensors, no quantizing
+ ```
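+
+ Any of the names above can be passed via `-t`. When `-t` is omitted, `main.py` falls back to quantizing a default set (`Q5_K_M`, `Q6_K`, `Q8_0`) in a single run; a sketch, assuming the llama.cpp quantizer binary sits at `/home/ubuntu/workspace/llama.cpp/quantize`:
+
+ ```bash
+ python main.py --full-name baichuan-inc/Baichuan2-13B-Chat -s /home/ubuntu/workspace/models/ --enable-converter -c /home/ubuntu/workspace/llama.cpp/convert.py --enable-quantizer -q /home/ubuntu/workspace/llama.cpp/quantize
+ ```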
main.py ADDED
@@ -0,0 +1,247 @@
+ import argparse
+ import shutil
+ import subprocess
+ from pathlib import Path
+
+ from git import Repo
+
+
+ def clone_hf_with_git(username: str, model_name: str, saved_dir: str):
+     """Clone a Hugging Face model repo over HTTPS using GitPython."""
+     full_model_name = f"{username}/{model_name}"
+     url = f"https://huggingface.co/{full_model_name}"
+     saved = f"{saved_dir}/{model_name}"
+
+     # perform `git lfs install` so weight files are fetched, not just LFS pointers
+     subprocess.run(["git", "lfs", "install"])
+
+     print(f"[INFO] Cloning {model_name} from {url} ...")
+     Repo.clone_from(url, saved)
+
+
+ def download_hf_with_git(full_name: str, saved_dir: str):
+     """Clone a Hugging Face model repo over SSH with plain git."""
+     model_name = full_name.split("/")[1]
+     url = f"git@hf.co:{full_name}"
+     saved = f"{saved_dir}/{model_name}"
+
+     # perform `git lfs install`
+     subprocess.run(["git", "lfs", "install"])
+
+     print(f"Cloning {model_name} from {url} ...")
+     subprocess.run(["git", "clone", "--progress", url, saved])
+
+
+ def convert_hf_to_gguf(
+     script_path: str,
+     dir_raw_model: str,
+     gguf_model_path: str,
+     pad_vocab: bool = False,
+ ):
+     """Run llama.cpp's convert script to turn an HF checkpoint into an f16 GGUF file."""
+     if pad_vocab:
+         args = [
+             "--outfile",
+             gguf_model_path,
+             # "--vocab-type",
+             # "bpe",
+             "--pad-vocab",
+             dir_raw_model,
+         ]
+     else:
+         args = ["--outfile", gguf_model_path, dir_raw_model]
+     # convert.py for llama-3
+     # args = ["--outfile", gguf_model_path, "--vocab-type", "bpe", dir_raw_model]
+     res = subprocess.run(["python", script_path] + args)
+     print(res)
+
+
+ def quantize_model(
+     quantizer: str,
+     f16_gguf_model_path: str,
+     quantized_gguf_model_path: str,
+     quant_type: str = "q4_0",
+ ):
+     """Run the llama.cpp quantizer binary on an f16 GGUF file."""
+     print(f"[INFO] quantizer: {quantizer}")
+     print(f"[INFO] quant_type: {quant_type}")
+     print(f"[INFO] f16_gguf_model_path: {f16_gguf_model_path}")
+     print(f"[INFO] quantized_model_filename: {quantized_gguf_model_path}")
+     # check=True surfaces a non-zero exit as an exception, so the caller's
+     # try/except in main() actually reports quantization failures
+     subprocess.run(
+         [
+             quantizer,
+             f16_gguf_model_path,
+             quantized_gguf_model_path,
+             quant_type,
+         ],
+         check=True,
+     )
+
+
+ def main():
79
+ parser = argparse.ArgumentParser(description="Convert and quantize gguf models.")
80
+ parser.add_argument(
81
+ "--full-name",
82
+ type=str,
83
+ help="Huggingface model full name. e.g. `username/model_name`",
84
+ )
85
+ parser.add_argument(
86
+ "-s",
87
+ "--saved-dir",
88
+ type=str,
89
+ default="models",
90
+ help="The directory to save the model.",
91
+ )
92
+ parser.add_argument(
93
+ "--enable-converter",
94
+ action="store_true",
95
+ help="Enable the converter. Notice that `--converter` must be specified.",
96
+ )
97
+ parser.add_argument(
98
+ "-c",
99
+ "--converter",
100
+ type=str,
101
+ help="The path to the converter. Notice that `--enable-converter` must be specified if use this option.",
102
+ )
103
+ parser.add_argument(
104
+ "--pad-vocab",
105
+ action="store_true",
106
+ help="Enable adding pad tokens when model vocab expects more than tokenizer metadata provides. Notice that `--enable-converter` must be specified.",
107
+ )
108
+ parser.add_argument(
109
+ "--enable-quantizer",
110
+ action="store_true",
111
+ help="Enable the quantizer. Notice that `--quantizer` must be specified.",
112
+ )
113
+ parser.add_argument(
114
+ "-q",
115
+ "--quantizer",
116
+ type=str,
117
+ help="The path to the quantizer. Notice that `--enable-quantizer` must be specified if use this option.",
118
+ )
119
+ parser.add_argument(
120
+ "-t",
121
+ "--quant-type",
122
+ type=str,
123
+ default=None,
124
+ help="The quantization type. Notice that `--enable-quantizer` must be specified if use this option.",
125
+ )
126
+
127
+ args = parser.parse_args()
128
+
129
+ print(args)
130
+
131
+ print("Download model ...")
132
+ full_name = args.full_name
133
+ username, model_name = full_name.split("/")
134
+ saved_dir = args.saved_dir
135
+ # try:
136
+ # download_hf_with_git(full_name, saved_dir)
137
+ # print(f"The raw model is saved in {saved_dir}.")
138
+
139
+ # except Exception as e:
140
+ # print(f"Failed to download model. {e}")
141
+ # return
142
+
143
+ if args.enable_converter is True:
144
+ print("[CONVERTER] Convert model ...")
145
+ converter = args.converter
146
+
147
+ raw_model_dir = f"{saved_dir}/{model_name}"
148
+ print(f"[CONVERTER] raw_model_dir: {raw_model_dir}")
149
+
150
+ gguf_model_dir = Path(raw_model_dir).parent / f"{model_name}-gguf"
151
+ if not gguf_model_dir.exists():
152
+ gguf_model_dir.mkdir()
153
+ f16_gguf_model_path = gguf_model_dir / f"{model_name}-f16.gguf"
154
+
155
+ print(f"[CONVERTER] f16_gguf_model_path: {f16_gguf_model_path}")
156
+
157
+ # try:
158
+ # convert_hf_to_gguf(
159
+ # converter,
160
+ # raw_model_dir,
161
+ # str(f16_gguf_model_path),
162
+ # args.pad_vocab,
163
+ # )
164
+ # print(f"The converted gguf model is saved in {f16_gguf_model_path}.")
165
+
166
+ # except Exception as e:
167
+ # print(f"Failed to convert model. {e}")
168
+ # return
169
+
170
+ if args.enable_quantizer is True:
171
+ print("[QUANTIZER] Quantize model ...")
172
+ quantizer = args.quantizer
173
+ print(f"[QUANTIZER] quantizer: {quantizer}")
174
+
175
+ if args.quant_type is not None:
176
+ quant_type = args.quant_type
177
+ quantized_gguf_model_path = (
178
+ gguf_model_dir / f"{model_name}-{quant_type}.gguf"
179
+ )
180
+
181
+ print(f"[QUANTIZER] quant_type: {quant_type}")
182
+ print(f"[QUANTIZER] quantized_model_filename: {quantized_gguf_model_path}")
183
+
184
+ try:
185
+ quantize_model(
186
+ quantizer,
187
+ str(f16_gguf_model_path),
188
+ str(quantized_gguf_model_path),
189
+ quant_type,
190
+ )
191
+ print(
192
+ f"The quantized gguf model is saved in {quantized_gguf_model_path}."
193
+ )
194
+
195
+ except Exception as e:
196
+ print(e)
197
+ print("Failed to quantize model.")
198
+ return
199
+ else:
200
+ for quant_type in [
201
+ # "Q2_K",
202
+ # "Q3_K_L",
203
+ # "Q3_K_M",
204
+ # "Q3_K_S",
205
+ # "Q4_0",
206
+ # "Q4_K_M",
207
+ # "Q4_K_S",
208
+ # "Q5_0",
209
+ "Q5_K_M",
210
+ # "Q5_K_S",
211
+ "Q6_K",
212
+ "Q8_0",
213
+ ]:
214
+ quantized_gguf_model_path = (
215
+ gguf_model_dir / f"{model_name}-{quant_type}.gguf"
216
+ )
217
+
218
+ print(f"[QUANTIZER] quant_type: {quant_type}")
219
+ print(
220
+ f"[QUANTIZER] quantized_model_filename: {quantized_gguf_model_path}"
221
+ )
222
+
223
+ try:
224
+ quantize_model(
225
+ quantizer,
226
+ str(f16_gguf_model_path),
227
+ str(quantized_gguf_model_path),
228
+ quant_type,
229
+ )
230
+ print(
231
+ f"The quantized gguf model is saved in {quantized_gguf_model_path}."
232
+ )
233
+
234
+ except Exception as e:
235
+ print(e)
236
+ print("Failed to quantize model.")
237
+ return
238
+
239
+ # # remove the raw model dir for saving space
240
+ # print(f"The quantization is done. Remove {raw_model_dir}")
241
+ # shutil.rmtree(raw_model_dir)
242
+
243
+ print("Done.")
244
+
245
+
246
+ if __name__ == "__main__":
247
+ main()
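
For reference, `convert_hf_to_gguf` and `quantize_model` are thin wrappers around external tools. A minimal sketch of the equivalent direct calls for the README example, assuming llama.cpp is checked out at `/home/ubuntu/workspace/llama.cpp` and its quantizer binary is named `quantize`:

```bash
# HF checkpoint -> f16 GGUF (what convert_hf_to_gguf runs via subprocess)
python /home/ubuntu/workspace/llama.cpp/convert.py --outfile /home/ubuntu/workspace/models/Baichuan2-13B-Chat-gguf/Baichuan2-13B-Chat-f16.gguf /home/ubuntu/workspace/models/Baichuan2-13B-Chat

# f16 GGUF -> quantized GGUF (what quantize_model runs)
/home/ubuntu/workspace/llama.cpp/quantize /home/ubuntu/workspace/models/Baichuan2-13B-Chat-gguf/Baichuan2-13B-Chat-f16.gguf /home/ubuntu/workspace/models/Baichuan2-13B-Chat-gguf/Baichuan2-13B-Chat-Q5_K_M.gguf Q5_K_M
```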
requirements.txt ADDED
@@ -0,0 +1,26 @@
+ certifi==2023.7.22
+ charset-normalizer==3.3.2
+ filelock==3.13.1
+ fsspec==2023.10.0
+ idna==3.4
+ Jinja2==3.1.2
+ MarkupSafe==2.1.3
+ mpmath==1.3.0
+ networkx==3.2.1
+ numpy==1.26.2
+ packaging==23.2
+ protobuf==4.25.0
+ PyYAML==6.0.1
+ regex==2023.10.3
+ requests==2.31.0
+ safetensors==0.4.0
+ sympy==1.12
+ tokenizers==0.14.1
+ torch==2.1.0
+ tqdm==4.66.1
+ transformers==4.35.0
+ typing_extensions==4.8.0
+ urllib3==2.1.0
+ sentencepiece==0.1.99
+ GitPython==3.1.40
+ tiktoken==0.5.2
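
These pins can be installed in one step before running `main.py`; `GitPython` provides the `from git import Repo` import:

```bash
pip install -r requirements.txt
```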