---
inference: false
language:
- en
license: llama2
model_type: llama
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
---

# 🦚Merak-7B-v3-Mini-Orca GPTQ🐳

Merak Orca

These files are GPTQ model files for [**Merak-7B-v3-Mini-Orca**](https://huggingface.co/asyafiqe/Merak-7B-v3-Mini-Orca-Indo).

[**Merak-7B-v3-Mini-Orca**](https://huggingface.co/asyafiqe/Merak-7B-v3-Mini-Orca-Indo) is Ichsan2895's [Merak-7B-v3](https://huggingface.co/Ichsan2895/Merak-7B-v3) fine-tuned on a Bahasa Indonesia translation of psmathur's [orca_mini_v1_dataset](https://huggingface.co/datasets/psmathur/orca_mini_v1_dataset).

### Prompt format

You can use the [Vicuna 1.1](https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates/Vicuna-v1.1.yaml) instruction template in oobabooga's text-generation-webui.

```
SYSTEM: Anda adalah asisten AI. Anda akan diberi tugas. Anda harus menghasilkan jawaban yang rinci dan panjang.
USER: <prompt> (without the <>)
ASSISTANT:
```

(The system message translates to: "You are an AI assistant. You will be given a task. You must produce a detailed and long answer.")

## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)

Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui). It is strongly recommended to use the text-generation-webui one-click installers unless you know how to do a manual install.

1. Click the **Model** tab.
2. Under **Download custom model or LoRA**, enter `asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ`.
   - To download from a specific branch, add the branch name after a colon, for example `asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ:main`.
3. Click **Download**.
4. The model will start downloading. Once it's finished it will say "Done".
5. In the top left, click the refresh icon next to **Model**.
6. In the **Model** dropdown, choose the model you just downloaded: `Merak-7B-v3-Mini-Orca-Indo-GPTQ`.
7. In the **Model loader** dropdown, choose **ExLlamav2_HF**.
8. Click **Load**.
9. Click the **Default** tab.
10. Copy the prompt format shown above into the input box.
11. Enter a prompt and click **Generate**! Click **Continue** to get a longer response.

If you would rather fetch the model files from a script instead of the web UI, see the sketch below.
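The following is a minimal sketch for downloading the GPTQ files programmatically with the `huggingface_hub` library (an extra dependency not covered in the steps above; the local directory name is only an illustration):

```python
# Minimal sketch: fetch the GPTQ files with huggingface_hub instead of the web UI.
# Assumes `pip install huggingface_hub`; the local_dir value is just an example.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ",
    revision="main",                              # or a specific branch
    local_dir="Merak-7B-v3-Mini-Orca-Indo-GPTQ",  # illustrative target folder
)
print(f"Model files downloaded to: {local_dir}")
```

If you go this route but still want to use text-generation-webui, the downloaded folder typically needs to sit inside the webui's `models/` directory before it appears in the **Model** dropdown.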
## How to use this GPTQ model from Python code

First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) and sentencepiece installed:

`GITHUB_ACTIONS=true pip install auto-gptq`

`pip install sentencepiece`

Then try the following example code:

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ"
model_basename = "Merak-7B-v3-Mini-Orca-Indo-GPTQ"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the quantized model onto the first GPU
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "Buat rencana untuk menghemat listrik di rumah"
system_message = "Anda adalah asisten AI. Anda akan diberi tugas. Anda harus menghasilkan jawaban yang rinci dan panjang.\n"
prompt_template = f'''SYSTEM: {system_message}
USER: {prompt}
ASSISTANT: '''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```

## Compatibility

The files provided will work with AutoGPTQ (CUDA and Triton modes), GPTQ-for-LLaMa (only CUDA has been tested), and Occ4m's GPTQ-for-LLaMa fork.

ExLlama works with Llama models in 4-bit.

## Credits

[TheBloke](https://huggingface.co/TheBloke/) for the Readme template.
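As an alternative to the AutoGPTQ route shown above, recent versions of 🤗 Transformers (4.32+) can load GPTQ checkpoints directly when the `optimum` and `auto-gptq` packages are installed. The snippet below is a minimal sketch of that approach and has not been verified against this particular repository:

```python
# Minimal sketch: load the GPTQ checkpoint through transformers' built-in GPTQ support.
# Assumes transformers>=4.32 with optimum, auto-gptq and accelerate installed.
# Untested against this repo; shown only as an alternative to the AutoGPTQ code above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt_template = (
    "SYSTEM: Anda adalah asisten AI. Anda akan diberi tugas. "
    "Anda harus menghasilkan jawaban yang rinci dan panjang.\n"
    "USER: Buat rencana untuk menghemat listrik di rumah\n"
    "ASSISTANT: "
)
inputs = tokenizer(prompt_template, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```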