---
datasets:
- HuggingFaceH4/CodeAlpaca_20K
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- code
- LLaMa2
---

# LLaMaCoder

## Model Description

`LLaMaCoder` is a code-generation model based on the LLaMa2 7B language model, fine-tuned with LoRA adapters on the HuggingFaceH4/CodeAlpaca_20K dataset.
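
The training recipe is not documented in this card. As a rough illustration, a LoRA fine-tune of a LLaMa2 7B base with the `peft` library typically looks like the sketch below; the base checkpoint name and all hyperparameters are illustrative assumptions, not the values used to train LLaMaCoder.

```python
# Illustrative sketch only: the base checkpoint and every LoRA hyperparameter
# here are assumptions, not the settings used to train LLaMaCoder.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trained
```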

## Usage

Generate code with LLaMaCoder loaded in 4-bit precision using the following Python snippet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

MODEL_NAME = "Sakuna/LLaMaCoderAll"
device = "cuda:0"

# Quantize the weights to 4-bit NF4 and run the compute in float16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# The quantized weights are placed on the GPU at load time via device_map,
# so no additional .to(device) call is needed afterwards.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map=device,
    trust_remote_code=True,
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

prompt = "Write a Java program to calculate the factorial of a given number k"
text = f"{prompt}\n### Solution:\n"

inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,   # temperature only takes effect when sampling is enabled
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
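
The `### Solution:` suffix in the prompt mirrors the format used in the snippet above; keeping it is likely to produce cleaner completions. Lowering `temperature` makes the generated code more deterministic, and `max_new_tokens` caps the length of the completion.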