license: apache-2.0
RakutenAI-7B
Model Description
RakutenAI-7B is a systematic initiative that brings the latest technologies to the world of Japanese LLMs. Among similar models such as OpenCalm, Elyza, Youri, Nekomata and Swallow, RakutenAI-7B achieves the best scores on the Japanese language understanding benchmarks while maintaining competitive performance on the English test sets. RakutenAI-7B uses the Mistral model architecture and is based on the Mistral-7B-v0.1 pre-trained checkpoint, exemplifying a successful retrofitting of the pre-trained model weights. Moreover, we extend Mistral's vocabulary from 32k to 48k tokens to offer a better character-per-token rate for Japanese.
If you are looking for an instruction-tuned model, check RakutenAI-7B-instruct.
If you are looking for a chat-tuned model, check RakutenAI-7B-chat.
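The character-per-token rate mentioned above can be measured with a few lines of Python. The following is a minimal sketch, not the evaluation code behind this model card: the helper name `chars_per_token` is hypothetical, and it accepts any callable that turns a string into a list of tokens so that it runs without downloading a model.

```python
from typing import Callable, Iterable, List


def chars_per_token(texts: Iterable[str], tokenize: Callable[[str], List]) -> float:
    """Average number of characters covered by one token over `texts`."""
    total_chars = 0
    total_tokens = 0
    for text in texts:
        total_chars += len(text)
        total_tokens += len(tokenize(text))
    if total_tokens == 0:
        raise ValueError("tokenizer produced no tokens")
    return total_chars / total_tokens


if __name__ == "__main__":
    # Stand-in tokenizer so the sketch runs offline: list() treats every
    # character as one token, so the rate here is exactly 1.0.
    print(chars_per_token(["自然環境", "capybara"], list))
```

To compare real tokenizers, one could pass `lambda t: tokenizer.encode(t, add_special_tokens=False)` for a tokenizer loaded with `AutoTokenizer.from_pretrained`, once for `Rakuten/RakutenAI-7B` and once for the base Mistral tokenizer, on the same Japanese text sample. A higher value means fewer tokens are needed for the same text.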
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Rakuten/RakutenAI-7B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto" places the weights on the available devices.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
model.eval()

# Prompts to complete: this is a base model, so it continues the given text.
requests = [
    "南硫黄島原生自然環境保全地域は、自然",
    "The capybara is a giant cavy rodent",
]

for req in requests:
    input_ids = tokenizer.encode(req, return_tensors="pt").to(device=model.device)
    tokens = model.generate(
        input_ids,
        max_new_tokens=256,
        do_sample=True,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
    )
    # The returned sequence contains the prompt followed by the generated tokens.
    out = tokenizer.decode(tokens[0], skip_special_tokens=True)
    print("INPUT:\n" + req)
    print("OUTPUT:\n" + out)
    print()
    print()
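Because `generate` on a decoder-only model returns the prompt ids followed by the new ids, the `OUTPUT` printed by the loop above repeats the input text. If only the continuation is wanted, the prompt prefix can be dropped before decoding. The helper below is an illustrative sketch; `continuation_ids` is a hypothetical name and works on plain lists of ids.

```python
from typing import List, Sequence


def continuation_ids(output_ids: Sequence[int], prompt_ids: Sequence[int]) -> List[int]:
    """Return only the newly generated ids, dropping the echoed prompt."""
    n = len(prompt_ids)
    # Guard against misuse: the output must begin with the prompt ids.
    if list(output_ids[:n]) != list(prompt_ids):
        raise ValueError("output does not start with the prompt ids")
    return list(output_ids[n:])
```

Inside the loop, this could be used as `tokenizer.decode(continuation_ids(tokens[0].tolist(), input_ids[0].tolist()), skip_special_tokens=True)`.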
Model Details
- Developed by: Rakuten Group, Inc.
- Language(s): Japanese, English
- License: This model is licensed under Apache License, Version 2.0.
Limitations and Bias
The suite of RakutenAI-7B models is capable of generating human-like text on a wide range of topics. However, like all LLMs, they have limitations and can produce biased, inaccurate, or unsafe outputs. Please exercise caution and judgement while interacting with them.
Citation
For citing our work on the suite of RakutenAI-7B models, please use:
@misc{2024RakutenAI-7B,
title={RakutenAI-7B: Extending Large Language Models for Japanese},
author={Rakuten Group, Inc.},
year={2024},
eprint={},
archivePrefix={arXiv},
primaryClass={cs.CL}
}