Commit 4ef7c89 by hkiyomaru (parent: 8703a3b)

Update README.md

Files changed (1): README.md +2 -2
README.md CHANGED
@@ -93,7 +93,7 @@ print(tokenizer.decode(output))
 - **Hardware:** 8 A100 40GB GPUs ([mdx cluster](https://mdx.jp/en/))
 - **Software:** [TRL](https://github.com/huggingface/trl), [PEFT](https://github.com/huggingface/peft), and [DeepSpeed](https://github.com/microsoft/DeepSpeed)
 
-## Tokenizer (To be updated)
+## Tokenizer
 
 The tokenizer of this model is based on [huggingface/tokenizers](https://github.com/huggingface/tokenizers) Unigram byte-fallback model.
 The vocabulary entries were converted from [`llm-jp-tokenizer v2.2 (50k)`](https://github.com/llm-jp/llm-jp-tokenizer/releases/tag/v2.2).
@@ -105,7 +105,7 @@ Please refer to [README.md](https://github.com/llm-jp/llm-jp-tokenizer) of `llm-
 - **Vocabulary size:** 48,588 (mixed vocabulary of Japanese, English, and source code)
 
 
-## Datasets (To be updated)
+## Datasets
 
 ### Pre-training
 
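The hardware/software bullets in the diff name the fine-tuning stack (TRL with PEFT, run under DeepSpeed) without showing how the pieces fit together. A minimal sketch of how they typically compose follows; it is not part of this commit, the model and dataset ids are placeholders, and the exact `SFTTrainer`/`SFTConfig` arguments vary across TRL versions.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder ids; this page does not name the model repo or SFT dataset.
dataset = load_dataset("llm-jp/<sft-dataset>", split="train")

# PEFT: train low-rank adapters instead of the full model weights.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# DeepSpeed plugs in through the training arguments (config path is a placeholder).
args = SFTConfig(output_dir="out", deepspeed="ds_config.json")

trainer = SFTTrainer(
    model="llm-jp/<model-repo>",  # placeholder; TRL accepts a Hub id or a loaded model
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```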
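The tokenizer section updated by this commit describes a Unigram byte-fallback tokenizer with a 48,588-entry mixed vocabulary. As a minimal sketch (again not part of the commit), loading and inspecting it with `transformers` might look like this; the repository id is a placeholder, since the commit page does not name the model repo.

```python
from transformers import AutoTokenizer

# Placeholder repo id; substitute the actual model repository on the Hub.
tokenizer = AutoTokenizer.from_pretrained("llm-jp/<model-repo>")

# The README lists a mixed Japanese/English/code vocabulary of 48,588 entries.
print(tokenizer.vocab_size)  # expected: 48588

# With byte fallback, characters absent from the vocabulary decompose into
# byte-level tokens instead of <unk>, so encode/decode round-trips losslessly.
ids = tokenizer.encode("日本語と English と source code を混ぜたテキスト")
print(tokenizer.decode(ids))
```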