hkiyomaru committed
Commit 8250994
1 Parent(s): e51b61c

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -113,11 +113,11 @@ The models have been pre-trained using a blend of the following datasets.
 
  | Language | Dataset | Tokens|
  |:---:|:---:|:---:|
- |Japanese|[Wikipedia](https://huggingface.co/datasets/wikipedia)|1.5B
- ||[mC4](https://huggingface.co/datasets/mc4)|136B
- |English|[Wikipedia](https://huggingface.co/datasets/wikipedia)|5B
- ||[The Pile](https://huggingface.co/datasets/EleutherAI/pile)|135B
- |Codes|[The Stack](https://huggingface.co/datasets/bigcode/the-stack)|10B
+ |Japanese|[Wikipedia](https://huggingface.co/datasets/wikipedia)|1.4B
+ ||[Common Crawl](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus)|130.7B
+ |English|[Wikipedia](https://huggingface.co/datasets/wikipedia)|4.7B
+ ||[The Pile](https://huggingface.co/datasets/EleutherAI/pile)|110.3B
+ |Codes|[The Stack](https://huggingface.co/datasets/bigcode/the-stack)|8.7B
 
  ### Instruction tuning (To be updated)
 