---
language: ja
license: cc-by-sa-3.0
datasets:
- wikipedia
---

# BERT base Japanese (character-level tokenization with whole word masking, jawiki-20200831)

This pretrained model is almost the same as [cl-tohoku/bert-base-japanese-char-v2](https://huggingface.co/cl-tohoku/bert-base-japanese-char-v2), but it does not require `fugashi` or `unidic_lite`.
The only difference is the `word_tokenizer_type` property in `tokenizer_config.json`, which specifies `basic` instead of `mecab`.
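Below is a minimal usage sketch (not part of the original card); `<this-model-id>` is a placeholder for this repository's model ID on the Hugging Face Hub. Because the tokenizer config specifies the `basic` word tokenizer, loading works without MeCab-related dependencies:

```python
from transformers import AutoModel, AutoTokenizer

# "<this-model-id>" is a placeholder; replace it with this repository's ID.
tokenizer = AutoTokenizer.from_pretrained("<this-model-id>")
model = AutoModel.from_pretrained("<this-model-id>")

# No fugashi/unidic_lite needed: the basic word tokenizer splits on
# whitespace and punctuation before the character-level subword step.
inputs = tokenizer("日本語のテキストです。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```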