kz-transformers committed
Commit ec45403
1 Parent(s): 6150653

Update README.md

Files changed (1):
  1. README.md +12 -0
README.md CHANGED
@@ -23,6 +23,18 @@ The Kaz-RoBERTa model was pretrained on the combination of 2 datasets:
  - [Conversational data] Preprocessed dialogs between the Customer Support Team and clients of [Beeline KZ (Veon Group)](https://beeline.kz/)

  Together these datasets weigh 25GB of text.
+ ## Training procedure
+
+ ### Preprocessing
+
+ The texts are tokenized using a byte-level version of Byte-Pair Encoding (BPE) with a vocabulary size of 52,000. The
+ model's inputs are pieces of 512 contiguous tokens that may span multiple documents. The beginning of a new document is
+ marked with `<s>` and the end of one with `</s>`.
+
+ ### Pretraining
+
+ The model was trained on 2 V100 GPUs for 500K steps with a batch size of 128 and a sequence length of 512.
+
  ## Usage

  You can use this model directly with a pipeline for masked language modeling:
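The preprocessing described in the added section can be checked directly against the released tokenizer. Below is a minimal sketch; the repository id `kz-transformers/kaz-roberta-conversational` is an assumption, so substitute the actual model id if it differs.

```python
# Minimal sketch: inspect the BPE tokenizer described above.
# The repository id is an assumption; replace it if the model
# is published under a different name.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kz-transformers/kaz-roberta-conversational")

print(tokenizer.vocab_size)                      # expected: 52000
print(tokenizer.bos_token, tokenizer.eos_token)  # document markers: <s> and </s>

# Encoding a sentence shows the <s> ... </s> framing described above.
ids = tokenizer("Сәлем, әлем!").input_ids        # "Hello, world!" in Kazakh
print(tokenizer.convert_ids_to_tokens(ids))
```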
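The pretraining hyperparameters map onto a standard masked-language-modeling setup. The sketch below illustrates that configuration with the stated values (500K steps, an effective batch size of 128 across 2 GPUs, sequence length 512); it is not the authors' training script, the corpus preparation is omitted, and the masking probability of 0.15 is an assumption (the RoBERTa default), as is the model id.

```python
# Illustrative MLM pretraining configuration matching the stated
# hyperparameters; NOT the authors' actual training script.
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = RobertaTokenizerFast.from_pretrained("kz-transformers/kaz-roberta-conversational")
config = RobertaConfig(
    vocab_size=52_000,            # BPE vocabulary from the Preprocessing section
    max_position_embeddings=514,  # 512 tokens plus the 2 offset positions RoBERTa reserves
)
model = RobertaForMaskedLM(config)

args = TrainingArguments(
    output_dir="kaz-roberta",
    max_steps=500_000,               # 500K steps
    per_device_train_batch_size=64,  # 2 V100 GPUs x 64 = batch size of 128
)

collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm_probability=0.15,  # assumed; the README does not state the masking rate
)

# train_dataset: a dataset of 512-token blocks drawn from the 25GB corpus;
# its preparation (tokenization and chunking) is omitted in this sketch.
train_dataset = ...

trainer = Trainer(model=model, args=args, data_collator=collator,
                  train_dataset=train_dataset)
trainer.train()
```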
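The fill-mask example that this last context line introduces falls outside the diff hunk. A hedged version of that usage, with the model id again assumed:

```python
# Fill-mask pipeline usage; the repository id is an assumption.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="kz-transformers/kaz-roberta-conversational")

# "The capital of Kazakhstan is the city of <mask>." in Kazakh
print(fill_mask("Қазақстанның астанасы <mask> қаласы."))
```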