kz-transformers committed
Commit ec45403
1 Parent(s): 6150653

Update README.md

Files changed (1):
  1. README.md +12 -0
README.md CHANGED
@@ -23,6 +23,18 @@ The Kaz-RoBERTa model was pretrained on the combination of 2 datasets:
  - [Conversational data] Preprocessed dialogs between the Customer Support Team and clients of [Beeline KZ (Veon Group)](https://beeline.kz/)

  Together these datasets weigh 25GB of text.
+ ## Training procedure
+
+ ### Preprocessing
+
+ The texts are tokenized using a byte-level version of Byte-Pair Encoding (BPE) with a vocabulary size of 52,000. The
+ model's inputs are pieces of 512 contiguous tokens that may span multiple documents. The beginning of a new document is
+ marked with `<s>` and the end of one with `</s>`.
+
+ ### Pretraining
+
+ The model was trained on 2 V100 GPUs for 500K steps with a batch size of 128 and a sequence length of 512.
+
  ## Usage

  You can use this model directly with a pipeline for masked language modeling:
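The preprocessing described in the added section can be checked directly against the released tokenizer. Below is a minimal sketch; the repository id `kz-transformers/kaz-roberta-conversational` is an assumption, so substitute the actual model id if it differs.

```python
# Minimal sketch: inspect the BPE tokenizer described above.
# The repository id is an assumption; replace it if the model
# is published under a different name.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kz-transformers/kaz-roberta-conversational")

print(tokenizer.vocab_size)                      # expected: 52000
print(tokenizer.bos_token, tokenizer.eos_token)  # document markers: <s> and </s>

# Encoding a sentence shows the <s> ... </s> framing described above.
ids = tokenizer("Сәлем, әлем!").input_ids        # "Hello, world!" in Kazakh
print(tokenizer.convert_ids_to_tokens(ids))
```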
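The pretraining hyperparameters map onto a standard masked-language-modeling setup. The sketch below illustrates that configuration with the stated values (500K steps, an effective batch size of 128 across 2 GPUs, sequence length 512); it is not the authors' training script, the corpus preparation is omitted, and the masking probability of 0.15 is an assumption (the RoBERTa default), as is the model id.

```python
# Illustrative MLM pretraining configuration matching the stated
# hyperparameters; NOT the authors' actual training script.
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = RobertaTokenizerFast.from_pretrained("kz-transformers/kaz-roberta-conversational")
config = RobertaConfig(
    vocab_size=52_000,            # BPE vocabulary from the Preprocessing section
    max_position_embeddings=514,  # 512 tokens plus the 2 offset positions RoBERTa reserves
)
model = RobertaForMaskedLM(config)

args = TrainingArguments(
    output_dir="kaz-roberta",
    max_steps=500_000,               # 500K steps
    per_device_train_batch_size=64,  # 2 V100 GPUs x 64 = batch size of 128
)

collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm_probability=0.15,  # assumed; the README does not state the masking rate
)

# train_dataset: a dataset of 512-token blocks drawn from the 25GB corpus;
# its preparation (tokenization and chunking) is omitted in this sketch.
train_dataset = ...

trainer = Trainer(model=model, args=args, data_collator=collator,
                  train_dataset=train_dataset)
trainer.train()
```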
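The fill-mask example that this last context line introduces falls outside the diff hunk. A hedged version of that usage, with the model id again assumed:

```python
# Fill-mask pipeline usage; the repository id is an assumption.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="kz-transformers/kaz-roberta-conversational")

# "The capital of Kazakhstan is the city of <mask>." in Kazakh
print(fill_mask("Қазақстанның астанасы <mask> қаласы."))
```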