onlplab/alephbert-base

aseker00 committed on Mar 13, 2021

Commit 99fa890 • 1 Parent(s): 8b36f5b

Update README.md

Files changed (1)

README.md +2 -2

@@ -45,9 +45,9 @@ To optimize training time we split the data into 4 sections based on max number
 3. 64 <= num tokens < 128 (10M sentences)
 4. 128 <= num tokens < 512 (70M sentences)
-Each section was trained for 5 epochs with an initial learning rate set to 1e-4.
-Total training time was 5 days.
+Each section was first trained for 5 epochs with an initial learning rate set to 1e-4. Then each section was trained for another 5 epochs with an initial learning rate set to 1e-5, for a total of 10 epochs.
+Total training time was 8 days.
 ## Eval
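
The two-phase schedule described by the added README lines can be sketched as a small planning table. This is only an illustration of the arithmetic (4 sections, 2 phases, 5 epochs per phase), not the authors' training code. The names `Stage`, `build_schedule`, and `epochs_per_section` are hypothetical, and the ordering (all sections in phase 1, then all sections in phase 2) is an assumption read from the wording of the README.

```python
from dataclasses import dataclass

# Values taken from the README text in this commit.
NUM_SECTIONS = 4                      # data is split into 4 sections by token count
EPOCHS_PER_PHASE = 5                  # each phase trains every section for 5 epochs
PHASE_LEARNING_RATES = (1e-4, 1e-5)   # initial learning rate for phase 1 and phase 2


@dataclass(frozen=True)
class Stage:
    """One (phase, section) training run. Hypothetical helper type."""
    phase: int
    section: int
    epochs: int
    initial_lr: float


def build_schedule():
    """List every stage, assuming phase 1 covers all sections before phase 2 starts."""
    return [
        Stage(phase, section, EPOCHS_PER_PHASE, lr)
        for phase, lr in enumerate(PHASE_LEARNING_RATES, start=1)
        for section in range(1, NUM_SECTIONS + 1)
    ]


def epochs_per_section(schedule):
    """Sum the epochs each section receives across all phases."""
    totals = {}
    for stage in schedule:
        totals[stage.section] = totals.get(stage.section, 0) + stage.epochs
    return totals


if __name__ == "__main__":
    schedule = build_schedule()
    for stage in schedule:
        print(f"phase {stage.phase}, section {stage.section}: "
              f"{stage.epochs} epochs, initial lr {stage.initial_lr}")
    print(epochs_per_section(schedule))
```

Summing the stages per section reproduces the "total of 10 epochs" stated in the README; the reported wall-clock time (8 days) is not modeled here.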