xyla committed on
Commit
395c2f9
1 Parent(s): 8c62bd7

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -23,9 +23,9 @@ We make two important preprocessing steps:
  We train the Clinical-T5-Large model from scratch using a cased vocab of 32,000. We train it for 780,000 steps, using a batch size of 12 per TPU pod (8 pods total), and a sequence length of 512.
  This results in a batch size of 49,152 tokens per step. Accounting for the number of steps, this equates to 38B tokens. We were aiming for 40B, but our Google Cloud instance broke! We use the same LR schedule as the original T5 paper.
 
- We train the Clinical-T5-Scratch model using an uncased vocab of 32,000. This model is trained for 28 epochs total, with a sequence length of 512. We use the same LR schedule as the original T5 paper.
+ We train the Clinical-T5-Scratch model using an uncased vocab of 32,000. This model is trained for 28 epochs total, with a sequence length of 512 (~40B tokens total). We use the same LR schedule as the original T5 paper.
 
- As mentioned previously, we also train two models initialized from T5-Base and SciFive. These are trained for ~13B tokens, using a batch size of 32 per GPU (8 GPUs), and a sequence length of 512.
+ As mentioned previously, we also train two models initialized from T5-Base and SciFive. These are trained for ~13B tokens, using a batch size of 32 per GPU (8 GPUs), and a sequence length of 512. In an attempt to speed up training and help the models adapt quickly, we increase the warm-up steps from 10K to 40K. This helps the model initialized from T5-Base, but not SciFive.
 
  # How to use the Model
  You will first need credentialed PhysioNet access to use the model. Why? There is reasonable evidence that these models contain leakage, especially the larger ones. Releasing a model that leaks these notes would be a data-use agreement violation. To get PhysioNet access, you must pass the CITI training.
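A quick sanity check of the token arithmetic quoted in the hunk above (this assumes "batch size of 49,152" counts tokens per optimizer step, i.e. sequences x sequence length; the variable names are illustrative, not from the repo):

```python
# Sketch: verify the batch-size and total-token figures quoted above.
# Assumption: "batch size of 49,152" means tokens per optimizer step.
sequences_per_pod = 12      # batch size of 12 per TPU pod
num_pods = 8                # 8 pods total
sequence_length = 512
num_steps = 780_000

tokens_per_step = sequences_per_pod * num_pods * sequence_length
total_tokens = tokens_per_step * num_steps

print(f"tokens per step: {tokens_per_step:,}")         # 49,152
print(f"total tokens:    {total_tokens / 1e9:.1f}B")   # ~38.3B
```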
@@ -41,5 +41,8 @@ tokenizer = AutoTokenizer.from_pretrained(INSERT_PATH_TO_MODEL_FOLDER)
  model = AutoModelForSeq2SeqLM.from_pretrained(PATH_TO_MODEL_FOLDER)
  ```
 
+ # Tips
+ Use the models initialized from scratch! Based on our preliminary results, we find that these are best.
+
  # Questions?
  If you have any questions about using the models, please email eric@xyla.com.
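The hunk above only shows the tail of the README's usage snippet; a fuller, self-contained sketch of loading and smoke-testing the model might look like the following (the folder path and the fill-in-the-blank prompt are placeholders for illustration, not part of the README):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder: point this at the folder you downloaded from PhysioNet.
PATH_TO_MODEL_FOLDER = "./Clinical-T5-Large"

tokenizer = AutoTokenizer.from_pretrained(PATH_TO_MODEL_FOLDER)
model = AutoModelForSeq2SeqLM.from_pretrained(PATH_TO_MODEL_FOLDER)

# T5 is a text-to-text model; as a quick smoke test, ask it to fill in a
# masked span marked by a T5 sentinel token.
text = "The patient was admitted with <extra_id_0> and started on IV antibiotics."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```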
 