agemagician committed
Commit 4c155ee
1 Parent(s): d2fd1a7

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -18,7 +18,7 @@ Pretrained model on protein sequences using a masked language modeling (MLM) obj
 
 ## Model description
 
-ANKH2-Large is based on the `ANKH-Large` model and was pretrained on a large corpus of protein sequences in a self-supervised fashion.
+Ankh2-ext2 is based on the `ANKH-Large` model and was pretrained on a large corpus of protein sequences in a self-supervised fashion.
 This means it was pretrained on the raw protein sequences only, with no humans labelling them in any way (which is why it can use lots of
 publicly available data) with an automatic process to generate inputs and labels from those protein sequences.
 
@@ -82,7 +82,7 @@ The details of the masking procedure for each sequence are as follows:
 
 ### Pretraining
 
-The model was trained on a single TPU Pod V4-256 for 45 epochs in total, using sequence length 512 (batch size 1k).
+The model was trained on a single TPU Pod V5-lite for 45 epochs in total, using sequence length 512 (batch size 1k).
 It was trained using ANKH-Large model as an initial checkpoint, rather than training from scratch.
 It has a total of approximately 2B parameters and was trained using the encoder-decoder architecture.
 The optimizer used is Adafactor with linear warmup with linear decay learning rate schedule for pre-training.
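
The `### Pretraining` hunk above names the optimizer recipe (Adafactor with a linear warmup and linear decay learning-rate schedule) but the model card ships no code for it. Below is a minimal, hypothetical sketch of that recipe using the Hugging Face `transformers` optimization utilities; the stand-in `T5Config()` model, the learning rate, and the warmup/decay step counts are illustrative assumptions, not values from the model card.

```python
# Hypothetical sketch: Adafactor with a linear warmup / linear decay schedule,
# wired up with Hugging Face `transformers`. A default T5Config() stands in for
# the ~2B-parameter encoder-decoder; the real run started from the ANKH-Large
# checkpoint rather than a fresh config.
from transformers import T5Config, T5ForConditionalGeneration
from transformers.optimization import Adafactor, get_linear_schedule_with_warmup

# Stand-in encoder-decoder model (placeholder for the ANKH-Large-initialized model).
model = T5ForConditionalGeneration(T5Config())

# Adafactor with an explicit learning rate; relative_step must be disabled then.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,                # placeholder value, not from the model card
    relative_step=False,
    scale_parameter=False,
    warmup_init=False,
)

# Linear warmup followed by linear decay, as stated in the Pretraining section.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1_000,      # placeholder
    num_training_steps=100_000,  # placeholder
)

# Per training step: loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```

With an explicit `lr`, Adafactor's relative-step and parameter-scaling heuristics are turned off, so the external linear warmup/decay scheduler fully controls the learning rate.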