agemagician committed
Commit 4c155ee
1 Parent(s): d2fd1a7

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -18,7 +18,7 @@ Pretrained model on protein sequences using a masked language modeling (MLM) obj
 
 ## Model description
 
-ANKH2-Large is based on the `ANKH-Large` model and was pretrained on a large corpus of protein sequences in a self-supervised fashion.
+Ankh2-ext2 is based on the `ANKH-Large` model and was pretrained on a large corpus of protein sequences in a self-supervised fashion.
 This means it was pretrained on the raw protein sequences only, with no humans labelling them in any way (which is why it can use lots of
 publicly available data) with an automatic process to generate inputs and labels from those protein sequences.
 
@@ -82,7 +82,7 @@ The details of the masking procedure for each sequence are as follows:
 
 ### Pretraining
 
-The model was trained on a single TPU Pod V4-256 for 45 epochs in total, using sequence length 512 (batch size 1k).
+The model was trained on a single TPU Pod V5-lite for 45 epochs in total, using sequence length 512 (batch size 1k).
 It was trained using ANKH-Large model as an initial checkpoint, rather than training from scratch.
 It has a total of approximately 2B parameters and was trained using the encoder-decoder architecture.
 The optimizer used is Adafactor with linear warmup with linear decay learning rate schedule for pre-training.
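
The `### Pretraining` hunk above names the optimizer recipe (Adafactor with a linear warmup and linear decay learning-rate schedule) but the model card ships no code for it. Below is a minimal, hypothetical sketch of that recipe using the Hugging Face `transformers` optimization utilities; the stand-in `T5Config()` model, the learning rate, and the warmup/decay step counts are illustrative assumptions, not values from the model card.

```python
# Hypothetical sketch: Adafactor with a linear warmup / linear decay schedule,
# wired up with Hugging Face `transformers`. A default T5Config() stands in for
# the ~2B-parameter encoder-decoder; the real run started from the ANKH-Large
# checkpoint rather than a fresh config.
from transformers import T5Config, T5ForConditionalGeneration
from transformers.optimization import Adafactor, get_linear_schedule_with_warmup

# Stand-in encoder-decoder model (placeholder for the ANKH-Large-initialized model).
model = T5ForConditionalGeneration(T5Config())

# Adafactor with an explicit learning rate; relative_step must be disabled then.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,                # placeholder value, not from the model card
    relative_step=False,
    scale_parameter=False,
    warmup_init=False,
)

# Linear warmup followed by linear decay, as stated in the Pretraining section.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1_000,      # placeholder
    num_training_steps=100_000,  # placeholder
)

# Per training step: loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```

With an explicit `lr`, Adafactor's relative-step and parameter-scaling heuristics are turned off, so the external linear warmup/decay scheduler fully controls the learning rate.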