pascalrai committed on
Commit
b894e3b
1 Parent(s): 1b6ed0e

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -92,7 +92,7 @@ The model was pre-trained continuously on a single A10G GPU in an AWS instance f
  #### Possible Future Directions:

  1. Use a decoder-only model for pre-training and summarization.
- <br>As it seems the case when the span deleting tokens is not very large, the model learns to copy the token from the encoder to decoder generation.
+ <br>As it seems the case when the span deleting tokens is not very large, the model learns to copy the token from the encoder context during Cross-attention to decoder generation.
  <br>Thus, hurts the performance of the Abstractive Summarization task.
  <br>This case is not present in the decoder-only model as all the predicted next token is not seen by the model at all.

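The copying effect described in the changed line can be illustrated with a rough sketch. It assumes a BART-style denoising setup in which the encoder sees the text with some spans deleted and the decoder must reproduce the full original sequence; the README hunk does not state the exact noising scheme, so this is an assumption. The helpers `delete_spans` and `copyable_fraction` are hypothetical names, and token overlap is only a crude proxy for what cross-attention could copy.

```python
import random


def delete_spans(tokens, span_len, n_spans, rng):
    """Delete `n_spans` non-overlapping spans of `span_len` tokens each.

    Hypothetical noising function: a stand-in for span deletion, not the
    author's actual pre-training code.
    """
    # Pick span starts on a fixed grid so that spans never overlap.
    grid = range(0, len(tokens) - span_len + 1, span_len)
    starts = rng.sample(grid, n_spans)
    dropped = set()
    for start in starts:
        dropped.update(range(start, start + span_len))
    return [tok for i, tok in enumerate(tokens) if i not in dropped]


def copyable_fraction(encoder_input, decoder_target):
    """Fraction of target tokens that also occur in the encoder input."""
    visible = set(encoder_input)
    return sum(tok in visible for tok in decoder_target) / len(decoder_target)


rng = random.Random(0)
original = list(range(100))  # 100 distinct token ids; also the decoder target

# Few deleted tokens: almost the whole target is visible to the encoder.
light = delete_spans(original, span_len=2, n_spans=5, rng=rng)
print(copyable_fraction(light, original))  # 0.9

# Many deleted tokens: much less of the target can simply be copied.
heavy = delete_spans(original, span_len=10, n_spans=5, rng=rng)
print(copyable_fraction(heavy, original))  # 0.5
```

When only a small share of tokens is deleted, most of the target is already present in the encoder input, which is consistent with the concern that the model can lower its loss by copying rather than generating. A decoder-only model trained on next-token prediction has no separate encoder copy of the target to attend to.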