jgrosjean committed
Commit 137469f
1 Parent(s): 2e83ad6

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -155,7 +155,7 @@ The performance is measured via accuracy, i.e. the ratio of correct vs. total ma
 
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
-Articles with the topic tags "movies/tv series", "corona" and "football" (or related) are filtered from the corpus and split into training data (80%) and test data (20%). Subsequently, embeddings are set up for the train and test data. The test data is then classified using the training data via a k-nearest neighbors approach. The script can be found [here](https://github.com/jgrosjean-mathesis/sentence-swissbert/tree/main/evaluation).
+Articles with defined topic tags are mapped to 10 categories, filtered from the corpus, and split into training data (80%) and test data (20%). Embeddings are then computed for the training and test data, and the test data is classified using the training data via a k-nearest neighbors approach. The script can be found [here](https://github.com/jgrosjean-mathesis/sentence-swissbert/tree/main/evaluation).
 
 Note: For French, Italian and Romansh, the training data remains in German, while the test data consists of translations. This provides insight into the model's cross-lingual transfer abilities.
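To make the evaluation described in this hunk concrete, here is a minimal sketch of an embed-then-k-NN pipeline. It is not the linked script: the toy corpus, the scikit-learn classifier, `n_neighbors=5`, and the choice of encoder (the baseline model from the next hunk, used here only for illustration) are all assumptions.

```python
# Minimal sketch of the described evaluation, not the linked script.
from sentence_transformers import SentenceTransformer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-in for the tagged articles: each text carries one of 10 categories.
texts = [f"article text {i}" for i in range(50)]
labels = [f"category {i % 10}" for i in range(50)]

# 80/20 train/test split, as described in the README.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)

# Embed both splits; the README evaluates Sentence SwissBERT the same way.
encoder = SentenceTransformer("sentence-transformers/distiluse-base-multilingual-cased-v1")
train_emb = encoder.encode(train_texts)
test_emb = encoder.encode(test_texts)

# Classify each test article by its nearest training neighbors and report
# accuracy, i.e. the ratio of correct to total matches.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(train_emb, train_labels)
print("accuracy:", accuracy_score(test_labels, knn.predict(test_emb)))
```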
 
@@ -177,4 +177,4 @@ Sentence SwissBERT achieves comparable or better results as the best-performing
 
 #### Baseline
 
-The baseline uses mean pooling embeddings from the last hidden state of the original swissbert model and (in this task) best-performing Sentence-BERT model [distiluse-base-multilingual-cased-v1](https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1)
+The baseline uses mean-pooled embeddings from the last hidden state of the original swissbert model and the Sentence-BERT model that performs best in these tasks, [distiluse-base-multilingual-cased-v1](https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1).
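For the first half of that baseline, a rough sketch of mean pooling over swissbert's last hidden state is shown below; the model ID `ZurichNLP/swissbert`, the `de_CH` language code, and the batching details are assumptions rather than the baseline's actual code.

```python
# Rough sketch of mean pooling over the last hidden state (assumptions noted above).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ZurichNLP/swissbert")
model = AutoModel.from_pretrained("ZurichNLP/swissbert")
model.set_default_language("de_CH")  # swissbert is X-MOD-based: select a language adapter

def mean_pooled(sentences: list[str]) -> torch.Tensor:
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state         # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # 1 for real tokens, 0 for padding
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # average over real tokens only

embeddings = mean_pooled(["Ein Beispielsatz auf Deutsch."])
```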
 