AudreyVM committed
Commit 51e4e23
1 Parent(s): 9dc3467

Update README.md

Files changed (1): README.md (+11 −11)
```diff
@@ -48,7 +48,7 @@ import pyonmttok
 from huggingface_hub import snapshot_download
 model_dir = snapshot_download(repo_id="projecte-aina/mt-aina-gl-ca", revision="main")
 tokenizer=pyonmttok.Tokenizer(mode="none", sp_model_path = model_dir + "/spm.model")
-tokenized=tokenizer.tokenize("Benvido ao proxecto Aina.")
+tokenized=tokenizer.tokenize("Benvido ao proxecto Ilenia.")
 translator = ctranslate2.Translator(model_dir)
 translated = translator.translate_batch([tokenized[0]])
 print(tokenizer.detokenize(translated[0][0]['tokens']))
@@ -122,24 +122,24 @@ Weights were saved every 1000 updates and reported results are the average of th
 ### Variables and metrics
 We use the BLEU score for evaluation on test sets: [Flores-200](https://github.com/facebookresearch/flores/tree/main/flores200), [TaCon](https://elrc-share.eu/repository/browse/tacon-spanish-constitution-mt-test-set/84a96138b98611ec9c1a00155d02670628f3e6857b0f422abd82abc3795ec8c2/) and [NTREX](https://github.com/MicrosoftTranslator/NTREX)
 ### Evaluation results
-Below are the evaluation results on the machine translation from Galician to Catalan compared to [M2M100 1.2B](https://huggingface.co/facebook/m2m100_1.2B), [NLLB 200 3.3B](https://huggingface.co/facebook/nllb-200-3.3B) and [NLLB-200's distilled 1.3B variant](https://huggingface.co/facebook/nllb-200-distilled-1.3B):
-| Test set           | M2M100 1.2B | NLLB 1.3B | NLLB 3.3B | mt-aina-gl-ca |
-|--------------------|-------------|-----------|-----------|---------------|
-| Flores 200 devtest | 32,6        | 22,3      | **34,3**  | 32,4          |
-| TaCON              | 56,5        | 32,2      | 54,1      | **58,2**      |
-| NTREX              | 34,0        | 20,4      | **34,2**  | 33,7          |
-| Average            | 41,0        | 25,0      | 40,9      | **41,4**      |
+Below are the evaluation results on the machine translation from Galician to Catalan compared to [Google Translate](https://translate.google.com/), [M2M100 1.2B](https://huggingface.co/facebook/m2m100_1.2B), [NLLB 200 3.3B](https://huggingface.co/facebook/nllb-200-3.3B) and [NLLB-200's distilled 1.3B variant](https://huggingface.co/facebook/nllb-200-distilled-1.3B):
+| Test set           | Google Translate | M2M100 1.2B | NLLB 1.3B | NLLB 3.3B | mt-aina-gl-ca |
+|--------------------|------------------|-------------|-----------|-----------|---------------|
+| Flores 101 devtest | **36,4**         | 32,6        | 22,3      | 34,3      | 32,4          |
+| TaCON              | 48,4             | 56,5        | 32,2      | 54,1      | **58,2**      |
+| NTREX              | **34,7**         | 34,0        | 20,4      | 34,2      | 33,7          |
+| Average            | 39,0             | 41,0        | 25,0      | 40,9      | **41,4**      |
 ## Additional information
 ### Author
-Language Technologies Unit (LangTech) at the Barcelona Supercomputing Center (langtech@bsc.es)
+Language Technologies Unit (LangTech) at the Barcelona Supercomputing Center.
 ### Contact information
-For further information, send an email to <aina@bsc.es>
+For further information, send an email to <langtech@bsc.es>
 ### Copyright
 Copyright Language Technologies Unit at Barcelona Supercomputing Center (2023)
 ### Licensing information
 This work is licensed under an [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 ### Funding
-This work was funded by the Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya within the framework of Projecte AINA.
+This work was funded by SEDIA within the framework of ILENIA.
 ### Disclaimer
 <details>
 <summary>Click to expand</summary>
```
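As a sanity check on the evaluation table, the "Average" row for the original four systems can be recomputed as the unweighted mean of the three test-set scores. A minimal sketch (scores copied from the table above, with decimal commas written as dots):

```python
# BLEU scores per system on the three test sets, copied from the
# evaluation table (Flores devtest, TaCON, NTREX).
scores = {
    "M2M100 1.2B":   [32.6, 56.5, 34.0],
    "NLLB 1.3B":     [22.3, 32.2, 20.4],
    "NLLB 3.3B":     [34.3, 54.1, 34.2],
    "mt-aina-gl-ca": [32.4, 58.2, 33.7],
}

# Unweighted mean over the three test sets, rounded to one decimal place.
averages = {name: round(sum(s) / len(s), 1) for name, s in scores.items()}
print(averages)
# → {'M2M100 1.2B': 41.0, 'NLLB 1.3B': 25.0, 'NLLB 3.3B': 40.9, 'mt-aina-gl-ca': 41.4}
```

These reproduce the table's Average row, confirming it is a plain per-test-set mean rather than a length-weighted one.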