---
language:
- mt
datasets:
- MLRS/korpus_malti
model-index:
- name: mBERTu
  results:
  - task:
      type: dependency-parsing
      name: Dependency Parsing
    dataset:
      type: universal_dependencies
      args: mt_mudt
      name: Maltese Universal Dependencies Treebank (MUDT)
    metrics:
      - type: uas
        value: 92.10
        name: Unlabelled Attachment Score
      - type: las
        value: 87.87
        name: Labelled Attachment Score
  - task:
      type: part-of-speech-tagging
      name: Part-of-Speech Tagging
    dataset:
      type: mlrs_pos
      name: MLRS POS dataset
    metrics:
      - type: accuracy
        value: 98.66
        name: UPOS Accuracy
        args: upos
      - type: accuracy
        value: 98.58
        name: XPOS Accuracy
        args: xpos
  - task:
      type: named-entity-recognition
      name: Named Entity Recognition
    dataset:
      type: wikiann
      name: WikiAnn (Maltese)
      args: mt
    metrics:
      - type: f1
        args: span
        value: 86.60
        name: Span-based F1
  - task:
      type: sentiment-analysis
      name: Sentiment Analysis
    dataset:
      type: mt-sentiment-analysis
      name: Maltese Sentiment Analysis Dataset
    metrics:
      - type: f1
        args: macro
        value: 76.79
        name: Macro-averaged F1
license: cc-by-nc-sa-4.0
widget:
- text: "Malta huwa pajjiż fl-[MASK]."
---

# mBERTu

A Maltese multilingual model pre-trained on the Korpus Malti v4.0 using multilingual BERT as the initial checkpoint.

## License

This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese](https://arxiv.org/abs/2205.10517).
Cite it as follows:

```bibtex
@inproceedings{BERTu,
    title = {Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese},
    author = {Micallef, Kurt and
              Gatt, Albert and
              Tanti, Marc and
              van der Plas, Lonneke and
              Borg, Claudia},
    booktitle = {Proceedings of the 3rd Workshop on Deep Learning for Low-Resource NLP (DeepLo 2022)},
    day = {14},
    month = {07},
    year = {2022},
    address = {Seattle, Washington},
    publisher = {Association for Computational Linguistics},
}
```
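
## Example usage

The widget sentence in the metadata suggests the model can be queried as a masked language model. The following is a minimal sketch of how that might look with the `transformers` fill-mask pipeline. It assumes the checkpoint is published on the Hugging Face Hub under the id `MLRS/mBERTu`, which this card does not state explicitly; the helper names `predict_masked` and `format_predictions` are hypothetical and introduced only for illustration.

```python
from typing import Dict, List

# Assumption: the Hub repository id is not given in this card; "MLRS/mBERTu"
# is inferred from the dataset namespace (MLRS) and the model name (mBERTu).
MODEL_ID = "MLRS/mBERTu"


def predict_masked(text: str, model_id: str = MODEL_ID, top_k: int = 5) -> List[Dict]:
    """Return the top_k candidate fillers for the single [MASK] token in text."""
    # Imported lazily so the formatting helper below works without transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    # For one mask token the pipeline returns a list of dicts that include
    # the keys "token_str" and "score".
    return fill_mask(text, top_k=top_k)


def format_predictions(predictions: List[Dict]) -> List[str]:
    """Render fill-mask results as 'token (score)' strings, best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['token_str']} ({p['score']:.3f})" for p in ranked]
```

Calling `format_predictions(predict_masked("Malta huwa pajjiż fl-[MASK]."))` would download the checkpoint on first use and list candidate completions for the widget sentence. No outputs are shown here because they depend on the released weights.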