---
language:
- mt
datasets:
- MLRS/korpus_malti
model-index:
- name: mBERTu
  results:
  - task: 
      type: dependency-parsing
      name: Dependency Parsing
    dataset:
      type: universal_dependencies
      args: mt_mudt
      name: Maltese Universal Dependencies Treebank (MUDT)
    metrics:
      - type: uas
        value: 92.10
        name: Unlabelled Attachment Score
      - type: las
        value: 87.87
        name: Labelled Attachment Score
  - task: 
      type: part-of-speech-tagging
      name: Part-of-Speech Tagging
    dataset:
      type: mlrs_pos
      name: MLRS POS dataset
    metrics:
      - type: accuracy
        value: 98.66
        name: UPOS Accuracy
        args: upos
      - type: accuracy
        value: 98.58
        name: XPOS Accuracy
        args: xpos
  - task: 
      type: named-entity-recognition
      name: Named Entity Recognition
    dataset:
      type: wikiann
      name: WikiAnn (Maltese)
      args: mt
    metrics:
      - type: f1
        args: span
        value: 86.60
        name: Span-based F1
  - task: 
      type: sentiment-analysis
      name: Sentiment Analysis
    dataset:
      type: mt-sentiment-analysis
      name: Maltese Sentiment Analysis Dataset
    metrics:
      - type: f1
        args: macro
        value: 76.79
        name: Macro-averaged F1
license: cc-by-nc-sa-4.0
widget:
- text: "Malta huwa pajjiż fl-[MASK]."
---

# mBERTu

mBERTu is a multilingual model for Maltese, pre-trained on the Korpus Malti v4.0 using multilingual BERT (mBERT) as the initial checkpoint.
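
The model can be used directly with the Hugging Face `transformers` library, for example for masked-word prediction. The sketch below mirrors the widget example above; note that the Hub identifier `MLRS/mBERTu` is an assumption inferred from the `MLRS/korpus_malti` dataset identifier, not something this card states explicitly.

```python
# Minimal fill-mask sketch. The model ID "MLRS/mBERTu" is assumed,
# not confirmed by this card.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="MLRS/mBERTu")

# "Malta huwa pajjiż fl-[MASK]." = "Malta is a country in the [MASK]."
for prediction in fill_mask("Malta huwa pajjiż fl-[MASK]."):
    print(f"{prediction['token_str']}\t{prediction['score']:.3f}")
```

For downstream tasks such as the POS tagging, NER, or sentiment analysis results reported above, the same identifier can be passed to `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained` and the model fine-tuned as usual.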


## License

This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese](https://arxiv.org/abs/2205.10517).
Cite it as follows: 

```bibtex
@inproceedings{BERTu,
    title       = {Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese},
    author      = {Micallef, Kurt and
                   Gatt, Albert and
                   Tanti, Marc and
                   van der Plas, Lonneke and
                   Borg, Claudia},
    booktitle   = {Proceedings of the 3rd Workshop on Deep Learning for Low-Resource NLP (DeepLo 2022)},
    day         = {14},
    month       = {07},
    year        = {2022},
    address     = {Seattle, Washington},
    publisher   = {Association for Computational Linguistics},
}
```