---
language:
  - mt
datasets:
  - MLRS/korpus_malti
model-index:
  - name: mBERTu
    results:
      - task:
          type: dependency-parsing
          name: Dependency Parsing
        dataset:
          type: universal_dependencies
          args: mt_mudt
          name: Maltese Universal Dependencies Treebank (MUDT)
        metrics:
          - type: uas
            value: 92.1
            name: Unlabelled Attachment Score
          - type: las
            value: 87.87
            name: Labelled Attachment Score
      - task:
          type: part-of-speech-tagging
          name: Part-of-Speech Tagging
        dataset:
          type: mlrs_pos
          name: MLRS POS dataset
        metrics:
          - type: accuracy
            value: 98.66
            name: UPOS Accuracy
            args: upos
          - type: accuracy
            value: 98.58
            name: XPOS Accuracy
            args: xpos
      - task:
          type: named-entity-recognition
          name: Named Entity Recognition
        dataset:
          type: wikiann
          name: WikiAnn (Maltese)
          args: mt
        metrics:
          - type: f1
            args: span
            value: 86.6
            name: Span-based F1
      - task:
          type: sentiment-analysis
          name: Sentiment Analysis
        dataset:
          type: mt-sentiment-analysis
          name: Maltese Sentiment Analysis Dataset
        metrics:
          - type: f1
            args: macro
            value: 76.79
            name: Macro-averaged F1
license: cc-by-nc-sa-4.0
widget:
  - text: Malta huwa pajjiż fl-[MASK].
---

# mBERTu

A language model for Maltese, obtained by continuing the pre-training of multilingual BERT (mBERT) on the Korpus Malti v4.0, using mBERT as the initial checkpoint.
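
Since mBERTu is a standard BERT-style masked language model, it can be queried directly with the Hugging Face `transformers` fill-mask pipeline. The sketch below is illustrative: it assumes the model is hosted on the Hub as `MLRS/mBERTu` (inferred from the organisation and model name on this card) and reuses the widget sentence from the metadata above.

```python
from transformers import pipeline

# Load mBERTu as a fill-mask pipeline.
# NOTE: the Hub id "MLRS/mBERTu" is assumed from this card's org/name.
unmasker = pipeline("fill-mask", model="MLRS/mBERTu")

# Widget sentence from the metadata: "Malta is a country in the [MASK]." (Maltese).
for prediction in unmasker("Malta huwa pajjiż fl-[MASK]."):
    print(f'{prediction["token_str"]}\t{prediction["score"]:.4f}')
```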

## License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.


## Citation

This work was first presented in *Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese* (DeepLo 2022). Cite it as follows:

```bibtex
@inproceedings{BERTu,
    title       = {Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese},
    author      = {Micallef, Kurt and
                   Gatt, Albert and
                   Tanti, Marc and
                   van der Plas, Lonneke and
                   Borg, Claudia},
    booktitle   = {Proceedings of the 3rd Workshop on Deep Learning for Low-Resource NLP (DeepLo 2022)},
    day         = {14},
    month       = {07},
    year        = {2022},
    address     = {Seattle, Washington},
    publisher   = {Association for Computational Linguistics},
}
```