---
language:
- mt
datasets:
- MLRS/korpus_malti
model-index:
- name: mBERTu
  results:
  - task: 
      type: dependency-parsing
      name: Dependency Parsing
    dataset:
      type: universal_dependencies
      args: mt_mudt
      name: Maltese Universal Dependencies Treebank (MUDT)
    metrics:
      - type: uas
        value: 92.10
        name: Unlabelled Attachment Score
      - type: las
        value: 87.87
        name: Labelled Attachment Score
  - task: 
      type: part-of-speech-tagging
      name: Part-of-Speech Tagging
    dataset:
      type: mlrs_pos
      name: MLRS POS dataset
    metrics:
      - type: accuracy
        value: 98.66
        name: UPOS Accuracy
        args: upos
      - type: accuracy
        value: 98.58
        name: XPOS Accuracy
        args: xpos
  - task: 
      type: named-entity-recognition
      name: Named Entity Recognition
    dataset:
      type: wikiann
      name: WikiAnn (Maltese)
      args: mt
    metrics:
      - type: f1
        args: span
        value: 86.60
        name: Span-based F1
  - task: 
      type: sentiment-analysis
      name: Sentiment Analysis
    dataset:
      type: mt-sentiment-analysis
      name: Maltese Sentiment Analysis Dataset
    metrics:
      - type: f1
        args: macro
        value: 76.79
        name: Macro-averaged F1
license: cc-by-nc-sa-4.0
widget:
- text: "Malta huwa pajjiż fl-[MASK]."
---

# mBERTu

mBERTu is a multilingual model for Maltese, pre-trained on the Korpus Malti v4.0 using multilingual BERT (mBERT) as the initial checkpoint.
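
The model can be used directly with the Hugging Face `transformers` library, for example for masked-word prediction. The sketch below mirrors the widget example above; note that the Hub identifier `MLRS/mBERTu` is an assumption inferred from the `MLRS/korpus_malti` dataset identifier, not something this card states explicitly.

```python
# Minimal fill-mask sketch. The model ID "MLRS/mBERTu" is assumed,
# not confirmed by this card.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="MLRS/mBERTu")

# "Malta huwa pajjiż fl-[MASK]." = "Malta is a country in the [MASK]."
for prediction in fill_mask("Malta huwa pajjiż fl-[MASK]."):
    print(f"{prediction['token_str']}\t{prediction['score']:.3f}")
```

For downstream tasks such as the POS tagging, NER, or sentiment analysis results reported above, the same identifier can be passed to `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained` and the model fine-tuned as usual.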


## License

This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese](https://arxiv.org/abs/2205.10517).
Cite it as follows: 

```bibtex
@inproceedings{BERTu,
    title       = {Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese},
    author      = {Micallef, Kurt and
                   Gatt, Albert and
                   Tanti, Marc and
                   van der Plas, Lonneke and
                   Borg, Claudia},
    booktitle   = {Proceedings of the 3rd Workshop on Deep Learning for Low-Resource NLP (DeepLo 2022)},
    day         = {14},
    month       = {07},
    year        = {2022},
    address     = {Seattle, Washington},
    publisher   = {Association for Computational Linguistics},
}
```