This is a text-to-text fine-tuned version of
[facebook/bart-base](https://huggingface.co/facebook/bart-base)
trained on spelling correction. It leans on the excellent work by
Oliver Guhr ([github](https://github.com/oliverguhr/spelling),
[huggingface](https://huggingface.co/oliverguhr/spelling-correction-english-base)).
Training was performed on an AWS EC2 instance (g5.xlarge) on a single GPU.
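The card does not show an inference snippet; the sketch below uses the standard `transformers` text2text-generation pipeline that BART seq2seq checkpoints are served through. The model ID is a placeholder (the real repository name is not given in this excerpt), and the misspelled Dutch test sentence is illustrative.

```python
from transformers import pipeline

# Placeholder model ID -- substitute the actual Hub repository name of this model.
MODEL_ID = "your-username/bart-base-dutch-spelling-correction"


def load_corrector(model_id: str = MODEL_ID):
    # BART text-to-text checkpoints use the text2text-generation pipeline task.
    return pipeline("text2text-generation", model=model_id)


if __name__ == "__main__":
    corrector = load_corrector()
    # "speling fout" is deliberately misspelled; the model should return a corrected sentence.
    out = corrector("Dit is een zin met een speling fout.", max_length=64)
    print(out[0]["generated_text"])
```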
## Intended uses & limitations

… checker. A next version of the model will be trained on more data.

## Training and evaluation data

The model was trained on a Dutch dataset of 2,964,203 lines (nearly 3M)
of text from three public Dutch sources, downloaded from the
[Opus corpus](https://opus.nlpl.eu/):

- nl-europarlv7.1m.txt (1,000,000 lines)
- nl-opensubtitles2016.1m.txt (1,000,000 lines)
- nl-wikipedia.txt (964,203 lines)

Together these texts comprise 45,308,056 tokens.
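The quoted totals are easy to sanity-check; the per-source line counts sum to the stated dataset size, and the token count works out to roughly 15 tokens per line (figures taken from the list above):

```python
# Per-source line counts as listed in the model card.
sources = {
    "nl-europarlv7.1m.txt": 1_000_000,
    "nl-opensubtitles2016.1m.txt": 1_000_000,
    "nl-wikipedia.txt": 964_203,
}

total_lines = sum(sources.values())
total_tokens = 45_308_056
tokens_per_line = total_tokens / total_lines

print(total_lines)                 # 2964203
print(round(tokens_per_line, 1))   # 15.3
```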
## Training procedure
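Seq2seq spelling correctors of this kind are typically trained on (noisy, clean) sentence pairs produced by injecting synthetic typos into clean text. The sketch below shows one way such a corruption step could look; it is an illustration of the general technique, not this model's actual data pipeline.

```python
import random


def corrupt(text: str, rng: random.Random, p: float = 0.05) -> str:
    """Introduce random character-level typos (swap, drop, duplicate).

    Applied to clean corpus lines, this yields (corrupt(line), line)
    training pairs for a text-to-text spelling corrector.
    """
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < p and chars[i].isalpha():
            op = rng.choice(["swap", "drop", "dup"])
            if op == "swap" and i + 1 < len(chars):
                # Transpose two adjacent characters.
                out.append(chars[i + 1])
                out.append(chars[i])
                i += 2
            elif op == "drop":
                # Delete the character.
                i += 1
            else:
                # Duplicate the character (also used for a swap at the last position).
                out.append(chars[i])
                out.append(chars[i])
                i += 1
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)


rng = random.Random(0)
clean = "Het Europees Parlement vergadert in Straatsburg."
noisy = corrupt(clean, rng, p=0.15)
print(noisy, "->", clean)
```

Seeding the generator makes the corruption reproducible, so the same corpus always yields the same training pairs.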