antalvdb committed on
Commit 1a918fa
1 Parent(s): 9de8742

Upload README.md

Files changed (1)
  1. README.md +10 -7
README.md CHANGED
@@ -21,7 +21,8 @@ This is a text-to-text fine-tuned version of
  [facebook/bart-base](https://huggingface.co/facebook/bart-base)
  trained on spelling correction. It leans on the excellent work by
  Oliver Guhr ([github](https://github.com/oliverguhr/spelling),
- [huggingface](https://huggingface.co/oliverguhr/spelling-correction-english-base)). Training was performed on an AWS EC2 instance (g5.xlarge) on a single GPU.
+ [huggingface](https://huggingface.co/oliverguhr/spelling-correction-english-base)). Training
+ was performed on an AWS EC2 instance (g5.xlarge) on a single GPU.

  ## Intended uses & limitations

@@ -31,13 +32,15 @@ checker. A next version of the model will be trained on more data.

  ## Training and evaluation data

- The model was trained on a Dutch dataset composed of 1,500,000 lines of
- text from three public Dutch sources, downloaded from the [Opus
- corpus](https://opus.nlpl.eu/):
+ The model was trained on a Dutch dataset composed of 2,964,203 (nearly
+ 3m lines) of text from three public Dutch sources, downloaded from the
+ [Opus corpus](https://opus.nlpl.eu/):

- - nl-europarlv7.100k.txt (1,000,000 lines)
- - nl-opensubtitles2016.100k.txt (1,000,000 lines)
- - nl-wikipedia.100k.txt (964,203 lines)
+ - nl-europarlv7.1m.txt (1,000,000 lines)
+ - nl-opensubtitles2016.1m.txt (1,000,000 lines)
+ - nl-wikipedia.txt (964,203 lines)
+
+ Together these texts comprise 45,308,056 tokens.

  ## Training procedure
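The updated README gives per-file line counts and a corpus-wide token total. The Python sketch below shows one way such figures could be recomputed from local copies of the Opus files; it is not the author's script. The file names come from the README, but the `data` directory is hypothetical, the files are assumed to be UTF-8, and tokens are counted by splitting on whitespace. That last point is an assumption: the README does not say which tokenizer produced 45,308,056, and a subword tokenizer such as BART's would give a different total.

```python
from pathlib import Path

# Per-file line counts as stated in the updated README; the file names
# come from the README, their location on disk is an assumption.
EXPECTED_LINES = {
    "nl-europarlv7.1m.txt": 1_000_000,
    "nl-opensubtitles2016.1m.txt": 1_000_000,
    "nl-wikipedia.txt": 964_203,
}


def count_lines_and_tokens(path: Path) -> tuple[int, int]:
    """Return (lines, whitespace-separated tokens) for one UTF-8 text file.

    Whitespace splitting is an assumption: the README does not say which
    tokenizer produced its 45,308,056-token total.
    """
    lines = 0
    tokens = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:  # a final line without "\n" still counts
            lines += 1
            tokens += len(line.split())
    return lines, tokens


def corpus_totals(paths: list[Path]) -> tuple[int, int]:
    """Sum line and token counts over all corpus files, printing each one."""
    total_lines = 0
    total_tokens = 0
    for path in paths:
        lines, tokens = count_lines_and_tokens(path)
        print(f"{path.name}: {lines:,} lines, {tokens:,} tokens")
        total_lines += lines
        total_tokens += tokens
    return total_lines, total_tokens


if __name__ == "__main__":
    # The README's per-file figures add up to its stated line total.
    print(f"expected total: {sum(EXPECTED_LINES.values()):,} lines")  # 2,964,203

    data_dir = Path("data")  # hypothetical download directory
    paths = [data_dir / name for name in EXPECTED_LINES]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        print("missing corpus files:", ", ".join(missing))
    else:
        lines, tokens = corpus_totals(paths)
        print(f"counted total: {lines:,} lines, {tokens:,} tokens")
```

Because lines are counted by iterating over the file, a final line without a trailing newline is still counted, whereas `wc -l` counts newline characters and would miss it.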