ClassCat committed on
Commit
17bcdda
1 Parent(s): 037d0db

Create README.md

Files changed (1)
  1. README.md +33 -0
README.md ADDED
@@ -0,0 +1,33 @@
+ ---
+ language: es
+ license: cc-by-sa-4.0
+ datasets:
+ - wikipedia
+ - cc100
+ widget:
+ - text: "Yo soy"
+ ---
+
+ ## GPT2 Spanish base model (Uncased)
+
+ ### Model architecture
+
+ This model uses the GPT2 base settings, except for the vocabulary size.
+
+ ### Tokenizer
+
+ The tokenizer is a BPE tokenizer with a vocabulary size of 50,000.
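
To make the effect of the vocabulary change concrete, here is a rough parameter-count sketch. It assumes the standard GPT-2 base hyperparameters (12 layers, hidden size 768, 1024 positions, output layer tied to the token embedding), none of which are spelled out in this README, and it assumes the model's embedding table has exactly 50,000 rows. The helper `gpt2_param_count` is hypothetical and written only for this illustration.

```python
# Rough parameter count for a GPT-2 "base" style model.
# Assumed (not stated in this README): 12 layers, hidden size 768,
# 1024 positions, and an output layer tied to the token embedding.

def gpt2_param_count(vocab_size, n_layer=12, n_embd=768, n_positions=1024):
    # Token embedding plus learned position embedding.
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # One LayerNorm holds a weight and a bias vector.
    layer_norm = 2 * n_embd
    # Fused QKV projection plus the attention output projection (with biases).
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Two-layer feed-forward network with a 4x expansion (with biases).
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    # Each transformer block has two LayerNorms, attention, and the MLP.
    block = 2 * layer_norm + attention + mlp
    # Final LayerNorm after the last block; the tied output layer adds nothing.
    return embeddings + n_layer * block + layer_norm

original = gpt2_param_count(50257)  # vocabulary size of the original GPT-2
spanish = gpt2_param_count(50000)   # vocabulary size stated in this README

print(f"{original:,}")            # 124,439,808
print(f"{spanish:,}")             # 124,242,432
print(f"{original - spanish:,}")  # 197,376
```

Under these assumptions the only difference from the original GPT-2 base model is the token embedding table, which is 257 rows of 768 values smaller.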
+
+ ### Training Data
+
+ * [wiki40b/es](https://www.tensorflow.org/datasets/catalog/wiki40b#wiki40bes) (Spanish Wikipedia)
+ * A subset of [CC-100/es](https://data.statmt.org/cc-100/) (monolingual datasets from web crawl data)
+
+ ### Usage
+
+ ```python
+ from transformers import pipeline
+
+ # Load the model into a text-generation pipeline.
+ generator = pipeline('text-generation', model='ClassCat/gpt2-base-spanish')
+ # Generate 5 continuations of the prompt, each at most 50 tokens long (prompt included).
+ generator("Yo soy ", max_length=50, num_return_sequences=5)
+ ```
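
Because the model is described as uncased, it may help to lowercase prompts before generation. The sketch below assumes the tokenizer does not lowercase input on its own, which this README does not confirm. `prepare_prompt` and `generate` are hypothetical helper names, and `do_sample=True` is passed explicitly on the assumption that sampling is wanted when requesting several sequences.

```python
def prepare_prompt(text: str) -> str:
    """Lowercase a prompt so it matches the casing of an uncased model."""
    return text.lower()


def generate(prompt: str, max_length: int = 50, num_return_sequences: int = 5):
    """Generate continuations for a lowercased prompt (downloads the model on first use)."""
    # Imported lazily so prepare_prompt can be used without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model="ClassCat/gpt2-base-spanish")
    return generator(
        prepare_prompt(prompt),
        max_length=max_length,
        num_return_sequences=num_return_sequences,
        do_sample=True,  # assumption: sample so the returned sequences can differ
    )
```

Calling `generate("Yo soy")` would then pass `"yo soy"` to the pipeline and return a list of dictionaries, each with a `generated_text` field.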