onurgu committed on
Commit 8075d90
1 Parent(s): d809194

Update README.md

Files changed (1)
  1. README.md +30 -4
README.md CHANGED
@@ -4,6 +4,9 @@ language:
  - tr
  library_name: transformers
  pipeline_tag: text2text-generation
+ datasets:
+ - batubayk/TR-News
+ - mlsum
  ---

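Both datasets added to the front matter resolve on the Hugging Face Hub. A minimal sketch of loading them with the `datasets` library; the MLSUM Turkish configuration name `"tu"` is an assumption from that dataset's published configs, not something this diff states:

```python
from datasets import load_dataset

# Datasets declared in the card's front matter.
tr_news = load_dataset("batubayk/TR-News")
# MLSUM is multilingual; "tu" is assumed to be its Turkish configuration.
mlsum_tr = load_dataset("mlsum", "tu")

print(tr_news)
print(mlsum_tr["train"][0].keys())
```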
@@ -19,6 +22,15 @@ The model is shared with the public to be used solely for non-commercial academi

  ## Model Details

+ - 36 encoder and 36 decoder layers
+ - 16 attention heads
+ - 1024-dimensional token embeddings
+ - Multi-layer perceptron blocks with 2816 hidden dimensions and gated GELU activations
+ - The parameters of the input and classification layers are not shared
+ - 1.1B parameters
+ - A unigram subword tokenizer trained on 10GB of text drawn from random subsets of OSCAR, OPUS, and Wikipedia
+ - Vocabulary size: 32,000 tokens + 128 special tokens (see the configuration sketch after this hunk)
+
  ### Model Description

  <!-- Provide a longer summary of what this model is. -->
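Read together, the architecture bullets describe a standard encoder-decoder T5/UL2-style model. A minimal sketch of how they might map onto `T5Config` from `transformers`; the values are read off the list above, but the mapping to this configuration class is an assumption, since the card does not name it:

```python
from transformers import T5Config

# Hypothetical configuration assembled from the architecture bullets above.
config = T5Config(
    vocab_size=32_128,               # 32,000 subword tokens + 128 special tokens
    d_model=1024,                    # token embedding / hidden size
    d_ff=2816,                       # MLP hidden dimension
    num_layers=36,                   # encoder layers
    num_decoder_layers=36,           # decoder layers
    num_heads=16,                    # attention heads
    feed_forward_proj="gated-gelu",  # gated GELU activation in the MLP blocks
    tie_word_embeddings=False,       # input and classification layers not shared
)
```

These settings are consistent with the 1.1B parameter count reported above.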
@@ -30,7 +42,7 @@ The model is shared with the public to be used solely for non-commercial academi
  - **Language(s) (NLP):** Turkish
  - **License:** The model is shared with the public to be used solely for non-commercial academic research purposes.

- ### Model Sources [optional]
+ ### Model Sources

  <!-- Provide the basic links for the model. -->

@@ -51,9 +63,9 @@ This model can be used for research purposes. You give some text and this model

  <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

- This model can be finetuned using [our library](https://github.com/boun-tabi-LMG/turkish-lm-tuner) to solve your own task involving Turkish language.
+ This model can be finetuned using [our library](https://github.com/boun-tabi-LMG/turkish-lm-tuner) to solve your custom tasks involving the Turkish language (a rough sketch follows this hunk).

- This model can be further trained for behaving more helpful, less harmful and better for dialog use cases.
+ This model can be further trained to be more helpful, less harmful, and better suited to dialog use cases.

  ### Out-of-Scope Use

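The linked turkish-lm-tuner library wraps the finetuning workflow end to end. As a rough, library-agnostic illustration of the same idea, here is a plain `transformers` sketch with `Seq2SeqTrainer`; the model ID, data file, and column names are placeholders, not taken from this card or the library:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_id = "<this-repo-id>"  # placeholder: the Hub ID of this checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Placeholder seq2seq data with "input" and "target" text columns.
raw = load_dataset("csv", data_files={"train": "train.csv"})

def preprocess(batch):
    enc = tokenizer(batch["input"], max_length=512, truncation=True)
    enc["labels"] = tokenizer(
        text_target=batch["target"], max_length=512, truncation=True
    )["input_ids"]
    return enc

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="finetuned", per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```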
@@ -82,14 +94,28 @@ We refer to the Flan-T5's [official model card](https://arxiv.org/pdf/2210.11416

  ## How to Get Started with the Model

- You can find the technical usage guidance at our library's Github [page](https://github.com/boun-tabi-LMG/turkish-lm-tuner).
+ You can find the technical guidance on our library's GitHub [page](https://github.com/boun-tabi-LMG/turkish-lm-tuner). A minimal loading sketch follows this hunk.

  ## Training Details

+ - The pretraining was performed with a Mixture-of-Denoisers (MoD) objective
+ - This version of the model was trained for 1,740,000 steps
+ - Batch size: 48
+ - Input and output lengths: 512
+ - Effectively exposed to 42.7B tokens (≈ 1,740,000 steps × 48 sequences × 512 input tokens)
+
  Refer to the paper for more information.

+
  ## Evaluation

+ We have not yet evaluated the model for biases.
+
+ So far we have only finetuned the model on several understanding and generation tasks:
+
+ - Paraphrasing: TAT and OST ([source](https://aclanthology.org/2022.icnlsp-1.14.pdf))
+ - Summarization: [TRNews](https://dl.acm.org/doi/10.1007/s10579-021-09568-y) and [MLSUM](https://arxiv.org/pdf/2004.14900v1.pdf)
+
  Refer to the paper for more information.

  ## Environmental Impact
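As a starting point, the checkpoint should load through the `text2text-generation` pipeline declared in the front matter. A minimal sketch, with the model ID left as a placeholder since the diff never names the repository:

```python
from transformers import pipeline

# Placeholder: substitute the Hub ID of this model repository.
generator = pipeline("text2text-generation", model="<this-repo-id>")

# A pretrained (not instruction-tuned) LM: it continues text
# rather than following instructions.
print(generator("Türkiye'nin en kalabalık şehri", max_new_tokens=32))
```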
 