meliksahturker committed on
Commit 5e9a480
1 Parent(s): 29ae154

Upload TFMBartForConditionalGeneration

Files changed (4)
  1. README.md +45 -74
  2. config.json +33 -0
  3. generation_config.json +9 -0
  4. tf_model.h5 +3 -0
README.md CHANGED
@@ -1,76 +1,47 @@
  ---
- language:
- - tr
- arxiv: 2403.01308
- library_name: transformers
- pipeline_tag: text2text-generation
  ---
- # VBART Model Card
-
- ## Model Description
-
- VBART is the first sequence-to-sequence model trained in Turkish corpora from scratch. It is trained by VNGRS and its training ended in February 2023.
- Model is capable of text transformation task such as summarization, paraphrasing, title generation with fine-tuning.
-
- This model is scores better on many tasks, albeit being much smaller than other implementations.
-
- This repository contains fine-tuned weights of VBART for paraphrasing task.
-
- - **Developed by:** [VNGRS](https://vngrs.com/)
- - **Model type:** Transformer encoder-decoder based on mBart
- - **Language(s) (NLP):** Turkish
- - **License:** [More Information Needed]
- - **Finetuned from model:** VBART-Large
- - **Paper:** [arxiv](https://arxiv.org/abs/2403.01308)
- ## How to Get Started with the Model
- Use the code below to get started with the model.
- -> Model yüklendikten sonra bir kod çıkar
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
- Base model training data is filtered mixed corpus made of Turkish parts of [OSCAR-2201](https://huggingface.co/datasets/oscar-corpus/OSCAR-2201) and [mC4](https://huggingface.co/datasets/mc4) datasets. These datasets consist of documents of unstructured web crawl data. More information about the dataset can be found in their respective page. Data then filtered using set of heuristics and certain rules, explained in appendix of our [paper](https://arxiv.org/abs/2403.01308).
-
- Fine-tuning dataset is TODO
-
- ### Limitations
- This model in fine-tuned to paraphrasing task. It is not intended to be used in any other case and can not be fine-tuned to any other task with full performance of the base model.
-
- ### Training Procedure
- Pretrained for 30 days and with total of 708B tokens.
- #### Hardware
- - **GPUs**: 8X Nvidia A100-80 GB
- #### Software
- - Tensorflow
- #### Hyperparameters
- ##### Pretraining
- - **Training regime:** fp16 mixed precision
- - **Training objective** : Sentence permutation and span masking (using mask lengths sampled from Poisson distribution λ=3.5 and total of %30 data)
- - **Optimizer** : Adam optimizer (β1 = 0.9, β2 = 0.98, Ɛ = 1e-6)
- - **Scheduler**: Linear decay scheduler (20,000 warm up steps)
- - **Dropout**: 0.1 (dropped to 0.05 and 0 in last 160k steps)
- - **Learning rate**: 5e-6
- - **Training Amount**: 708B
-
- ##### Fine-tuning
- - **Training regime:** fp16 mixed precision
- - **Optimizer** : Adam optimizer (β1 = 0.9, β2 = 0.98, Ɛ = 1e-6)
- - **Scheduler**: Linear decay scheduler
- - **Dropout**: 0.1
- - **Learning rate**: 5e-5
- #### Metrics
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f8b3c84588fe31f435a92b/pG6H4dI0FXM1zI4Cfj53A.png)
-
- ## License
-
-
- ## Citation
- ```
- @article{turker2024vbart,
- title={VBART: The Turkish LLM},
- author={Turker, Meliksah and Ari, Erdi and Han, Aydin},
- journal={arXiv preprint arXiv:2403.01308},
- year={2024}
- }
- ```
 
  ---
+ tags:
+ - generated_from_keras_callback
+ model-index:
+ - name: VBART-Large-Paraphrasing
+ results: []
  ---
+
+ <!-- This model card has been generated automatically according to the information Keras had access to. You should
+ probably proofread and complete it, then remove this comment. -->
+
+ # VBART-Large-Paraphrasing
+
+ This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - optimizer: None
+ - training_precision: float32
+
+ ### Training results
+
+
+
+ ### Framework versions
+
+ - Transformers 4.38.2
+ - TensorFlow 2.13.1
+ - Datasets 2.18.0
+ - Tokenizers 0.15.2
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "activation_dropout": 0.0,
+   "activation_function": "gelu",
+   "architectures": [
+     "MBartForConditionalGeneration"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 2,
+   "classifier_dropout": 0.0,
+   "d_model": 1024,
+   "decoder_attention_heads": 16,
+   "decoder_ffn_dim": 4096,
+   "decoder_layerdrop": 0.0,
+   "decoder_layers": 12,
+   "decoder_start_token_id": 2,
+   "dropout": 0.1,
+   "encoder_attention_heads": 16,
+   "encoder_ffn_dim": 4096,
+   "encoder_layerdrop": 0.0,
+   "encoder_layers": 12,
+   "eos_token_id": 3,
+   "forced_eos_token_id": 3,
+   "init_std": 0.02,
+   "is_encoder_decoder": true,
+   "max_position_embeddings": 1024,
+   "model_type": "mbart",
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "scale_embedding": false,
+   "transformers_version": "4.38.2",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
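As a rough cross-check of the configuration above, the parameter count can be estimated from the listed dimensions and compared with the size recorded for `tf_model.h5` in this commit. This is a minimal sketch, not the library's own accounting. The per-layer breakdown is an assumption about the usual mBART layout and is not stated anywhere in this commit: biases on every linear layer, learned positional embeddings with an offset of 2, an embedding LayerNorm plus a final LayerNorm per stack, and tied input/output embeddings. The helper names are hypothetical.

```python
# Values copied from the config.json hunk in this commit.
CONFIG = {
    "d_model": 1024,
    "encoder_layers": 12,
    "decoder_layers": 12,
    "encoder_ffn_dim": 4096,
    "decoder_ffn_dim": 4096,
    "max_position_embeddings": 1024,
    "vocab_size": 32000,
}

# Size of tf_model.h5 in bytes, copied from the Git LFS pointer in this commit.
TF_MODEL_BYTES = 1551059288


def linear(n_in, n_out):
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(d):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * d


def attention(d):
    """Query, key, value and output projections (assumed to have biases)."""
    return 4 * linear(d, d)


def estimate_params(cfg, position_offset=2):
    """Estimate the parameter count under the assumptions listed above."""
    d = cfg["d_model"]
    token_embeddings = cfg["vocab_size"] * d  # shared by encoder, decoder, LM head
    positions = (cfg["max_position_embeddings"] + position_offset) * d

    enc_ffn = linear(d, cfg["encoder_ffn_dim"]) + linear(cfg["encoder_ffn_dim"], d)
    dec_ffn = linear(d, cfg["decoder_ffn_dim"]) + linear(cfg["decoder_ffn_dim"], d)

    # Encoder layer: self-attention + norm, feed-forward + norm.
    encoder_layer = attention(d) + layer_norm(d) + enc_ffn + layer_norm(d)
    # Decoder layer: self-attention and cross-attention (each with a norm),
    # then feed-forward + norm.
    decoder_layer = 2 * (attention(d) + layer_norm(d)) + dec_ffn + layer_norm(d)
    # Per stack: one norm after the embeddings, one after the last layer.
    stack_norms = 2 * layer_norm(d)

    return (
        token_embeddings
        + 2 * positions
        + cfg["encoder_layers"] * encoder_layer
        + cfg["decoder_layers"] * decoder_layer
        + 2 * stack_norms
    )


if __name__ == "__main__":
    params = estimate_params(CONFIG)
    print(f"estimated parameters: {params:,}")
    print(f"bytes at 4 bytes/param: {params * 4:,}")
    print(f"recorded file size:     {TF_MODEL_BYTES:,}")
```

Under these assumptions the estimate is roughly 388 million parameters, and four bytes per parameter lands within a fraction of a percent of the recorded file size. That is consistent with the `training_precision: float32` line in the generated README; the small remainder would be container overhead and any tensors the sketch does not count.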
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 2,
+   "decoder_start_token_id": 2,
+   "eos_token_id": 3,
+   "forced_eos_token_id": 3,
+   "pad_token_id": 0,
+   "transformers_version": "4.38.2"
+ }
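To make the role of these token IDs concrete, here is a toy greedy decoding loop. It is a simplified illustration of how such fields are commonly interpreted (decoding starts from `decoder_start_token_id`, stops at `eos_token_id`, and `forced_eos_token_id` is emitted as the last token when the length limit is reached). It is not the Transformers implementation, and `next_token_fn` is a hypothetical stand-in for the model.

```python
# Values copied from the generation_config.json hunk in this commit.
GENERATION_CONFIG = {
    "bos_token_id": 2,
    "decoder_start_token_id": 2,
    "eos_token_id": 3,
    "forced_eos_token_id": 3,
    "pad_token_id": 0,
}


def greedy_decode(next_token_fn, gen_cfg, max_length):
    """Toy decoding loop.

    next_token_fn: hypothetical callable mapping the tokens generated so far
                   to the next token ID (a stand-in for a real model).
    """
    tokens = [gen_cfg["decoder_start_token_id"]]
    forced_eos = gen_cfg.get("forced_eos_token_id")
    while len(tokens) < max_length:
        if forced_eos is not None and len(tokens) == max_length - 1:
            # Last available position: emit the forced end-of-sequence token.
            token = forced_eos
        else:
            token = next_token_fn(tokens)
        tokens.append(token)
        if token == gen_cfg["eos_token_id"]:
            break
    return tokens


def pad_to(tokens, length, gen_cfg):
    """Right-pad a finished sequence with pad_token_id, e.g. for batching."""
    return tokens + [gen_cfg["pad_token_id"]] * (length - len(tokens))


if __name__ == "__main__":
    never_stops = lambda toks: 7                         # never emits EOS
    stops_early = lambda toks: 3 if len(toks) >= 2 else 9
    print(greedy_decode(never_stops, GENERATION_CONFIG, max_length=5))
    print(pad_to(greedy_decode(stops_early, GENERATION_CONFIG, 5), 5,
                 GENERATION_CONFIG))
```

The first stub never produces an end-of-sequence token, so the forced token closes the sequence at the length limit; the second stops on its own and is padded afterwards.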
tf_model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:adb6f115a8e8dc70221d3df77a613d58d899084f3caa0a23b46f7692a6ef48fb
+ size 1551059288
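The three lines above are a Git LFS pointer rather than the weights themselves. The following sketch parses such a pointer and checks a downloaded file against it. The function names and the local path in the usage note are hypothetical; only the pointer text is taken from this commit.

```python
import hashlib
import os

# Pointer text copied from the tf_model.h5 hunk in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:adb6f115a8e8dc70221d3df77a613d58d899084f3caa0a23b46f7692a6ef48fb
size 1551059288
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def verify_file(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` matches the pointer's size and hash."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["oid_algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["oid"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["oid_algo"], pointer["size"])
```

With a local copy of the weights, `verify_file("tf_model.h5", parse_lfs_pointer(POINTER_TEXT))` would confirm that the download is complete and uncorrupted; the path here is only an example.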