Commit 59818f4 by raptorkwok
1 parent: fda4979

Update README.md

Files changed (1)
  1. README.md +10 -9
README.md CHANGED
@@ -6,14 +6,13 @@ metrics:
 model-index:
 - name: cantonese-chinese-translation-gen1
   results: []
+datasets:
+- raptorkwok/cantonese-chinese-dataset-gen2
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
+# Cantonese-Written Chinese Translation Model
 
-# cantonese-chinese-translation-gen1
-
-This model is a fine-tuned version of [fnlp/bart-base-chinese](https://huggingface.co/fnlp/bart-base-chinese) on an unknown dataset.
+This model is a fine-tuned version of [fnlp/bart-base-chinese](https://huggingface.co/fnlp/bart-base-chinese) on [Cantonese-Written Chinese Dataset Gen2](https://huggingface.co/raptorkwok/cantonese-chinese-dataset-gen2).
 It achieves the following results on the evaluation set:
 - Loss: 1.5413
 - Bleu: 40.7808
@@ -22,18 +21,20 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+The model is based on BART Chinese model, trained on 1M Cantonese-Written Chinese Parallel Corpus data.
 
 ## Intended uses & limitations
 
-More information needed
+Its intended use is to translate Cantonese sentences to Written Chinese accurately.
 
 ## Training and evaluation data
 
-More information needed
+Training and evaluation data is provided by the [Cantonese-Written Chinese Dataset Gen2](https://huggingface.co/raptorkwok/cantonese-chinese-dataset-gen2).
 
 ## Training procedure
 
+The training was performed using `Seq2SeqTrainer`.
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -62,4 +63,4 @@ The following hyperparameters were used during training:
 - Transformers 4.28.1
 - Pytorch 2.3.1+cu121
 - Datasets 2.19.1
-- Tokenizers 0.13.3
+- Tokenizers 0.13.3
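
The updated card states the intended use (translating Cantonese sentences to Written Chinese) but gives no usage snippet. The following is a minimal sketch of how such a model might be called with the Transformers library, not the author's documented method. It rests on several assumptions: the Hub repo id `raptorkwok/cantonese-chinese-translation-gen1` is inferred from the `model-index` name and is not confirmed by the commit; the repo is assumed to ship a BERT-style tokenizer like its base model `fnlp/bart-base-chinese`, which is why `token_type_ids` is dropped before generation and spaces between decoded characters are removed; the helper names `strip_cjk_spaces` and `translate` and the `max_new_tokens` value are illustrative only.

```python
import re

# Assumption: repo id inferred from the model-index name in the diff.
MODEL_ID = "raptorkwok/cantonese-chinese-translation-gen1"

# Whitespace that sits between two CJK characters or full-width punctuation marks.
_CJK = r"[\u4e00-\u9fff\u3000-\u303f\uff00-\uffef]"
_CJK_GAP = re.compile(rf"(?<={_CJK})\s+(?={_CJK})")


def strip_cjk_spaces(text: str) -> str:
    """Remove the spaces a BERT-style tokenizer leaves between decoded CJK characters."""
    return _CJK_GAP.sub("", text.strip())


def translate(sentences, model_id=MODEL_ID, max_new_tokens=128):
    """Translate a list of Cantonese sentences (hypothetical helper, downloads the model)."""
    # Imported lazily so the pure-Python helper above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    # Assumption: a BERT-style tokenizer emits token_type_ids, which BART's generate() does not accept.
    batch.pop("token_type_ids", None)

    outputs = model.generate(**batch, max_new_tokens=max_new_tokens)
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return [strip_cjk_spaces(text) for text in decoded]
```

Calling `translate(["我哋聽日去邊度食飯？"])` would download the model weights on first use and return a list with one translated string. The output depends on the published checkpoint, so none is shown here.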