khongtrunght committed on
Commit
2a4552e
1 Parent(s): 72df182

Model save

README.md ADDED
@@ -0,0 +1,86 @@
+ ---
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ model-index:
+ - name: Qwen2-7B-Instruct-SPPO-Function-call-v2.11
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # Qwen2-7B-Instruct-SPPO-Function-call-v2.11
+
+ This model was trained from scratch on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.1457
+ - Rewards/chosen: -1.7639
+ - Rewards/rejected: -14.1509
+ - Rewards/accuracies: 0.9364
+ - Rewards/margins: 12.3871
+ - Logps/rejected: -551.2230
+ - Logps/chosen: -189.1563
+ - Logits/rejected: -1.6081
+ - Logits/chosen: -1.5770
+
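A quick sanity check on the evaluation metrics above: in DPO-style training, `Rewards/margins` is the mean gap between the chosen and rejected rewards, so the reported margin should equal the difference of the two reported reward means up to rounding. A minimal sketch using only the values from this card:

```python
# Values reported in the evaluation results above.
rewards_chosen = -1.7639
rewards_rejected = -14.1509

# DPO margin: chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected

# Should match the reported Rewards/margins (12.3871) up to rounding.
assert abs(margin - 12.3871) < 1e-3
```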
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-07
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - total_train_batch_size: 16
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 2
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.2001 | 0.1145 | 250 | 0.2192 | 0.7210 | -1.8684 | 0.9162 | 2.5895 | -305.5732 | -139.4582 | -1.6566 | -1.7096 |
+ | 0.1246 | 0.2290 | 500 | 0.1662 | 0.6780 | -4.7708 | 0.9277 | 5.4487 | -363.6193 | -140.3193 | -1.6309 | -1.6619 |
+ | 0.0831 | 0.3436 | 750 | 0.1441 | 0.5794 | -6.0728 | 0.9191 | 6.6521 | -389.6595 | -142.2913 | -1.6015 | -1.6194 |
+ | 0.0698 | 0.4581 | 1000 | 0.1458 | -0.1931 | -8.1002 | 0.9335 | 7.9071 | -430.2079 | -157.7405 | -1.6062 | -1.6142 |
+ | 0.0872 | 0.5726 | 1250 | 0.1416 | -0.0252 | -8.5014 | 0.9393 | 8.4762 | -438.2315 | -154.3822 | -1.5572 | -1.5535 |
+ | 0.0547 | 0.6871 | 1500 | 0.1330 | -0.4963 | -9.4547 | 0.9335 | 8.9584 | -457.2992 | -163.8050 | -1.5598 | -1.5574 |
+ | 0.1092 | 0.8016 | 1750 | 0.1337 | -1.2236 | -10.3660 | 0.9277 | 9.1424 | -475.5235 | -178.3509 | -1.5822 | -1.5827 |
+ | 0.1109 | 0.9162 | 2000 | 0.1190 | -0.4262 | -9.6091 | 0.9364 | 9.1829 | -460.3859 | -162.4036 | -1.5682 | -1.5631 |
+ | 0.013 | 1.0307 | 2250 | 0.1355 | -0.4415 | -10.4543 | 0.9393 | 10.0128 | -477.2908 | -162.7087 | -1.5520 | -1.5425 |
+ | 0.0107 | 1.1452 | 2500 | 0.1450 | -1.2114 | -11.9528 | 0.9393 | 10.7414 | -507.2599 | -178.1073 | -1.5666 | -1.5494 |
+ | 0.0203 | 1.2597 | 2750 | 0.1424 | -1.2291 | -12.7381 | 0.9364 | 11.5090 | -522.9661 | -178.4617 | -1.5798 | -1.5536 |
+ | 0.0128 | 1.3743 | 3000 | 0.1428 | -1.5064 | -13.4244 | 0.9393 | 11.9180 | -536.6923 | -184.0067 | -1.5982 | -1.5679 |
+ | 0.0447 | 1.4888 | 3250 | 0.1490 | -1.6333 | -13.8914 | 0.9422 | 12.2581 | -546.0324 | -186.5450 | -1.6084 | -1.5768 |
+ | 0.0114 | 1.6033 | 3500 | 0.1508 | -1.8097 | -14.2168 | 0.9393 | 12.4071 | -552.5399 | -190.0730 | -1.6144 | -1.5842 |
+ | 0.0201 | 1.7178 | 3750 | 0.1447 | -1.7474 | -14.1355 | 0.9393 | 12.3881 | -550.9136 | -188.8267 | -1.6087 | -1.5784 |
+ | 0.0139 | 1.8323 | 4000 | 0.1461 | -1.7396 | -14.1065 | 0.9393 | 12.3669 | -550.3343 | -188.6715 | -1.6088 | -1.5783 |
+ | 0.0038 | 1.9469 | 4250 | 0.1457 | -1.7639 | -14.1509 | 0.9364 | 12.3871 | -551.2230 | -189.1563 | -1.6081 | -1.5770 |
+
+
+ ### Framework versions
+
+ - Transformers 4.44.0
+ - Pytorch 2.3.1+cu121
+ - Datasets 2.20.0
+ - Tokenizers 0.19.1
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "epoch": 2.0,
+   "total_flos": 0.0,
+   "train_loss": 0.09263221575655435,
+   "train_runtime": 18962.0322,
+   "train_samples": 34924,
+   "train_samples_per_second": 3.684,
+   "train_steps_per_second": 0.23
+ }
generation_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "pad_token_id": 151643,
+   "repetition_penalty": 1.05,
+   "temperature": 0.7,
+   "top_k": 20,
+   "top_p": 0.8,
+   "transformers_version": "4.44.0"
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "epoch": 2.0,
+   "total_flos": 0.0,
+   "train_loss": 0.09263221575655435,
+   "train_runtime": 18962.0322,
+   "train_samples": 34924,
+   "train_samples_per_second": 3.684,
+   "train_steps_per_second": 0.23
+ }
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff