MichalMlodawski committed
Commit efa2904
1 Parent(s): 273ec91

Upload 7 files

Files changed (7)
  1. config.json +1 -1
  2. model.safetensors +1 -1
  3. optimizer.pt +3 -0
  4. rng_state.pth +3 -0
  5. scheduler.pt +3 -0
  6. trainer_state.json +3015 -0
  7. training_args.bin +3 -0
config.json CHANGED
@@ -38,6 +38,6 @@
  "problem_type": "single_label_classification",
  "semantic_loss_ignore_index": 255,
  "torch_dtype": "float32",
- "transformers_version": "4.32.1",
+ "transformers_version": "4.43.4",
  "width_multiplier": 1.0
  }
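The only substantive change in config.json is the `transformers_version` stamp, which records the library release that wrote the file; it is kept for provenance and, to my knowledge, does not gate loading. A minimal sketch of reading it back (the inline JSON fragment is illustrative, taken from the diff above):

```python
import json

# A fragment of the updated config.json shown in the diff above.
config_text = '{"torch_dtype": "float32", "transformers_version": "4.43.4", "width_multiplier": 1.0}'
config = json.loads(config_text)

# The version stamp records which transformers release exported the file.
version = config["transformers_version"]
```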
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:59e1b315fa4a40262dc96282b0561dd938b4696d5d76496b59f5a2e151682284
+ oid sha256:d57700c882ab427e10c9c80a344ffa47bc83aa4a5b0e7b7929a11e1782f9fbda
  size 17659076
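The binary files in this commit are stored as Git LFS pointers: three-line text stubs recording the spec version, the SHA-256 digest of the real payload, and its size in bytes. A minimal sketch of parsing such a pointer (the `parse_lfs_pointer` helper is hypothetical, not part of any Hugging Face or git-lfs tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is recorded as "<hash-algo>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}

# The model.safetensors pointer from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d57700c882ab427e10c9c80a344ffa47bc83aa4a5b0e7b7929a11e1782f9fbda
size 17659076"""
info = parse_lfs_pointer(pointer)
```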
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10e3e128795e27ba452ab68bea725092b40bc19be5a77276d498ea7c25580dd4
+ size 35281594
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c295ee6b6de2d3abd8a1fc93136e75c566269d23f9d0fe83fe1ccc60ad2f6d63
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb72fd59e913499b19f383c863552adcb643a19839e375a1dbe73d02672f54a8
+ size 1000
trainer_state.json ADDED
@@ -0,0 +1,3015 @@
+ {
+ "best_metric": 0.9921853502169998,
+ "best_model_checkpoint": "C:\\uczonko_clear\\transformery\\models_mobilevitv2_eyes_VIT\\checkpoint-20720",
+ "epoch": 3.4835238735709484,
+ "eval_steps": 2960,
+ "global_step": 20720,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.008406186953597848,
+ "grad_norm": 0.8845431208610535,
+ "learning_rate": 8.44451950684006e-09,
+ "loss": 0.6941,
+ "step": 50
+ },
+ {
+ "epoch": 0.016812373907195696,
+ "grad_norm": 0.8216766715049744,
+ "learning_rate": 1.688903901368012e-08,
+ "loss": 0.6963,
+ "step": 100
+ },
+ {
+ "epoch": 0.025218560860793545,
+ "grad_norm": 0.9512827396392822,
+ "learning_rate": 2.533355852052018e-08,
+ "loss": 0.6934,
+ "step": 150
+ },
+ {
+ "epoch": 0.03362474781439139,
+ "grad_norm": 0.8841463923454285,
+ "learning_rate": 3.377807802736024e-08,
+ "loss": 0.6933,
+ "step": 200
+ },
+ {
+ "epoch": 0.04203093476798924,
+ "grad_norm": 0.9045664668083191,
+ "learning_rate": 4.22225975342003e-08,
+ "loss": 0.6961,
+ "step": 250
+ },
+ {
+ "epoch": 0.05043712172158709,
+ "grad_norm": 0.7906067967414856,
+ "learning_rate": 5.066711704104036e-08,
+ "loss": 0.6944,
+ "step": 300
+ },
+ {
+ "epoch": 0.05884330867518493,
+ "grad_norm": 0.7868759036064148,
+ "learning_rate": 5.911163654788042e-08,
+ "loss": 0.6974,
+ "step": 350
+ },
+ {
+ "epoch": 0.06724949562878278,
+ "grad_norm": 0.7998577356338501,
+ "learning_rate": 6.755615605472048e-08,
+ "loss": 0.696,
+ "step": 400
+ },
+ {
+ "epoch": 0.07565568258238063,
+ "grad_norm": 0.8735635280609131,
+ "learning_rate": 7.600067556156055e-08,
+ "loss": 0.695,
+ "step": 450
+ },
+ {
+ "epoch": 0.08406186953597848,
+ "grad_norm": 0.824541449546814,
+ "learning_rate": 8.44451950684006e-08,
+ "loss": 0.6972,
+ "step": 500
+ },
+ {
+ "epoch": 0.09246805648957633,
+ "grad_norm": 0.9033031463623047,
+ "learning_rate": 9.288971457524067e-08,
+ "loss": 0.6973,
+ "step": 550
+ },
+ {
+ "epoch": 0.10087424344317418,
+ "grad_norm": 0.923117995262146,
+ "learning_rate": 1.0133423408208072e-07,
+ "loss": 0.6959,
+ "step": 600
+ },
+ {
+ "epoch": 0.10928043039677203,
+ "grad_norm": 0.8658111095428467,
+ "learning_rate": 1.0977875358892079e-07,
+ "loss": 0.6967,
+ "step": 650
+ },
+ {
+ "epoch": 0.11768661735036987,
+ "grad_norm": 1.0212489366531372,
+ "learning_rate": 1.1822327309576084e-07,
+ "loss": 0.6961,
+ "step": 700
+ },
+ {
+ "epoch": 0.12609280430396771,
+ "grad_norm": 0.8621204495429993,
+ "learning_rate": 1.266677926026009e-07,
+ "loss": 0.6964,
+ "step": 750
+ },
+ {
+ "epoch": 0.13449899125756556,
+ "grad_norm": 0.8873909711837769,
+ "learning_rate": 1.3511231210944096e-07,
+ "loss": 0.693,
+ "step": 800
+ },
+ {
+ "epoch": 0.14290517821116341,
+ "grad_norm": 0.7807337641716003,
+ "learning_rate": 1.4355683161628102e-07,
+ "loss": 0.6945,
+ "step": 850
+ },
+ {
+ "epoch": 0.15131136516476126,
+ "grad_norm": 0.8349995017051697,
+ "learning_rate": 1.520013511231211e-07,
+ "loss": 0.6956,
+ "step": 900
+ },
+ {
+ "epoch": 0.1597175521183591,
+ "grad_norm": 0.8659362196922302,
+ "learning_rate": 1.6044587062996115e-07,
+ "loss": 0.6931,
+ "step": 950
+ },
+ {
+ "epoch": 0.16812373907195696,
+ "grad_norm": 0.888361394405365,
+ "learning_rate": 1.688903901368012e-07,
+ "loss": 0.697,
+ "step": 1000
+ },
+ {
+ "epoch": 0.1765299260255548,
+ "grad_norm": 0.8722349405288696,
+ "learning_rate": 1.7733490964364126e-07,
+ "loss": 0.6939,
+ "step": 1050
+ },
+ {
+ "epoch": 0.18493611297915266,
+ "grad_norm": 0.8624858856201172,
+ "learning_rate": 1.8577942915048134e-07,
+ "loss": 0.6944,
+ "step": 1100
+ },
+ {
+ "epoch": 0.1933422999327505,
+ "grad_norm": 0.846449613571167,
+ "learning_rate": 1.942239486573214e-07,
+ "loss": 0.6933,
+ "step": 1150
+ },
+ {
+ "epoch": 0.20174848688634836,
+ "grad_norm": 0.7982395887374878,
+ "learning_rate": 2.0266846816416145e-07,
+ "loss": 0.6928,
+ "step": 1200
+ },
+ {
+ "epoch": 0.2101546738399462,
+ "grad_norm": 0.8489773273468018,
+ "learning_rate": 2.111129876710015e-07,
+ "loss": 0.693,
+ "step": 1250
+ },
+ {
+ "epoch": 0.21856086079354406,
+ "grad_norm": 0.8371855616569519,
+ "learning_rate": 2.1955750717784158e-07,
+ "loss": 0.6945,
+ "step": 1300
+ },
+ {
+ "epoch": 0.2269670477471419,
+ "grad_norm": 0.7753377556800842,
+ "learning_rate": 2.2800202668468163e-07,
+ "loss": 0.689,
+ "step": 1350
+ },
+ {
+ "epoch": 0.23537323470073973,
+ "grad_norm": 0.8692731261253357,
+ "learning_rate": 2.364465461915217e-07,
+ "loss": 0.6912,
+ "step": 1400
+ },
+ {
+ "epoch": 0.24377942165433758,
+ "grad_norm": 0.8294754028320312,
+ "learning_rate": 2.4489106569836174e-07,
+ "loss": 0.6956,
+ "step": 1450
+ },
+ {
+ "epoch": 0.25218560860793543,
+ "grad_norm": 0.8770684003829956,
+ "learning_rate": 2.533355852052018e-07,
+ "loss": 0.6913,
+ "step": 1500
+ },
+ {
+ "epoch": 0.2605917955615333,
+ "grad_norm": 0.8920065760612488,
+ "learning_rate": 2.6178010471204185e-07,
+ "loss": 0.6929,
+ "step": 1550
+ },
+ {
+ "epoch": 0.26899798251513113,
+ "grad_norm": 0.8197062015533447,
+ "learning_rate": 2.7022462421888193e-07,
+ "loss": 0.6891,
+ "step": 1600
+ },
+ {
+ "epoch": 0.277404169468729,
+ "grad_norm": 0.8367999196052551,
+ "learning_rate": 2.7866914372572196e-07,
+ "loss": 0.6929,
+ "step": 1650
+ },
+ {
+ "epoch": 0.28581035642232683,
+ "grad_norm": 0.7874473929405212,
+ "learning_rate": 2.8711366323256204e-07,
+ "loss": 0.6946,
+ "step": 1700
+ },
+ {
+ "epoch": 0.2942165433759247,
+ "grad_norm": 0.8376381397247314,
+ "learning_rate": 2.955581827394021e-07,
+ "loss": 0.6917,
+ "step": 1750
+ },
+ {
+ "epoch": 0.3026227303295225,
+ "grad_norm": 0.8306110501289368,
+ "learning_rate": 3.040027022462422e-07,
+ "loss": 0.689,
+ "step": 1800
+ },
+ {
+ "epoch": 0.3110289172831204,
+ "grad_norm": 0.8515573740005493,
+ "learning_rate": 3.124472217530822e-07,
+ "loss": 0.6919,
+ "step": 1850
+ },
+ {
+ "epoch": 0.3194351042367182,
+ "grad_norm": 0.8318465352058411,
+ "learning_rate": 3.208917412599223e-07,
+ "loss": 0.6925,
+ "step": 1900
+ },
+ {
+ "epoch": 0.3278412911903161,
+ "grad_norm": 1.0014280080795288,
+ "learning_rate": 3.2933626076676233e-07,
+ "loss": 0.6888,
+ "step": 1950
+ },
+ {
+ "epoch": 0.3362474781439139,
+ "grad_norm": 0.9011471271514893,
+ "learning_rate": 3.377807802736024e-07,
+ "loss": 0.6891,
+ "step": 2000
+ },
+ {
+ "epoch": 0.3446536650975118,
+ "grad_norm": 0.8618181347846985,
+ "learning_rate": 3.4622529978044244e-07,
+ "loss": 0.6871,
+ "step": 2050
+ },
+ {
+ "epoch": 0.3530598520511096,
+ "grad_norm": 0.834791898727417,
+ "learning_rate": 3.546698192872825e-07,
+ "loss": 0.6896,
+ "step": 2100
+ },
+ {
+ "epoch": 0.3614660390047075,
+ "grad_norm": 0.8285929560661316,
+ "learning_rate": 3.631143387941226e-07,
+ "loss": 0.6902,
+ "step": 2150
+ },
+ {
+ "epoch": 0.3698722259583053,
+ "grad_norm": 0.8645830154418945,
+ "learning_rate": 3.715588583009627e-07,
+ "loss": 0.6876,
+ "step": 2200
+ },
+ {
+ "epoch": 0.3782784129119032,
+ "grad_norm": 0.8188508152961731,
+ "learning_rate": 3.800033778078027e-07,
+ "loss": 0.6896,
+ "step": 2250
+ },
+ {
+ "epoch": 0.386684599865501,
+ "grad_norm": 0.8208332061767578,
+ "learning_rate": 3.884478973146428e-07,
+ "loss": 0.6857,
+ "step": 2300
+ },
+ {
+ "epoch": 0.39509078681909887,
+ "grad_norm": 0.8408875465393066,
+ "learning_rate": 3.968924168214828e-07,
+ "loss": 0.6894,
+ "step": 2350
+ },
+ {
+ "epoch": 0.4034969737726967,
+ "grad_norm": 0.8612744808197021,
+ "learning_rate": 4.053369363283229e-07,
+ "loss": 0.6872,
+ "step": 2400
+ },
+ {
+ "epoch": 0.41190316072629457,
+ "grad_norm": 0.9161643385887146,
+ "learning_rate": 4.137814558351629e-07,
+ "loss": 0.6874,
+ "step": 2450
+ },
+ {
+ "epoch": 0.4203093476798924,
+ "grad_norm": 0.8540963530540466,
+ "learning_rate": 4.22225975342003e-07,
+ "loss": 0.6888,
+ "step": 2500
+ },
+ {
+ "epoch": 0.42871553463349027,
+ "grad_norm": 0.9273486137390137,
+ "learning_rate": 4.306704948488431e-07,
+ "loss": 0.6905,
+ "step": 2550
+ },
+ {
+ "epoch": 0.4371217215870881,
+ "grad_norm": 0.7874332666397095,
+ "learning_rate": 4.3911501435568316e-07,
+ "loss": 0.6863,
+ "step": 2600
+ },
+ {
+ "epoch": 0.44552790854068597,
+ "grad_norm": 0.83612459897995,
+ "learning_rate": 4.475595338625232e-07,
+ "loss": 0.6877,
+ "step": 2650
+ },
+ {
+ "epoch": 0.4539340954942838,
+ "grad_norm": 0.9220020771026611,
+ "learning_rate": 4.5600405336936327e-07,
+ "loss": 0.6878,
+ "step": 2700
+ },
+ {
+ "epoch": 0.46234028244788167,
+ "grad_norm": 0.9280387759208679,
+ "learning_rate": 4.644485728762033e-07,
+ "loss": 0.682,
+ "step": 2750
+ },
+ {
+ "epoch": 0.47074646940147946,
+ "grad_norm": 0.8182604312896729,
+ "learning_rate": 4.728930923830434e-07,
+ "loss": 0.6835,
+ "step": 2800
+ },
+ {
+ "epoch": 0.4791526563550773,
+ "grad_norm": 0.7758776545524597,
+ "learning_rate": 4.813376118898834e-07,
+ "loss": 0.6811,
+ "step": 2850
+ },
+ {
+ "epoch": 0.48755884330867516,
+ "grad_norm": 0.9325575232505798,
+ "learning_rate": 4.897821313967235e-07,
+ "loss": 0.6806,
+ "step": 2900
+ },
+ {
+ "epoch": 0.495965030262273,
+ "grad_norm": 0.8952744007110596,
+ "learning_rate": 4.982266509035636e-07,
+ "loss": 0.6851,
+ "step": 2950
+ },
+ {
+ "epoch": 0.4976462676529926,
+ "eval_accuracy": 0.5783783783783784,
+ "eval_f1_macro": 0.577083139496245,
+ "eval_loss": 0.6811313629150391,
+ "eval_precision": 0.5781996700999453,
+ "eval_recall": 0.5791102231359357,
+ "eval_runtime": 48.5841,
+ "eval_samples_per_second": 79.964,
+ "eval_steps_per_second": 10.003,
+ "step": 2960
+ },
+ {
+ "epoch": 0.5043712172158709,
+ "grad_norm": 0.8315622210502625,
+ "learning_rate": 5.066711704104036e-07,
+ "loss": 0.6814,
+ "step": 3000
+ },
+ {
+ "epoch": 0.5127774041694687,
+ "grad_norm": 0.9562221169471741,
+ "learning_rate": 5.151156899172436e-07,
+ "loss": 0.6823,
+ "step": 3050
+ },
+ {
+ "epoch": 0.5211835911230666,
+ "grad_norm": 0.8448233008384705,
+ "learning_rate": 5.235602094240837e-07,
+ "loss": 0.683,
+ "step": 3100
+ },
+ {
+ "epoch": 0.5295897780766644,
+ "grad_norm": 0.848648726940155,
+ "learning_rate": 5.320047289309239e-07,
+ "loss": 0.6798,
+ "step": 3150
+ },
+ {
+ "epoch": 0.5379959650302623,
+ "grad_norm": 0.8812288641929626,
+ "learning_rate": 5.404492484377639e-07,
+ "loss": 0.6822,
+ "step": 3200
+ },
+ {
+ "epoch": 0.5464021519838601,
+ "grad_norm": 0.8234968781471252,
+ "learning_rate": 5.488937679446039e-07,
+ "loss": 0.6793,
+ "step": 3250
+ },
+ {
+ "epoch": 0.554808338937458,
+ "grad_norm": 0.9725661277770996,
+ "learning_rate": 5.573382874514439e-07,
+ "loss": 0.6784,
+ "step": 3300
+ },
+ {
+ "epoch": 0.5632145258910558,
+ "grad_norm": 0.8935129046440125,
+ "learning_rate": 5.657828069582841e-07,
+ "loss": 0.6756,
+ "step": 3350
+ },
+ {
+ "epoch": 0.5716207128446537,
+ "grad_norm": 0.8144574761390686,
+ "learning_rate": 5.742273264651241e-07,
+ "loss": 0.6759,
+ "step": 3400
+ },
+ {
+ "epoch": 0.5800268997982515,
+ "grad_norm": 0.8048451542854309,
+ "learning_rate": 5.826718459719642e-07,
+ "loss": 0.6756,
+ "step": 3450
+ },
+ {
+ "epoch": 0.5884330867518494,
+ "grad_norm": 0.9894602298736572,
+ "learning_rate": 5.911163654788042e-07,
+ "loss": 0.672,
+ "step": 3500
+ },
+ {
+ "epoch": 0.5968392737054472,
+ "grad_norm": 0.8449563980102539,
+ "learning_rate": 5.995608849856443e-07,
+ "loss": 0.6751,
+ "step": 3550
+ },
+ {
+ "epoch": 0.605245460659045,
+ "grad_norm": 0.8340707421302795,
+ "learning_rate": 6.080054044924844e-07,
+ "loss": 0.6739,
+ "step": 3600
+ },
+ {
+ "epoch": 0.6136516476126429,
+ "grad_norm": 0.8228305578231812,
+ "learning_rate": 6.164499239993244e-07,
+ "loss": 0.6733,
+ "step": 3650
+ },
+ {
+ "epoch": 0.6220578345662408,
+ "grad_norm": 0.7999180555343628,
+ "learning_rate": 6.248944435061644e-07,
+ "loss": 0.6712,
+ "step": 3700
+ },
+ {
+ "epoch": 0.6304640215198386,
+ "grad_norm": 0.7966693639755249,
+ "learning_rate": 6.333389630130045e-07,
+ "loss": 0.6705,
+ "step": 3750
+ },
+ {
+ "epoch": 0.6388702084734365,
+ "grad_norm": 0.978801429271698,
+ "learning_rate": 6.417834825198446e-07,
+ "loss": 0.668,
+ "step": 3800
+ },
+ {
+ "epoch": 0.6472763954270343,
+ "grad_norm": 0.8549861311912537,
+ "learning_rate": 6.502280020266847e-07,
+ "loss": 0.6707,
+ "step": 3850
+ },
+ {
+ "epoch": 0.6556825823806322,
+ "grad_norm": 0.8314603567123413,
+ "learning_rate": 6.586725215335247e-07,
+ "loss": 0.6687,
+ "step": 3900
+ },
+ {
+ "epoch": 0.66408876933423,
+ "grad_norm": 0.9379679560661316,
+ "learning_rate": 6.671170410403648e-07,
+ "loss": 0.6678,
+ "step": 3950
+ },
+ {
+ "epoch": 0.6724949562878278,
+ "grad_norm": 0.8202654719352722,
+ "learning_rate": 6.755615605472048e-07,
+ "loss": 0.6703,
+ "step": 4000
+ },
+ {
+ "epoch": 0.6809011432414257,
+ "grad_norm": 0.8476347327232361,
+ "learning_rate": 6.840060800540449e-07,
+ "loss": 0.6658,
+ "step": 4050
+ },
+ {
+ "epoch": 0.6893073301950235,
+ "grad_norm": 0.8811922669410706,
+ "learning_rate": 6.924505995608849e-07,
+ "loss": 0.6619,
+ "step": 4100
+ },
+ {
+ "epoch": 0.6977135171486214,
+ "grad_norm": 0.8713632225990295,
+ "learning_rate": 7.008951190677251e-07,
+ "loss": 0.663,
+ "step": 4150
+ },
+ {
+ "epoch": 0.7061197041022192,
+ "grad_norm": 0.9455136656761169,
+ "learning_rate": 7.09339638574565e-07,
+ "loss": 0.6621,
+ "step": 4200
+ },
+ {
+ "epoch": 0.7145258910558171,
+ "grad_norm": 0.7871012687683105,
+ "learning_rate": 7.177841580814051e-07,
+ "loss": 0.663,
+ "step": 4250
+ },
+ {
+ "epoch": 0.722932078009415,
+ "grad_norm": 0.8988749980926514,
+ "learning_rate": 7.262286775882452e-07,
+ "loss": 0.6567,
+ "step": 4300
+ },
+ {
+ "epoch": 0.7313382649630128,
+ "grad_norm": 0.8420932292938232,
+ "learning_rate": 7.346731970950853e-07,
+ "loss": 0.6591,
+ "step": 4350
+ },
+ {
+ "epoch": 0.7397444519166106,
+ "grad_norm": 0.902152955532074,
+ "learning_rate": 7.431177166019254e-07,
+ "loss": 0.6549,
+ "step": 4400
+ },
+ {
+ "epoch": 0.7481506388702085,
+ "grad_norm": 0.8646785020828247,
+ "learning_rate": 7.515622361087653e-07,
+ "loss": 0.6544,
+ "step": 4450
+ },
+ {
+ "epoch": 0.7565568258238063,
+ "grad_norm": 0.9260663986206055,
+ "learning_rate": 7.600067556156054e-07,
+ "loss": 0.6503,
+ "step": 4500
+ },
+ {
+ "epoch": 0.7649630127774042,
+ "grad_norm": 0.9004981517791748,
+ "learning_rate": 7.684512751224455e-07,
+ "loss": 0.6488,
+ "step": 4550
+ },
+ {
+ "epoch": 0.773369199731002,
+ "grad_norm": 0.8738273978233337,
+ "learning_rate": 7.768957946292856e-07,
+ "loss": 0.6515,
+ "step": 4600
+ },
+ {
+ "epoch": 0.7817753866845999,
+ "grad_norm": 0.8965081572532654,
+ "learning_rate": 7.853403141361256e-07,
+ "loss": 0.6495,
+ "step": 4650
+ },
+ {
+ "epoch": 0.7901815736381977,
+ "grad_norm": 0.982996940612793,
+ "learning_rate": 7.937848336429656e-07,
+ "loss": 0.6415,
+ "step": 4700
+ },
+ {
+ "epoch": 0.7985877605917956,
+ "grad_norm": 0.86506187915802,
+ "learning_rate": 8.022293531498058e-07,
+ "loss": 0.6471,
+ "step": 4750
+ },
+ {
+ "epoch": 0.8069939475453934,
+ "grad_norm": 0.9447119235992432,
+ "learning_rate": 8.106738726566458e-07,
+ "loss": 0.643,
+ "step": 4800
+ },
+ {
+ "epoch": 0.8154001344989913,
+ "grad_norm": 0.9167577624320984,
+ "learning_rate": 8.191183921634859e-07,
+ "loss": 0.6352,
+ "step": 4850
+ },
+ {
+ "epoch": 0.8238063214525891,
+ "grad_norm": 0.9060333967208862,
+ "learning_rate": 8.275629116703258e-07,
+ "loss": 0.6345,
+ "step": 4900
+ },
+ {
+ "epoch": 0.832212508406187,
+ "grad_norm": 0.9279811978340149,
+ "learning_rate": 8.36007431177166e-07,
+ "loss": 0.6393,
+ "step": 4950
+ },
+ {
+ "epoch": 0.8406186953597848,
+ "grad_norm": 0.8549557328224182,
+ "learning_rate": 8.44451950684006e-07,
+ "loss": 0.6327,
+ "step": 5000
+ },
+ {
+ "epoch": 0.8490248823133827,
+ "grad_norm": 1.002295732498169,
+ "learning_rate": 8.528964701908461e-07,
+ "loss": 0.6261,
+ "step": 5050
+ },
+ {
+ "epoch": 0.8574310692669805,
+ "grad_norm": 1.0114225149154663,
+ "learning_rate": 8.613409896976862e-07,
+ "loss": 0.6294,
+ "step": 5100
+ },
+ {
+ "epoch": 0.8658372562205784,
+ "grad_norm": 0.8996625542640686,
+ "learning_rate": 8.697855092045262e-07,
+ "loss": 0.6232,
+ "step": 5150
+ },
+ {
+ "epoch": 0.8742434431741762,
+ "grad_norm": 1.0029444694519043,
+ "learning_rate": 8.782300287113663e-07,
+ "loss": 0.6174,
+ "step": 5200
+ },
+ {
+ "epoch": 0.8826496301277741,
+ "grad_norm": 1.0238999128341675,
+ "learning_rate": 8.866745482182063e-07,
+ "loss": 0.6166,
+ "step": 5250
+ },
+ {
+ "epoch": 0.8910558170813719,
+ "grad_norm": 0.8762253522872925,
+ "learning_rate": 8.951190677250464e-07,
+ "loss": 0.6126,
+ "step": 5300
+ },
+ {
+ "epoch": 0.8994620040349698,
+ "grad_norm": 1.0304447412490845,
+ "learning_rate": 9.035635872318865e-07,
+ "loss": 0.6138,
+ "step": 5350
+ },
+ {
+ "epoch": 0.9078681909885676,
+ "grad_norm": 0.9571165442466736,
+ "learning_rate": 9.120081067387265e-07,
+ "loss": 0.6088,
+ "step": 5400
+ },
+ {
+ "epoch": 0.9162743779421655,
+ "grad_norm": 0.9421870708465576,
+ "learning_rate": 9.204526262455666e-07,
+ "loss": 0.599,
+ "step": 5450
+ },
+ {
+ "epoch": 0.9246805648957633,
+ "grad_norm": 1.0978301763534546,
+ "learning_rate": 9.288971457524066e-07,
+ "loss": 0.599,
+ "step": 5500
+ },
+ {
+ "epoch": 0.9330867518493612,
+ "grad_norm": 0.9784364104270935,
+ "learning_rate": 9.373416652592468e-07,
+ "loss": 0.5962,
+ "step": 5550
+ },
+ {
+ "epoch": 0.9414929388029589,
+ "grad_norm": 1.0564428567886353,
+ "learning_rate": 9.457861847660867e-07,
+ "loss": 0.5941,
+ "step": 5600
+ },
+ {
+ "epoch": 0.9498991257565568,
+ "grad_norm": 1.0474001169204712,
+ "learning_rate": 9.542307042729268e-07,
+ "loss": 0.5885,
+ "step": 5650
+ },
+ {
+ "epoch": 0.9583053127101546,
+ "grad_norm": 0.9988018274307251,
+ "learning_rate": 9.626752237797668e-07,
+ "loss": 0.5812,
+ "step": 5700
+ },
+ {
+ "epoch": 0.9667114996637525,
+ "grad_norm": 1.0413827896118164,
+ "learning_rate": 9.71119743286607e-07,
+ "loss": 0.5786,
+ "step": 5750
+ },
+ {
+ "epoch": 0.9751176866173503,
+ "grad_norm": 0.9909247159957886,
+ "learning_rate": 9.79564262793447e-07,
+ "loss": 0.5716,
+ "step": 5800
+ },
+ {
+ "epoch": 0.9835238735709482,
+ "grad_norm": 1.0758306980133057,
+ "learning_rate": 9.880087823002871e-07,
+ "loss": 0.5659,
+ "step": 5850
+ },
+ {
+ "epoch": 0.991930060524546,
+ "grad_norm": 1.275826096534729,
+ "learning_rate": 9.964533018071271e-07,
+ "loss": 0.5626,
+ "step": 5900
+ },
+ {
+ "epoch": 0.9952925353059852,
+ "eval_accuracy": 0.896010296010296,
+ "eval_f1_macro": 0.8950428029885591,
+ "eval_loss": 0.5504117608070374,
+ "eval_precision": 0.8942931136930143,
+ "eval_recall": 0.8960280806062683,
+ "eval_runtime": 43.2959,
+ "eval_samples_per_second": 89.731,
+ "eval_steps_per_second": 11.225,
+ "step": 5920
+ },
+ {
+ "epoch": 1.0003362474781439,
+ "grad_norm": 1.089690089225769,
+ "learning_rate": 9.994585410481898e-07,
+ "loss": 0.5589,
+ "step": 5950
+ },
+ {
+ "epoch": 1.0087424344317417,
+ "grad_norm": 1.1122716665267944,
+ "learning_rate": 9.985249911312757e-07,
+ "loss": 0.5511,
+ "step": 6000
+ },
+ {
+ "epoch": 1.0171486213853396,
+ "grad_norm": 0.9658188223838806,
+ "learning_rate": 9.975914412143618e-07,
+ "loss": 0.5472,
+ "step": 6050
+ },
+ {
+ "epoch": 1.0255548083389374,
+ "grad_norm": 1.1076239347457886,
+ "learning_rate": 9.966578912974476e-07,
+ "loss": 0.5412,
+ "step": 6100
+ },
+ {
+ "epoch": 1.0339609952925353,
+ "grad_norm": 1.0050643682479858,
+ "learning_rate": 9.957243413805335e-07,
+ "loss": 0.54,
+ "step": 6150
+ },
+ {
+ "epoch": 1.0423671822461331,
+ "grad_norm": 1.0132673978805542,
+ "learning_rate": 9.947907914636196e-07,
+ "loss": 0.53,
+ "step": 6200
+ },
+ {
+ "epoch": 1.050773369199731,
+ "grad_norm": 1.1316572427749634,
+ "learning_rate": 9.938572415467055e-07,
+ "loss": 0.5202,
+ "step": 6250
+ },
+ {
+ "epoch": 1.0591795561533288,
+ "grad_norm": 1.0512237548828125,
+ "learning_rate": 9.929236916297914e-07,
+ "loss": 0.5176,
+ "step": 6300
+ },
+ {
+ "epoch": 1.0675857431069267,
+ "grad_norm": 1.1535313129425049,
+ "learning_rate": 9.919901417128772e-07,
+ "loss": 0.5136,
+ "step": 6350
+ },
+ {
+ "epoch": 1.0759919300605245,
+ "grad_norm": 1.1831073760986328,
+ "learning_rate": 9.910565917959633e-07,
+ "loss": 0.4998,
+ "step": 6400
+ },
+ {
+ "epoch": 1.0843981170141224,
+ "grad_norm": 1.0638761520385742,
+ "learning_rate": 9.901230418790492e-07,
+ "loss": 0.497,
+ "step": 6450
+ },
+ {
+ "epoch": 1.0928043039677202,
+ "grad_norm": 1.163748025894165,
+ "learning_rate": 9.891894919621353e-07,
+ "loss": 0.4922,
+ "step": 6500
+ },
+ {
+ "epoch": 1.101210490921318,
+ "grad_norm": 1.2004588842391968,
+ "learning_rate": 9.882559420452212e-07,
+ "loss": 0.4895,
+ "step": 6550
+ },
+ {
+ "epoch": 1.109616677874916,
+ "grad_norm": 1.098446011543274,
+ "learning_rate": 9.87322392128307e-07,
+ "loss": 0.4787,
+ "step": 6600
+ },
+ {
+ "epoch": 1.1180228648285138,
+ "grad_norm": 1.1584326028823853,
+ "learning_rate": 9.86388842211393e-07,
+ "loss": 0.4672,
+ "step": 6650
+ },
+ {
+ "epoch": 1.1264290517821116,
+ "grad_norm": 1.1275545358657837,
+ "learning_rate": 9.85455292294479e-07,
+ "loss": 0.4679,
+ "step": 6700
+ },
+ {
+ "epoch": 1.1348352387357095,
+ "grad_norm": 1.060410976409912,
+ "learning_rate": 9.845217423775649e-07,
+ "loss": 0.4581,
+ "step": 6750
+ },
+ {
+ "epoch": 1.1432414256893073,
+ "grad_norm": 1.1001862287521362,
+ "learning_rate": 9.835881924606508e-07,
+ "loss": 0.4562,
+ "step": 6800
+ },
+ {
+ "epoch": 1.1516476126429052,
+ "grad_norm": 1.188932180404663,
+ "learning_rate": 9.826546425437368e-07,
+ "loss": 0.4485,
+ "step": 6850
+ },
+ {
+ "epoch": 1.160053799596503,
+ "grad_norm": 1.1793055534362793,
+ "learning_rate": 9.817210926268227e-07,
+ "loss": 0.4478,
+ "step": 6900
+ },
+ {
+ "epoch": 1.1684599865501009,
+ "grad_norm": 1.2268601655960083,
+ "learning_rate": 9.807875427099086e-07,
+ "loss": 0.4321,
+ "step": 6950
+ },
+ {
+ "epoch": 1.1768661735036987,
+ "grad_norm": 1.1301724910736084,
+ "learning_rate": 9.798539927929947e-07,
+ "loss": 0.4263,
+ "step": 7000
+ },
+ {
+ "epoch": 1.1852723604572966,
+ "grad_norm": 1.115373134613037,
+ "learning_rate": 9.789204428760806e-07,
+ "loss": 0.4154,
+ "step": 7050
+ },
+ {
+ "epoch": 1.1936785474108944,
+ "grad_norm": 1.180468201637268,
+ "learning_rate": 9.779868929591664e-07,
+ "loss": 0.4096,
+ "step": 7100
1028
+ },
1029
+ {
1030
+ "epoch": 1.2020847343644923,
1031
+ "grad_norm": 1.213911533355713,
1032
+ "learning_rate": 9.770533430422523e-07,
1033
+ "loss": 0.4114,
1034
+ "step": 7150
1035
+ },
1036
+ {
1037
+ "epoch": 1.21049092131809,
1038
+ "grad_norm": 1.1898268461227417,
1039
+ "learning_rate": 9.761197931253384e-07,
1040
+ "loss": 0.3947,
1041
+ "step": 7200
1042
+ },
1043
+ {
1044
+ "epoch": 1.218897108271688,
1045
+ "grad_norm": 1.1514030694961548,
1046
+ "learning_rate": 9.751862432084243e-07,
1047
+ "loss": 0.3934,
1048
+ "step": 7250
1049
+ },
1050
+ {
1051
+ "epoch": 1.2273032952252858,
1052
+ "grad_norm": 1.1464074850082397,
1053
+ "learning_rate": 9.742526932915104e-07,
1054
+ "loss": 0.3822,
1055
+ "step": 7300
1056
+ },
1057
+ {
1058
+ "epoch": 1.2357094821788837,
1059
+ "grad_norm": 1.1897680759429932,
1060
+ "learning_rate": 9.733191433745962e-07,
1061
+ "loss": 0.3709,
1062
+ "step": 7350
1063
+ },
1064
+ {
1065
+ "epoch": 1.2441156691324815,
1066
+ "grad_norm": 1.1825538873672485,
1067
+ "learning_rate": 9.723855934576821e-07,
1068
+ "loss": 0.3695,
1069
+ "step": 7400
1070
+ },
1071
+ {
1072
+ "epoch": 1.2525218560860794,
1073
+ "grad_norm": 1.147258996963501,
1074
+ "learning_rate": 9.71452043540768e-07,
1075
+ "loss": 0.3603,
1076
+ "step": 7450
1077
+ },
1078
+ {
1079
+ "epoch": 1.2609280430396772,
1080
+ "grad_norm": 1.1026544570922852,
1081
+ "learning_rate": 9.70518493623854e-07,
1082
+ "loss": 0.3577,
1083
+ "step": 7500
1084
+ },
1085
+ {
1086
+ "epoch": 1.269334229993275,
1087
+ "grad_norm": 1.1945158243179321,
1088
+ "learning_rate": 9.6958494370694e-07,
1089
+ "loss": 0.3534,
1090
+ "step": 7550
1091
+ },
1092
+ {
1093
+ "epoch": 1.277740416946873,
1094
+ "grad_norm": 1.169705867767334,
1095
+ "learning_rate": 9.686513937900258e-07,
+ "loss": 0.3448,
+ "step": 7600
+ },
+ {
+ "epoch": 1.2861466039004708,
+ "grad_norm": 1.123337984085083,
+ "learning_rate": 9.67717843873112e-07,
+ "loss": 0.3431,
+ "step": 7650
+ },
+ {
+ "epoch": 1.2945527908540686,
+ "grad_norm": 1.1492093801498413,
+ "learning_rate": 9.667842939561978e-07,
+ "loss": 0.3297,
+ "step": 7700
+ },
+ {
+ "epoch": 1.3029589778076665,
+ "grad_norm": 0.9801061749458313,
+ "learning_rate": 9.658507440392837e-07,
+ "loss": 0.3255,
+ "step": 7750
+ },
+ {
+ "epoch": 1.3113651647612643,
+ "grad_norm": 1.135079264640808,
+ "learning_rate": 9.649171941223698e-07,
+ "loss": 0.3241,
+ "step": 7800
+ },
+ {
+ "epoch": 1.3197713517148622,
+ "grad_norm": 1.1149753332138062,
+ "learning_rate": 9.639836442054556e-07,
+ "loss": 0.3149,
+ "step": 7850
+ },
+ {
+ "epoch": 1.32817753866846,
+ "grad_norm": 1.078354835510254,
+ "learning_rate": 9.630500942885415e-07,
+ "loss": 0.3143,
+ "step": 7900
+ },
+ {
+ "epoch": 1.3365837256220578,
+ "grad_norm": 1.1154347658157349,
+ "learning_rate": 9.621165443716274e-07,
+ "loss": 0.301,
+ "step": 7950
+ },
+ {
+ "epoch": 1.3449899125756557,
+ "grad_norm": 1.1075221300125122,
+ "learning_rate": 9.611829944547135e-07,
+ "loss": 0.2939,
+ "step": 8000
+ },
+ {
+ "epoch": 1.3533960995292535,
+ "grad_norm": 1.0937137603759766,
+ "learning_rate": 9.602494445377994e-07,
+ "loss": 0.294,
+ "step": 8050
+ },
+ {
+ "epoch": 1.3618022864828514,
+ "grad_norm": 1.0870780944824219,
+ "learning_rate": 9.593158946208854e-07,
+ "loss": 0.2864,
+ "step": 8100
+ },
+ {
+ "epoch": 1.3702084734364492,
+ "grad_norm": 1.1460663080215454,
+ "learning_rate": 9.583823447039713e-07,
+ "loss": 0.274,
+ "step": 8150
+ },
+ {
+ "epoch": 1.378614660390047,
+ "grad_norm": 1.0148537158966064,
+ "learning_rate": 9.574487947870572e-07,
+ "loss": 0.2727,
+ "step": 8200
+ },
+ {
+ "epoch": 1.387020847343645,
+ "grad_norm": 1.0275702476501465,
+ "learning_rate": 9.56515244870143e-07,
+ "loss": 0.2613,
+ "step": 8250
+ },
+ {
+ "epoch": 1.3954270342972428,
+ "grad_norm": 1.1357285976409912,
+ "learning_rate": 9.555816949532292e-07,
+ "loss": 0.2539,
+ "step": 8300
+ },
+ {
+ "epoch": 1.4038332212508406,
+ "grad_norm": 1.0728079080581665,
+ "learning_rate": 9.54648145036315e-07,
+ "loss": 0.2538,
+ "step": 8350
+ },
+ {
+ "epoch": 1.4122394082044385,
+ "grad_norm": 1.0374382734298706,
+ "learning_rate": 9.53714595119401e-07,
+ "loss": 0.2482,
+ "step": 8400
+ },
+ {
+ "epoch": 1.4206455951580363,
+ "grad_norm": 1.0923174619674683,
+ "learning_rate": 9.527810452024869e-07,
+ "loss": 0.2392,
+ "step": 8450
+ },
+ {
+ "epoch": 1.4290517821116342,
+ "grad_norm": 0.9895720481872559,
+ "learning_rate": 9.518474952855729e-07,
+ "loss": 0.238,
+ "step": 8500
+ },
+ {
+ "epoch": 1.437457969065232,
+ "grad_norm": 1.0930739641189575,
+ "learning_rate": 9.509139453686588e-07,
+ "loss": 0.236,
+ "step": 8550
+ },
+ {
+ "epoch": 1.44586415601883,
+ "grad_norm": 1.1996747255325317,
+ "learning_rate": 9.499803954517448e-07,
+ "loss": 0.2307,
+ "step": 8600
+ },
+ {
+ "epoch": 1.4542703429724277,
+ "grad_norm": 0.921042263507843,
+ "learning_rate": 9.490468455348307e-07,
+ "loss": 0.2186,
+ "step": 8650
+ },
+ {
+ "epoch": 1.4626765299260256,
+ "grad_norm": 0.9537034630775452,
+ "learning_rate": 9.481132956179166e-07,
+ "loss": 0.2251,
+ "step": 8700
+ },
+ {
+ "epoch": 1.4710827168796234,
+ "grad_norm": 0.9689446091651917,
+ "learning_rate": 9.471797457010026e-07,
+ "loss": 0.2106,
+ "step": 8750
+ },
+ {
+ "epoch": 1.4794889038332213,
+ "grad_norm": 0.9194216132164001,
+ "learning_rate": 9.462461957840885e-07,
+ "loss": 0.2043,
+ "step": 8800
+ },
+ {
+ "epoch": 1.4878950907868191,
+ "grad_norm": 0.9571210145950317,
+ "learning_rate": 9.453126458671745e-07,
+ "loss": 0.2002,
+ "step": 8850
+ },
+ {
+ "epoch": 1.4929388029589779,
+ "eval_accuracy": 0.9861003861003861,
+ "eval_f1_macro": 0.985912214812837,
+ "eval_loss": 0.18007032573223114,
+ "eval_precision": 0.9869196757553427,
+ "eval_recall": 0.9850439724663349,
+ "eval_runtime": 42.4212,
+ "eval_samples_per_second": 91.582,
+ "eval_steps_per_second": 11.457,
+ "step": 8880
+ },
+ {
+ "epoch": 1.496301277740417,
+ "grad_norm": 0.7686983346939087,
+ "learning_rate": 9.443790959502604e-07,
+ "loss": 0.1963,
+ "step": 8900
+ },
+ {
+ "epoch": 1.5047074646940148,
+ "grad_norm": 0.8870165348052979,
+ "learning_rate": 9.434455460333464e-07,
+ "loss": 0.1893,
+ "step": 8950
+ },
+ {
+ "epoch": 1.5131136516476127,
+ "grad_norm": 0.8588102459907532,
+ "learning_rate": 9.425119961164323e-07,
+ "loss": 0.1831,
+ "step": 9000
+ },
+ {
+ "epoch": 1.5215198386012105,
+ "grad_norm": 1.1222652196884155,
+ "learning_rate": 9.415784461995182e-07,
+ "loss": 0.1823,
+ "step": 9050
+ },
+ {
+ "epoch": 1.5299260255548084,
+ "grad_norm": 1.2664000988006592,
+ "learning_rate": 9.406448962826042e-07,
+ "loss": 0.1858,
+ "step": 9100
+ },
+ {
+ "epoch": 1.5383322125084062,
+ "grad_norm": 0.8765432834625244,
+ "learning_rate": 9.397113463656901e-07,
+ "loss": 0.1734,
+ "step": 9150
+ },
+ {
+ "epoch": 1.546738399462004,
+ "grad_norm": 0.9564241170883179,
+ "learning_rate": 9.387777964487761e-07,
+ "loss": 0.162,
+ "step": 9200
+ },
+ {
+ "epoch": 1.555144586415602,
+ "grad_norm": 0.8159739971160889,
+ "learning_rate": 9.37844246531862e-07,
+ "loss": 0.1555,
+ "step": 9250
+ },
+ {
+ "epoch": 1.5635507733691996,
+ "grad_norm": 1.0021076202392578,
+ "learning_rate": 9.36910696614948e-07,
+ "loss": 0.1663,
+ "step": 9300
+ },
+ {
+ "epoch": 1.5719569603227974,
+ "grad_norm": 1.0942935943603516,
+ "learning_rate": 9.359771466980338e-07,
+ "loss": 0.1516,
+ "step": 9350
+ },
+ {
+ "epoch": 1.5803631472763953,
+ "grad_norm": 0.9106407761573792,
+ "learning_rate": 9.350435967811199e-07,
+ "loss": 0.1462,
+ "step": 9400
+ },
+ {
+ "epoch": 1.5887693342299931,
+ "grad_norm": 1.2029844522476196,
+ "learning_rate": 9.341100468642058e-07,
+ "loss": 0.1334,
+ "step": 9450
+ },
+ {
+ "epoch": 1.597175521183591,
+ "grad_norm": 1.1830437183380127,
+ "learning_rate": 9.331764969472917e-07,
+ "loss": 0.1511,
+ "step": 9500
+ },
+ {
+ "epoch": 1.6055817081371888,
+ "grad_norm": 1.024003267288208,
+ "learning_rate": 9.322429470303777e-07,
+ "loss": 0.137,
+ "step": 9550
+ },
+ {
+ "epoch": 1.6139878950907867,
+ "grad_norm": 0.6936477422714233,
+ "learning_rate": 9.313093971134635e-07,
+ "loss": 0.1309,
+ "step": 9600
+ },
+ {
+ "epoch": 1.6223940820443845,
+ "grad_norm": 0.9332903027534485,
+ "learning_rate": 9.303758471965496e-07,
+ "loss": 0.1332,
+ "step": 9650
+ },
+ {
+ "epoch": 1.6308002689979824,
+ "grad_norm": 0.7311134338378906,
+ "learning_rate": 9.294422972796355e-07,
+ "loss": 0.1273,
+ "step": 9700
+ },
+ {
+ "epoch": 1.6392064559515802,
+ "grad_norm": 0.897352397441864,
+ "learning_rate": 9.285087473627215e-07,
+ "loss": 0.1277,
+ "step": 9750
+ },
+ {
+ "epoch": 1.647612642905178,
+ "grad_norm": 0.9234735369682312,
+ "learning_rate": 9.275751974458074e-07,
+ "loss": 0.1285,
+ "step": 9800
+ },
+ {
+ "epoch": 1.656018829858776,
+ "grad_norm": 1.1619858741760254,
+ "learning_rate": 9.266416475288933e-07,
+ "loss": 0.1311,
+ "step": 9850
+ },
+ {
+ "epoch": 1.6644250168123738,
+ "grad_norm": 0.6384488940238953,
+ "learning_rate": 9.257080976119793e-07,
+ "loss": 0.1165,
+ "step": 9900
+ },
+ {
+ "epoch": 1.6728312037659716,
+ "grad_norm": 0.5864728689193726,
+ "learning_rate": 9.247745476950652e-07,
+ "loss": 0.1023,
+ "step": 9950
+ },
+ {
+ "epoch": 1.6812373907195695,
+ "grad_norm": 1.1280192136764526,
+ "learning_rate": 9.238409977781512e-07,
+ "loss": 0.11,
+ "step": 10000
+ },
+ {
+ "epoch": 1.6896435776731673,
+ "grad_norm": 0.5544498562812805,
+ "learning_rate": 9.229074478612371e-07,
+ "loss": 0.1006,
+ "step": 10050
+ },
+ {
+ "epoch": 1.6980497646267652,
+ "grad_norm": 1.138969898223877,
+ "learning_rate": 9.21973897944323e-07,
+ "loss": 0.0998,
+ "step": 10100
+ },
+ {
+ "epoch": 1.706455951580363,
+ "grad_norm": 0.6470067501068115,
+ "learning_rate": 9.210403480274089e-07,
+ "loss": 0.0967,
+ "step": 10150
+ },
+ {
+ "epoch": 1.7148621385339609,
+ "grad_norm": 0.8513116240501404,
+ "learning_rate": 9.20106798110495e-07,
+ "loss": 0.0955,
+ "step": 10200
+ },
+ {
+ "epoch": 1.7232683254875587,
+ "grad_norm": 1.3393694162368774,
+ "learning_rate": 9.191732481935809e-07,
+ "loss": 0.0884,
+ "step": 10250
+ },
+ {
+ "epoch": 1.7316745124411566,
+ "grad_norm": 0.6969456076622009,
+ "learning_rate": 9.182396982766669e-07,
+ "loss": 0.1072,
+ "step": 10300
+ },
+ {
+ "epoch": 1.7400806993947544,
+ "grad_norm": 0.5059545636177063,
+ "learning_rate": 9.173061483597527e-07,
+ "loss": 0.092,
+ "step": 10350
+ },
+ {
+ "epoch": 1.7484868863483523,
+ "grad_norm": 0.7818930745124817,
+ "learning_rate": 9.163725984428386e-07,
+ "loss": 0.0852,
+ "step": 10400
+ },
+ {
+ "epoch": 1.75689307330195,
+ "grad_norm": 0.7932471632957458,
+ "learning_rate": 9.154390485259247e-07,
+ "loss": 0.0774,
+ "step": 10450
+ },
+ {
+ "epoch": 1.765299260255548,
+ "grad_norm": 0.9375746846199036,
+ "learning_rate": 9.145054986090106e-07,
+ "loss": 0.0913,
+ "step": 10500
+ },
+ {
+ "epoch": 1.7737054472091458,
+ "grad_norm": 1.0718626976013184,
+ "learning_rate": 9.135719486920966e-07,
+ "loss": 0.0749,
+ "step": 10550
+ },
+ {
+ "epoch": 1.7821116341627437,
+ "grad_norm": 1.1264452934265137,
+ "learning_rate": 9.126383987751824e-07,
+ "loss": 0.08,
+ "step": 10600
+ },
+ {
+ "epoch": 1.7905178211163415,
+ "grad_norm": 0.40607750415802,
+ "learning_rate": 9.117048488582684e-07,
+ "loss": 0.0802,
+ "step": 10650
+ },
+ {
+ "epoch": 1.7989240080699394,
+ "grad_norm": 0.9266816973686218,
+ "learning_rate": 9.107712989413544e-07,
+ "loss": 0.0702,
+ "step": 10700
+ },
+ {
+ "epoch": 1.8073301950235372,
+ "grad_norm": 0.9419146180152893,
+ "learning_rate": 9.098377490244403e-07,
+ "loss": 0.0783,
+ "step": 10750
+ },
+ {
+ "epoch": 1.815736381977135,
+ "grad_norm": 0.6174508333206177,
+ "learning_rate": 9.089041991075263e-07,
+ "loss": 0.0647,
+ "step": 10800
+ },
+ {
+ "epoch": 1.824142568930733,
+ "grad_norm": 0.5224350690841675,
+ "learning_rate": 9.079706491906121e-07,
+ "loss": 0.0778,
+ "step": 10850
+ },
+ {
+ "epoch": 1.8325487558843307,
+ "grad_norm": 0.6590968370437622,
+ "learning_rate": 9.070370992736981e-07,
+ "loss": 0.084,
+ "step": 10900
+ },
+ {
+ "epoch": 1.8409549428379286,
+ "grad_norm": 0.5679555535316467,
+ "learning_rate": 9.06103549356784e-07,
+ "loss": 0.0684,
+ "step": 10950
+ },
+ {
+ "epoch": 1.8493611297915264,
+ "grad_norm": 0.3743615746498108,
+ "learning_rate": 9.051699994398701e-07,
+ "loss": 0.0696,
+ "step": 11000
+ },
+ {
+ "epoch": 1.8577673167451243,
+ "grad_norm": 0.7687448263168335,
+ "learning_rate": 9.04236449522956e-07,
+ "loss": 0.0666,
+ "step": 11050
+ },
+ {
+ "epoch": 1.8661735036987221,
+ "grad_norm": 0.38414329290390015,
+ "learning_rate": 9.033028996060419e-07,
+ "loss": 0.066,
+ "step": 11100
+ },
+ {
+ "epoch": 1.87457969065232,
+ "grad_norm": 0.33975300192832947,
+ "learning_rate": 9.023693496891278e-07,
+ "loss": 0.0727,
+ "step": 11150
+ },
+ {
+ "epoch": 1.8829858776059178,
+ "grad_norm": 0.3808116018772125,
+ "learning_rate": 9.014357997722137e-07,
+ "loss": 0.0765,
+ "step": 11200
+ },
+ {
+ "epoch": 1.8913920645595157,
+ "grad_norm": 0.3306853175163269,
+ "learning_rate": 9.005022498552998e-07,
+ "loss": 0.067,
+ "step": 11250
+ },
+ {
+ "epoch": 1.8997982515131135,
+ "grad_norm": 0.5410125851631165,
+ "learning_rate": 8.995686999383857e-07,
+ "loss": 0.0642,
+ "step": 11300
+ },
+ {
+ "epoch": 1.9082044384667114,
+ "grad_norm": 1.0305054187774658,
+ "learning_rate": 8.986351500214716e-07,
+ "loss": 0.05,
+ "step": 11350
+ },
+ {
+ "epoch": 1.9166106254203092,
+ "grad_norm": 1.6848655939102173,
+ "learning_rate": 8.977016001045575e-07,
+ "loss": 0.0551,
+ "step": 11400
+ },
+ {
+ "epoch": 1.925016812373907,
+ "grad_norm": 0.6348687410354614,
+ "learning_rate": 8.967680501876435e-07,
+ "loss": 0.0561,
+ "step": 11450
+ },
+ {
+ "epoch": 1.933422999327505,
+ "grad_norm": 0.3955049216747284,
+ "learning_rate": 8.958345002707295e-07,
+ "loss": 0.0606,
+ "step": 11500
+ },
+ {
+ "epoch": 1.9418291862811028,
+ "grad_norm": 1.238265872001648,
+ "learning_rate": 8.949009503538155e-07,
+ "loss": 0.0568,
+ "step": 11550
+ },
+ {
+ "epoch": 1.9502353732347006,
+ "grad_norm": 1.927749752998352,
+ "learning_rate": 8.939674004369013e-07,
+ "loss": 0.0641,
+ "step": 11600
+ },
+ {
+ "epoch": 1.9586415601882985,
+ "grad_norm": 0.2765219509601593,
+ "learning_rate": 8.930338505199872e-07,
+ "loss": 0.0613,
+ "step": 11650
+ },
+ {
+ "epoch": 1.9670477471418963,
+ "grad_norm": 1.0269273519515991,
+ "learning_rate": 8.921003006030732e-07,
+ "loss": 0.0641,
+ "step": 11700
+ },
+ {
+ "epoch": 1.9754539340954942,
+ "grad_norm": 0.43984729051589966,
+ "learning_rate": 8.911667506861591e-07,
+ "loss": 0.0605,
+ "step": 11750
+ },
+ {
+ "epoch": 1.983860121049092,
+ "grad_norm": 0.6223533153533936,
+ "learning_rate": 8.902332007692452e-07,
+ "loss": 0.053,
+ "step": 11800
+ },
+ {
+ "epoch": 1.9905850706119703,
+ "eval_accuracy": 0.9889317889317889,
+ "eval_f1_macro": 0.9887785465935663,
+ "eval_loss": 0.04083893820643425,
+ "eval_precision": 0.989975319388231,
+ "eval_recall": 0.9877686245554091,
+ "eval_runtime": 43.4219,
+ "eval_samples_per_second": 89.471,
+ "eval_steps_per_second": 11.193,
+ "step": 11840
+ },
+ {
+ "epoch": 1.99226630800269,
+ "grad_norm": 0.1801561415195465,
+ "learning_rate": 8.89299650852331e-07,
+ "loss": 0.0457,
+ "step": 11850
+ },
+ {
+ "epoch": 2.0006724949562877,
+ "grad_norm": 1.5527310371398926,
+ "learning_rate": 8.88366100935417e-07,
+ "loss": 0.0628,
+ "step": 11900
+ },
+ {
+ "epoch": 2.0090786819098856,
+ "grad_norm": 1.5352983474731445,
+ "learning_rate": 8.874325510185029e-07,
+ "loss": 0.0528,
+ "step": 11950
+ },
+ {
+ "epoch": 2.0174848688634834,
+ "grad_norm": 2.4047458171844482,
+ "learning_rate": 8.864990011015888e-07,
+ "loss": 0.0597,
+ "step": 12000
+ },
+ {
+ "epoch": 2.0258910558170813,
+ "grad_norm": 0.8849202990531921,
+ "learning_rate": 8.855654511846749e-07,
+ "loss": 0.0524,
+ "step": 12050
+ },
+ {
+ "epoch": 2.034297242770679,
+ "grad_norm": 0.38976189494132996,
+ "learning_rate": 8.846319012677607e-07,
+ "loss": 0.0406,
+ "step": 12100
+ },
+ {
+ "epoch": 2.042703429724277,
+ "grad_norm": 0.31227657198905945,
+ "learning_rate": 8.836983513508467e-07,
+ "loss": 0.0467,
+ "step": 12150
+ },
+ {
+ "epoch": 2.051109616677875,
+ "grad_norm": 1.8127578496932983,
+ "learning_rate": 8.827648014339326e-07,
+ "loss": 0.0689,
+ "step": 12200
+ },
+ {
+ "epoch": 2.0595158036314727,
+ "grad_norm": 0.23175351321697235,
+ "learning_rate": 8.818312515170186e-07,
+ "loss": 0.0423,
+ "step": 12250
+ },
+ {
+ "epoch": 2.0679219905850705,
+ "grad_norm": 1.7162238359451294,
+ "learning_rate": 8.808977016001046e-07,
+ "loss": 0.0372,
+ "step": 12300
+ },
+ {
+ "epoch": 2.0763281775386684,
+ "grad_norm": 0.2673948407173157,
+ "learning_rate": 8.799641516831905e-07,
+ "loss": 0.0425,
+ "step": 12350
+ },
+ {
+ "epoch": 2.0847343644922662,
+ "grad_norm": 0.3450436294078827,
+ "learning_rate": 8.790306017662764e-07,
+ "loss": 0.0517,
+ "step": 12400
+ },
+ {
+ "epoch": 2.093140551445864,
+ "grad_norm": 0.45672616362571716,
+ "learning_rate": 8.780970518493623e-07,
+ "loss": 0.0582,
+ "step": 12450
+ },
+ {
+ "epoch": 2.101546738399462,
+ "grad_norm": 0.250683069229126,
+ "learning_rate": 8.771635019324483e-07,
+ "loss": 0.034,
+ "step": 12500
+ },
+ {
+ "epoch": 2.10995292535306,
+ "grad_norm": 0.5667222142219543,
+ "learning_rate": 8.762299520155342e-07,
+ "loss": 0.0377,
+ "step": 12550
+ },
+ {
+ "epoch": 2.1183591123066576,
+ "grad_norm": 0.15519775450229645,
+ "learning_rate": 8.752964020986202e-07,
+ "loss": 0.0532,
+ "step": 12600
+ },
+ {
+ "epoch": 2.1267652992602555,
+ "grad_norm": 0.5342125296592712,
+ "learning_rate": 8.743628521817061e-07,
+ "loss": 0.0338,
+ "step": 12650
+ },
+ {
+ "epoch": 2.1351714862138533,
+ "grad_norm": 0.21783240139484406,
+ "learning_rate": 8.734293022647921e-07,
+ "loss": 0.0524,
+ "step": 12700
+ },
+ {
+ "epoch": 2.143577673167451,
+ "grad_norm": 0.16794736683368683,
+ "learning_rate": 8.72495752347878e-07,
+ "loss": 0.0625,
+ "step": 12750
+ },
+ {
+ "epoch": 2.151983860121049,
+ "grad_norm": 0.715691089630127,
+ "learning_rate": 8.715622024309638e-07,
+ "loss": 0.0434,
+ "step": 12800
+ },
+ {
+ "epoch": 2.160390047074647,
+ "grad_norm": 1.9928004741668701,
+ "learning_rate": 8.706286525140499e-07,
+ "loss": 0.0598,
+ "step": 12850
+ },
+ {
+ "epoch": 2.1687962340282447,
+ "grad_norm": 1.6334397792816162,
+ "learning_rate": 8.696951025971358e-07,
+ "loss": 0.0445,
+ "step": 12900
+ },
+ {
+ "epoch": 2.1772024209818426,
+ "grad_norm": 0.14497284591197968,
+ "learning_rate": 8.687615526802218e-07,
+ "loss": 0.0468,
+ "step": 12950
+ },
+ {
+ "epoch": 2.1856086079354404,
+ "grad_norm": 1.2945809364318848,
+ "learning_rate": 8.678280027633077e-07,
+ "loss": 0.0367,
+ "step": 13000
+ },
+ {
+ "epoch": 2.1940147948890383,
+ "grad_norm": 0.4334909915924072,
+ "learning_rate": 8.668944528463937e-07,
+ "loss": 0.0467,
+ "step": 13050
+ },
+ {
+ "epoch": 2.202420981842636,
+ "grad_norm": 1.6940721273422241,
+ "learning_rate": 8.659609029294796e-07,
+ "loss": 0.0569,
+ "step": 13100
+ },
+ {
+ "epoch": 2.210827168796234,
+ "grad_norm": 0.13102596998214722,
+ "learning_rate": 8.650273530125656e-07,
+ "loss": 0.0453,
+ "step": 13150
+ },
+ {
+ "epoch": 2.219233355749832,
+ "grad_norm": 0.4492725729942322,
+ "learning_rate": 8.640938030956515e-07,
+ "loss": 0.0508,
+ "step": 13200
+ },
+ {
+ "epoch": 2.2276395427034297,
+ "grad_norm": 1.6728957891464233,
+ "learning_rate": 8.631602531787374e-07,
+ "loss": 0.0433,
+ "step": 13250
+ },
+ {
+ "epoch": 2.2360457296570275,
+ "grad_norm": 0.1410200595855713,
+ "learning_rate": 8.622267032618234e-07,
+ "loss": 0.0457,
+ "step": 13300
+ },
+ {
+ "epoch": 2.2444519166106254,
+ "grad_norm": 0.4576292634010315,
+ "learning_rate": 8.612931533449092e-07,
+ "loss": 0.0604,
+ "step": 13350
+ },
+ {
+ "epoch": 2.2528581035642232,
+ "grad_norm": 2.419782876968384,
+ "learning_rate": 8.603596034279953e-07,
+ "loss": 0.0373,
+ "step": 13400
+ },
+ {
+ "epoch": 2.261264290517821,
+ "grad_norm": 0.6388083696365356,
+ "learning_rate": 8.594260535110812e-07,
+ "loss": 0.051,
+ "step": 13450
+ },
+ {
+ "epoch": 2.269670477471419,
+ "grad_norm": 1.2558374404907227,
+ "learning_rate": 8.584925035941672e-07,
+ "loss": 0.0382,
+ "step": 13500
+ },
+ {
+ "epoch": 2.2780766644250168,
+ "grad_norm": 0.6086406707763672,
+ "learning_rate": 8.57558953677253e-07,
+ "loss": 0.0486,
+ "step": 13550
+ },
+ {
+ "epoch": 2.2864828513786146,
+ "grad_norm": 1.564790964126587,
+ "learning_rate": 8.56625403760339e-07,
+ "loss": 0.0343,
+ "step": 13600
+ },
+ {
+ "epoch": 2.2948890383322125,
+ "grad_norm": 0.11007804423570633,
+ "learning_rate": 8.55691853843425e-07,
+ "loss": 0.0488,
+ "step": 13650
+ },
+ {
+ "epoch": 2.3032952252858103,
+ "grad_norm": 0.0976046770811081,
+ "learning_rate": 8.547583039265109e-07,
+ "loss": 0.0364,
+ "step": 13700
+ },
+ {
+ "epoch": 2.311701412239408,
+ "grad_norm": 0.08952467888593674,
+ "learning_rate": 8.538247540095969e-07,
+ "loss": 0.0334,
+ "step": 13750
+ },
+ {
+ "epoch": 2.320107599193006,
+ "grad_norm": 0.25883930921554565,
+ "learning_rate": 8.528912040926828e-07,
+ "loss": 0.0483,
+ "step": 13800
+ },
+ {
+ "epoch": 2.328513786146604,
+ "grad_norm": 0.1525714248418808,
+ "learning_rate": 8.519576541757687e-07,
+ "loss": 0.0432,
+ "step": 13850
+ },
+ {
+ "epoch": 2.3369199731002017,
+ "grad_norm": 0.18361321091651917,
+ "learning_rate": 8.510241042588547e-07,
+ "loss": 0.049,
+ "step": 13900
+ },
+ {
+ "epoch": 2.3453261600537996,
+ "grad_norm": 0.21892070770263672,
+ "learning_rate": 8.500905543419407e-07,
+ "loss": 0.0312,
+ "step": 13950
+ },
+ {
+ "epoch": 2.3537323470073974,
+ "grad_norm": 3.5728206634521484,
+ "learning_rate": 8.491570044250266e-07,
+ "loss": 0.039,
+ "step": 14000
+ },
+ {
+ "epoch": 2.3621385339609953,
+ "grad_norm": 0.6285228729248047,
+ "learning_rate": 8.482234545081125e-07,
+ "loss": 0.0432,
+ "step": 14050
+ },
+ {
+ "epoch": 2.370544720914593,
+ "grad_norm": 0.3574727475643158,
+ "learning_rate": 8.472899045911984e-07,
+ "loss": 0.0464,
+ "step": 14100
+ },
+ {
+ "epoch": 2.378950907868191,
+ "grad_norm": 0.6059629321098328,
+ "learning_rate": 8.463563546742843e-07,
+ "loss": 0.0466,
+ "step": 14150
+ },
+ {
+ "epoch": 2.387357094821789,
+ "grad_norm": 0.6901473999023438,
+ "learning_rate": 8.454228047573704e-07,
+ "loss": 0.052,
+ "step": 14200
+ },
+ {
+ "epoch": 2.3957632817753867,
+ "grad_norm": 0.47388774156570435,
+ "learning_rate": 8.444892548404563e-07,
+ "loss": 0.0329,
+ "step": 14250
+ },
+ {
+ "epoch": 2.4041694687289845,
+ "grad_norm": 0.12275730073451996,
+ "learning_rate": 8.435557049235423e-07,
+ "loss": 0.0395,
+ "step": 14300
+ },
+ {
+ "epoch": 2.4125756556825824,
+ "grad_norm": 0.14599856734275818,
+ "learning_rate": 8.426221550066281e-07,
+ "loss": 0.0402,
+ "step": 14350
+ },
+ {
+ "epoch": 2.42098184263618,
+ "grad_norm": 2.370673179626465,
+ "learning_rate": 8.416886050897141e-07,
+ "loss": 0.0596,
+ "step": 14400
+ },
+ {
+ "epoch": 2.429388029589778,
+ "grad_norm": 1.1168391704559326,
+ "learning_rate": 8.407550551728001e-07,
+ "loss": 0.0439,
+ "step": 14450
+ },
+ {
+ "epoch": 2.437794216543376,
+ "grad_norm": 1.3855054378509521,
+ "learning_rate": 8.39821505255886e-07,
+ "loss": 0.0477,
+ "step": 14500
+ },
+ {
+ "epoch": 2.4462004034969738,
+ "grad_norm": 0.08173301815986633,
+ "learning_rate": 8.38887955338972e-07,
+ "loss": 0.0467,
+ "step": 14550
+ },
+ {
+ "epoch": 2.4546065904505716,
+ "grad_norm": 1.462754487991333,
+ "learning_rate": 8.379544054220578e-07,
+ "loss": 0.0466,
+ "step": 14600
+ },
+ {
+ "epoch": 2.4630127774041695,
+ "grad_norm": 0.187517911195755,
+ "learning_rate": 8.370208555051438e-07,
+ "loss": 0.0539,
+ "step": 14650
+ },
+ {
+ "epoch": 2.4714189643577673,
+ "grad_norm": 0.17454290390014648,
+ "learning_rate": 8.360873055882298e-07,
+ "loss": 0.0375,
+ "step": 14700
+ },
+ {
+ "epoch": 2.479825151311365,
+ "grad_norm": 2.0886340141296387,
+ "learning_rate": 8.351537556713158e-07,
+ "loss": 0.0486,
+ "step": 14750
+ },
+ {
+ "epoch": 2.488231338264963,
+ "grad_norm": 1.7701690196990967,
+ "learning_rate": 8.342202057544017e-07,
+ "loss": 0.0395,
+ "step": 14800
+ },
+ {
+ "epoch": 2.488231338264963,
+ "eval_accuracy": 0.9904761904761905,
+ "eval_f1_macro": 0.9903547100746427,
+ "eval_loss": 0.027853745967149734,
+ "eval_precision": 0.9909244803912348,
+ "eval_recall": 0.9898341320283959,
+ "eval_runtime": 51.6999,
+ "eval_samples_per_second": 75.145,
+ "eval_steps_per_second": 9.4,
+ "step": 14800
+ },
+ {
+ "epoch": 2.496637525218561,
+ "grad_norm": 0.22243212163448334,
+ "learning_rate": 8.332866558374876e-07,
+ "loss": 0.046,
+ "step": 14850
+ },
+ {
+ "epoch": 2.5050437121721587,
+ "grad_norm": 0.09879665821790695,
+ "learning_rate": 8.323531059205735e-07,
+ "loss": 0.0334,
+ "step": 14900
+ },
+ {
+ "epoch": 2.5134498991257566,
+ "grad_norm": 0.11353790014982224,
+ "learning_rate": 8.314195560036594e-07,
+ "loss": 0.0349,
+ "step": 14950
+ },
+ {
+ "epoch": 2.5218560860793544,
+ "grad_norm": 0.0835421159863472,
+ "learning_rate": 8.304860060867455e-07,
+ "loss": 0.0373,
+ "step": 15000
+ },
+ {
+ "epoch": 2.5302622730329523,
+ "grad_norm": 1.0338134765625,
+ "learning_rate": 8.295524561698314e-07,
+ "loss": 0.0508,
+ "step": 15050
+ },
+ {
+ "epoch": 2.53866845998655,
+ "grad_norm": 0.06706652790307999,
+ "learning_rate": 8.286189062529173e-07,
+ "loss": 0.0413,
+ "step": 15100
+ },
+ {
+ "epoch": 2.547074646940148,
+ "grad_norm": 1.6149780750274658,
+ "learning_rate": 8.276853563360032e-07,
+ "loss": 0.0496,
+ "step": 15150
+ },
+ {
+ "epoch": 2.555480833893746,
+ "grad_norm": 1.550213098526001,
+ "learning_rate": 8.267518064190892e-07,
+ "loss": 0.0402,
+ "step": 15200
+ },
+ {
+ "epoch": 2.5638870208473437,
+ "grad_norm": 0.07133010029792786,
+ "learning_rate": 8.258182565021752e-07,
+ "loss": 0.0445,
+ "step": 15250
+ },
+ {
+ "epoch": 2.5722932078009415,
+ "grad_norm": 0.6747908592224121,
+ "learning_rate": 8.24884706585261e-07,
+ "loss": 0.0294,
+ "step": 15300
+ },
+ {
+ "epoch": 2.5806993947545394,
+ "grad_norm": 0.6118685007095337,
+ "learning_rate": 8.23951156668347e-07,
+ "loss": 0.0483,
+ "step": 15350
+ },
+ {
+ "epoch": 2.589105581708137,
+ "grad_norm": 0.09292344748973846,
+ "learning_rate": 8.230176067514329e-07,
+ "loss": 0.0438,
+ "step": 15400
+ },
+ {
+ "epoch": 2.597511768661735,
+ "grad_norm": 0.11092889308929443,
2230
+ "learning_rate": 8.220840568345189e-07,
2231
+ "loss": 0.0258,
2232
+ "step": 15450
2233
+ },
2234
+ {
2235
+ "epoch": 2.605917955615333,
2236
+ "grad_norm": 0.08764071017503738,
2237
+ "learning_rate": 8.211505069176049e-07,
2238
+ "loss": 0.0413,
2239
+ "step": 15500
2240
+ },
2241
+ {
2242
+ "epoch": 2.6143241425689308,
2243
+ "grad_norm": 2.178694248199463,
2244
+ "learning_rate": 8.202169570006909e-07,
2245
+ "loss": 0.0451,
2246
+ "step": 15550
2247
+ },
2248
+ {
2249
+ "epoch": 2.6227303295225286,
2250
+ "grad_norm": 0.3444342315196991,
2251
+ "learning_rate": 8.192834070837767e-07,
2252
+ "loss": 0.0367,
2253
+ "step": 15600
2254
+ },
2255
+ {
2256
+ "epoch": 2.6311365164761265,
2257
+ "grad_norm": 0.07718443125486374,
2258
+ "learning_rate": 8.183498571668627e-07,
2259
+ "loss": 0.0448,
2260
+ "step": 15650
2261
+ },
2262
+ {
2263
+ "epoch": 2.6395427034297243,
2264
+ "grad_norm": 0.07826303690671921,
2265
+ "learning_rate": 8.174163072499486e-07,
2266
+ "loss": 0.032,
2267
+ "step": 15700
2268
+ },
2269
+ {
2270
+ "epoch": 2.647948890383322,
2271
+ "grad_norm": 0.1438082605600357,
2272
+ "learning_rate": 8.164827573330345e-07,
2273
+ "loss": 0.0327,
2274
+ "step": 15750
2275
+ },
2276
+ {
2277
+ "epoch": 2.65635507733692,
2278
+ "grad_norm": 0.10826120525598526,
2279
+ "learning_rate": 8.155492074161206e-07,
2280
+ "loss": 0.0353,
2281
+ "step": 15800
2282
+ },
2283
+ {
2284
+ "epoch": 2.664761264290518,
2285
+ "grad_norm": 0.1938948631286621,
2286
+ "learning_rate": 8.146156574992064e-07,
2287
+ "loss": 0.043,
2288
+ "step": 15850
2289
+ },
2290
+ {
2291
+ "epoch": 2.6731674512441157,
2292
+ "grad_norm": 0.17752999067306519,
2293
+ "learning_rate": 8.136821075822924e-07,
2294
+ "loss": 0.0353,
2295
+ "step": 15900
2296
+ },
2297
+ {
2298
+ "epoch": 2.6815736381977135,
2299
+ "grad_norm": 0.35069605708122253,
2300
+ "learning_rate": 8.127485576653783e-07,
2301
+ "loss": 0.0319,
2302
+ "step": 15950
2303
+ },
2304
+ {
2305
+ "epoch": 2.6899798251513114,
2306
+ "grad_norm": 2.336815357208252,
2307
+ "learning_rate": 8.118150077484643e-07,
2308
+ "loss": 0.0469,
2309
+ "step": 16000
2310
+ },
2311
+ {
2312
+ "epoch": 2.6983860121049092,
2313
+ "grad_norm": 0.4466639757156372,
2314
+ "learning_rate": 8.108814578315503e-07,
2315
+ "loss": 0.0181,
2316
+ "step": 16050
2317
+ },
2318
+ {
2319
+ "epoch": 2.706792199058507,
2320
+ "grad_norm": 0.8735927939414978,
2321
+ "learning_rate": 8.099479079146362e-07,
2322
+ "loss": 0.0483,
2323
+ "step": 16100
2324
+ },
2325
+ {
2326
+ "epoch": 2.715198386012105,
2327
+ "grad_norm": 0.12205011397600174,
2328
+ "learning_rate": 8.090143579977221e-07,
2329
+ "loss": 0.0369,
2330
+ "step": 16150
2331
+ },
2332
+ {
2333
+ "epoch": 2.723604572965703,
2334
+ "grad_norm": 0.057348959147930145,
2335
+ "learning_rate": 8.08080808080808e-07,
2336
+ "loss": 0.032,
2337
+ "step": 16200
2338
+ },
2339
+ {
2340
+ "epoch": 2.7320107599193006,
2341
+ "grad_norm": 0.13915815949440002,
2342
+ "learning_rate": 8.07147258163894e-07,
2343
+ "loss": 0.0548,
2344
+ "step": 16250
2345
+ },
2346
+ {
2347
+ "epoch": 2.7404169468728985,
2348
+ "grad_norm": 0.1121092364192009,
2349
+ "learning_rate": 8.0621370824698e-07,
2350
+ "loss": 0.0478,
2351
+ "step": 16300
2352
+ },
2353
+ {
2354
+ "epoch": 2.7488231338264963,
2355
+ "grad_norm": 0.6932018995285034,
2356
+ "learning_rate": 8.052801583300659e-07,
2357
+ "loss": 0.0235,
2358
+ "step": 16350
2359
+ },
2360
+ {
2361
+ "epoch": 2.757229320780094,
2362
+ "grad_norm": 0.2540469765663147,
2363
+ "learning_rate": 8.043466084131518e-07,
2364
+ "loss": 0.0343,
2365
+ "step": 16400
2366
+ },
2367
+ {
2368
+ "epoch": 2.765635507733692,
2369
+ "grad_norm": 0.09147974848747253,
2370
+ "learning_rate": 8.034130584962378e-07,
2371
+ "loss": 0.0294,
2372
+ "step": 16450
2373
+ },
2374
+ {
2375
+ "epoch": 2.77404169468729,
2376
+ "grad_norm": 0.055779941380023956,
2377
+ "learning_rate": 8.024795085793237e-07,
2378
+ "loss": 0.0306,
2379
+ "step": 16500
2380
+ },
2381
+ {
2382
+ "epoch": 2.7824478816408877,
2383
+ "grad_norm": 0.17213129997253418,
2384
+ "learning_rate": 8.015459586624095e-07,
2385
+ "loss": 0.0403,
2386
+ "step": 16550
2387
+ },
2388
+ {
2389
+ "epoch": 2.7908540685944856,
2390
+ "grad_norm": 0.25217875838279724,
2391
+ "learning_rate": 8.006124087454956e-07,
2392
+ "loss": 0.0427,
2393
+ "step": 16600
2394
+ },
2395
+ {
2396
+ "epoch": 2.7992602555480834,
2397
+ "grad_norm": 0.2783985137939453,
2398
+ "learning_rate": 7.996788588285815e-07,
2399
+ "loss": 0.0348,
2400
+ "step": 16650
2401
+ },
2402
+ {
2403
+ "epoch": 2.8076664425016813,
2404
+ "grad_norm": 2.88033390045166,
2405
+ "learning_rate": 7.987453089116675e-07,
2406
+ "loss": 0.0361,
2407
+ "step": 16700
2408
+ },
2409
+ {
2410
+ "epoch": 2.816072629455279,
2411
+ "grad_norm": 0.35371074080467224,
2412
+ "learning_rate": 7.978117589947534e-07,
2413
+ "loss": 0.0426,
2414
+ "step": 16750
2415
+ },
2416
+ {
2417
+ "epoch": 2.824478816408877,
2418
+ "grad_norm": 0.18328945338726044,
2419
+ "learning_rate": 7.968782090778394e-07,
2420
+ "loss": 0.0187,
2421
+ "step": 16800
2422
+ },
2423
+ {
2424
+ "epoch": 2.832885003362475,
2425
+ "grad_norm": 0.04678039625287056,
2426
+ "learning_rate": 7.959446591609253e-07,
2427
+ "loss": 0.027,
2428
+ "step": 16850
2429
+ },
2430
+ {
2431
+ "epoch": 2.8412911903160727,
2432
+ "grad_norm": 0.12014785408973694,
2433
+ "learning_rate": 7.950111092440113e-07,
2434
+ "loss": 0.0312,
2435
+ "step": 16900
2436
+ },
2437
+ {
2438
+ "epoch": 2.8496973772696705,
2439
+ "grad_norm": 0.2731820046901703,
2440
+ "learning_rate": 7.940775593270972e-07,
2441
+ "loss": 0.0215,
2442
+ "step": 16950
2443
+ },
2444
+ {
2445
+ "epoch": 2.8581035642232684,
2446
+ "grad_norm": 0.047622449696063995,
2447
+ "learning_rate": 7.931440094101831e-07,
2448
+ "loss": 0.0367,
2449
+ "step": 17000
2450
+ },
2451
+ {
2452
+ "epoch": 2.8665097511768662,
2453
+ "grad_norm": 0.49464407563209534,
2454
+ "learning_rate": 7.92210459493269e-07,
2455
+ "loss": 0.0299,
2456
+ "step": 17050
2457
+ },
2458
+ {
2459
+ "epoch": 2.874915938130464,
2460
+ "grad_norm": 1.0074212551116943,
2461
+ "learning_rate": 7.91276909576355e-07,
2462
+ "loss": 0.0309,
2463
+ "step": 17100
2464
+ },
2465
+ {
2466
+ "epoch": 2.883322125084062,
2467
+ "grad_norm": 3.322622776031494,
2468
+ "learning_rate": 7.90343359659441e-07,
2469
+ "loss": 0.0387,
2470
+ "step": 17150
2471
+ },
2472
+ {
2473
+ "epoch": 2.89172831203766,
2474
+ "grad_norm": 0.40377694368362427,
2475
+ "learning_rate": 7.894098097425269e-07,
2476
+ "loss": 0.0278,
2477
+ "step": 17200
2478
+ },
2479
+ {
2480
+ "epoch": 2.9001344989912576,
2481
+ "grad_norm": 0.057058703154325485,
2482
+ "learning_rate": 7.884762598256129e-07,
2483
+ "loss": 0.0322,
2484
+ "step": 17250
2485
+ },
2486
+ {
2487
+ "epoch": 2.9085406859448555,
2488
+ "grad_norm": 0.05330061540007591,
2489
+ "learning_rate": 7.875427099086987e-07,
2490
+ "loss": 0.0306,
2491
+ "step": 17300
2492
+ },
2493
+ {
2494
+ "epoch": 2.9169468728984533,
2495
+ "grad_norm": 0.21193920075893402,
2496
+ "learning_rate": 7.866091599917846e-07,
2497
+ "loss": 0.0433,
2498
+ "step": 17350
2499
+ },
2500
+ {
2501
+ "epoch": 2.925353059852051,
2502
+ "grad_norm": 0.16045591235160828,
2503
+ "learning_rate": 7.856756100748707e-07,
2504
+ "loss": 0.0284,
2505
+ "step": 17400
2506
+ },
2507
+ {
2508
+ "epoch": 2.933759246805649,
2509
+ "grad_norm": 0.6420087814331055,
2510
+ "learning_rate": 7.847420601579566e-07,
2511
+ "loss": 0.0209,
2512
+ "step": 17450
2513
+ },
2514
+ {
2515
+ "epoch": 2.942165433759247,
2516
+ "grad_norm": 0.0893048495054245,
2517
+ "learning_rate": 7.838085102410426e-07,
2518
+ "loss": 0.03,
2519
+ "step": 17500
2520
+ },
2521
+ {
2522
+ "epoch": 2.9505716207128447,
2523
+ "grad_norm": 0.07634163647890091,
2524
+ "learning_rate": 7.828749603241284e-07,
2525
+ "loss": 0.0392,
2526
+ "step": 17550
2527
+ },
2528
+ {
2529
+ "epoch": 2.9589778076664426,
2530
+ "grad_norm": 1.3213460445404053,
2531
+ "learning_rate": 7.819414104072144e-07,
2532
+ "loss": 0.0381,
2533
+ "step": 17600
2534
+ },
2535
+ {
2536
+ "epoch": 2.9673839946200404,
2537
+ "grad_norm": 1.400268793106079,
2538
+ "learning_rate": 7.810078604903004e-07,
2539
+ "loss": 0.021,
2540
+ "step": 17650
2541
+ },
2542
+ {
2543
+ "epoch": 2.9757901815736383,
2544
+ "grad_norm": 0.06816738098859787,
2545
+ "learning_rate": 7.800743105733864e-07,
2546
+ "loss": 0.0307,
2547
+ "step": 17700
2548
+ },
2549
+ {
2550
+ "epoch": 2.984196368527236,
2551
+ "grad_norm": 1.563502311706543,
2552
+ "learning_rate": 7.791407606564723e-07,
2553
+ "loss": 0.0225,
2554
+ "step": 17750
2555
+ },
2556
+ {
2557
+ "epoch": 2.9858776059179557,
2558
+ "eval_accuracy": 0.9904761904761905,
2559
+ "eval_f1_macro": 0.9903524417560845,
2560
+ "eval_loss": 0.02756033092737198,
2561
+ "eval_precision": 0.9910546886131568,
2562
+ "eval_recall": 0.9897223308061431,
2563
+ "eval_runtime": 42.5014,
2564
+ "eval_samples_per_second": 91.409,
2565
+ "eval_steps_per_second": 11.435,
2566
+ "step": 17760
2567
+ },
2568
+ {
2569
+ "epoch": 2.992602555480834,
2570
+ "grad_norm": 0.03866572305560112,
2571
+ "learning_rate": 7.782072107395581e-07,
2572
+ "loss": 0.0315,
2573
+ "step": 17800
2574
+ },
2575
+ {
2576
+ "epoch": 3.001008742434432,
2577
+ "grad_norm": 0.14597171545028687,
2578
+ "learning_rate": 7.772736608226441e-07,
2579
+ "loss": 0.0337,
2580
+ "step": 17850
2581
+ },
2582
+ {
2583
+ "epoch": 3.0094149293880297,
2584
+ "grad_norm": 0.047286901623010635,
2585
+ "learning_rate": 7.7634011090573e-07,
2586
+ "loss": 0.0345,
2587
+ "step": 17900
2588
+ },
2589
+ {
2590
+ "epoch": 3.0178211163416275,
2591
+ "grad_norm": 0.0630001500248909,
2592
+ "learning_rate": 7.754065609888161e-07,
2593
+ "loss": 0.0369,
2594
+ "step": 17950
2595
+ },
2596
+ {
2597
+ "epoch": 3.0262273032952254,
2598
+ "grad_norm": 0.1186026781797409,
2599
+ "learning_rate": 7.74473011071902e-07,
2600
+ "loss": 0.033,
2601
+ "step": 18000
2602
+ },
2603
+ {
2604
+ "epoch": 3.0346334902488232,
2605
+ "grad_norm": 0.048053912818431854,
2606
+ "learning_rate": 7.73539461154988e-07,
2607
+ "loss": 0.025,
2608
+ "step": 18050
2609
+ },
2610
+ {
2611
+ "epoch": 3.043039677202421,
2612
+ "grad_norm": 0.08836133778095245,
2613
+ "learning_rate": 7.726059112380738e-07,
2614
+ "loss": 0.0333,
2615
+ "step": 18100
2616
+ },
2617
+ {
2618
+ "epoch": 3.051445864156019,
2619
+ "grad_norm": 0.12423136085271835,
2620
+ "learning_rate": 7.716723613211598e-07,
2621
+ "loss": 0.036,
2622
+ "step": 18150
2623
+ },
2624
+ {
2625
+ "epoch": 3.0598520511096168,
2626
+ "grad_norm": 1.3503236770629883,
2627
+ "learning_rate": 7.707388114042458e-07,
2628
+ "loss": 0.032,
2629
+ "step": 18200
2630
+ },
2631
+ {
2632
+ "epoch": 3.0682582380632146,
2633
+ "grad_norm": 1.4914406538009644,
2634
+ "learning_rate": 7.698052614873317e-07,
2635
+ "loss": 0.0194,
2636
+ "step": 18250
2637
+ },
2638
+ {
2639
+ "epoch": 3.0766644250168125,
2640
+ "grad_norm": 0.1584930419921875,
2641
+ "learning_rate": 7.688717115704177e-07,
2642
+ "loss": 0.0352,
2643
+ "step": 18300
2644
+ },
2645
+ {
2646
+ "epoch": 3.0850706119704103,
2647
+ "grad_norm": 4.718121528625488,
2648
+ "learning_rate": 7.679381616535035e-07,
2649
+ "loss": 0.0534,
2650
+ "step": 18350
2651
+ },
2652
+ {
2653
+ "epoch": 3.093476798924008,
2654
+ "grad_norm": 0.11320281028747559,
2655
+ "learning_rate": 7.670046117365895e-07,
2656
+ "loss": 0.026,
2657
+ "step": 18400
2658
+ },
2659
+ {
2660
+ "epoch": 3.101882985877606,
2661
+ "grad_norm": 4.923747539520264,
2662
+ "learning_rate": 7.660710618196755e-07,
2663
+ "loss": 0.0385,
2664
+ "step": 18450
2665
+ },
2666
+ {
2667
+ "epoch": 3.110289172831204,
2668
+ "grad_norm": 0.5125885009765625,
2669
+ "learning_rate": 7.651375119027615e-07,
2670
+ "loss": 0.0333,
2671
+ "step": 18500
2672
+ },
2673
+ {
2674
+ "epoch": 3.1186953597848017,
2675
+ "grad_norm": 1.254514455795288,
2676
+ "learning_rate": 7.642039619858473e-07,
2677
+ "loss": 0.0384,
2678
+ "step": 18550
2679
+ },
2680
+ {
2681
+ "epoch": 3.1271015467383996,
2682
+ "grad_norm": 0.09596805274486542,
2683
+ "learning_rate": 7.632704120689332e-07,
2684
+ "loss": 0.0363,
2685
+ "step": 18600
2686
+ },
2687
+ {
2688
+ "epoch": 3.1355077336919974,
2689
+ "grad_norm": 2.396228313446045,
2690
+ "learning_rate": 7.623368621520192e-07,
2691
+ "loss": 0.0274,
2692
+ "step": 18650
2693
+ },
2694
+ {
2695
+ "epoch": 3.1439139206455953,
2696
+ "grad_norm": 0.04524122551083565,
2697
+ "learning_rate": 7.614033122351051e-07,
2698
+ "loss": 0.0374,
2699
+ "step": 18700
2700
+ },
2701
+ {
2702
+ "epoch": 3.152320107599193,
2703
+ "grad_norm": 0.03596815839409828,
2704
+ "learning_rate": 7.604697623181912e-07,
2705
+ "loss": 0.0208,
2706
+ "step": 18750
2707
+ },
2708
+ {
2709
+ "epoch": 3.160726294552791,
2710
+ "grad_norm": 0.04798245429992676,
2711
+ "learning_rate": 7.59536212401277e-07,
2712
+ "loss": 0.0426,
2713
+ "step": 18800
2714
+ },
2715
+ {
2716
+ "epoch": 3.169132481506389,
2717
+ "grad_norm": 0.06811300665140152,
2718
+ "learning_rate": 7.58602662484363e-07,
2719
+ "loss": 0.0176,
2720
+ "step": 18850
2721
+ },
2722
+ {
2723
+ "epoch": 3.1775386684599867,
2724
+ "grad_norm": 0.2457869052886963,
2725
+ "learning_rate": 7.576691125674489e-07,
2726
+ "loss": 0.0254,
2727
+ "step": 18900
2728
+ },
2729
+ {
2730
+ "epoch": 3.1859448554135845,
2731
+ "grad_norm": 0.7892419695854187,
2732
+ "learning_rate": 7.567355626505349e-07,
2733
+ "loss": 0.0394,
2734
+ "step": 18950
2735
+ },
2736
+ {
2737
+ "epoch": 3.1943510423671824,
2738
+ "grad_norm": 2.816732883453369,
2739
+ "learning_rate": 7.558020127336209e-07,
2740
+ "loss": 0.0394,
2741
+ "step": 19000
2742
+ },
2743
+ {
2744
+ "epoch": 3.20275722932078,
2745
+ "grad_norm": 0.4110700786113739,
2746
+ "learning_rate": 7.548684628167067e-07,
2747
+ "loss": 0.0264,
2748
+ "step": 19050
2749
+ },
2750
+ {
2751
+ "epoch": 3.211163416274378,
2752
+ "grad_norm": 1.199163794517517,
2753
+ "learning_rate": 7.539349128997927e-07,
2754
+ "loss": 0.0415,
2755
+ "step": 19100
2756
+ },
2757
+ {
2758
+ "epoch": 3.219569603227976,
2759
+ "grad_norm": 0.17944256961345673,
2760
+ "learning_rate": 7.530013629828786e-07,
2761
+ "loss": 0.0236,
2762
+ "step": 19150
2763
+ },
2764
+ {
2765
+ "epoch": 3.2279757901815738,
2766
+ "grad_norm": 2.5870165824890137,
2767
+ "learning_rate": 7.520678130659646e-07,
2768
+ "loss": 0.0253,
2769
+ "step": 19200
2770
+ },
2771
+ {
2772
+ "epoch": 3.2363819771351716,
2773
+ "grad_norm": 1.1989127397537231,
2774
+ "learning_rate": 7.511342631490506e-07,
2775
+ "loss": 0.0339,
2776
+ "step": 19250
2777
+ },
2778
+ {
2779
+ "epoch": 3.2447881640887695,
2780
+ "grad_norm": 2.1989920139312744,
2781
+ "learning_rate": 7.502007132321366e-07,
2782
+ "loss": 0.032,
2783
+ "step": 19300
2784
+ },
2785
+ {
2786
+ "epoch": 3.2531943510423673,
2787
+ "grad_norm": 3.3766326904296875,
2788
+ "learning_rate": 7.492671633152224e-07,
2789
+ "loss": 0.0338,
2790
+ "step": 19350
2791
+ },
2792
+ {
2793
+ "epoch": 3.261600537995965,
2794
+ "grad_norm": 2.154630184173584,
2795
+ "learning_rate": 7.483336133983084e-07,
2796
+ "loss": 0.0433,
2797
+ "step": 19400
2798
+ },
2799
+ {
2800
+ "epoch": 3.270006724949563,
2801
+ "grad_norm": 0.17257541418075562,
2802
+ "learning_rate": 7.474000634813943e-07,
2803
+ "loss": 0.0241,
2804
+ "step": 19450
2805
+ },
2806
+ {
2807
+ "epoch": 3.278412911903161,
2808
+ "grad_norm": 0.35731831192970276,
2809
+ "learning_rate": 7.464665135644802e-07,
2810
+ "loss": 0.0314,
2811
+ "step": 19500
2812
+ },
2813
+ {
2814
+ "epoch": 3.2868190988567587,
2815
+ "grad_norm": 0.7590059041976929,
2816
+ "learning_rate": 7.455329636475663e-07,
2817
+ "loss": 0.0325,
2818
+ "step": 19550
2819
+ },
2820
+ {
2821
+ "epoch": 3.2952252858103566,
2822
+ "grad_norm": 0.07474666833877563,
2823
+ "learning_rate": 7.445994137306521e-07,
2824
+ "loss": 0.0267,
2825
+ "step": 19600
2826
+ },
2827
+ {
2828
+ "epoch": 3.3036314727639544,
2829
+ "grad_norm": 0.6764310002326965,
2830
+ "learning_rate": 7.436658638137381e-07,
2831
+ "loss": 0.0409,
2832
+ "step": 19650
2833
+ },
2834
+ {
2835
+ "epoch": 3.3120376597175523,
2836
+ "grad_norm": 0.17073293030261993,
2837
+ "learning_rate": 7.42732313896824e-07,
2838
+ "loss": 0.0352,
2839
+ "step": 19700
2840
+ },
2841
+ {
2842
+ "epoch": 3.32044384667115,
2843
+ "grad_norm": 0.4134461581707001,
2844
+ "learning_rate": 7.4179876397991e-07,
2845
+ "loss": 0.045,
2846
+ "step": 19750
2847
+ },
2848
+ {
2849
+ "epoch": 3.328850033624748,
2850
+ "grad_norm": 0.05611399933695793,
2851
+ "learning_rate": 7.40865214062996e-07,
2852
+ "loss": 0.0395,
2853
+ "step": 19800
2854
+ },
2855
+ {
2856
+ "epoch": 3.337256220578346,
2857
+ "grad_norm": 0.23517532646656036,
2858
+ "learning_rate": 7.399316641460819e-07,
2859
+ "loss": 0.032,
2860
+ "step": 19850
2861
+ },
2862
+ {
2863
+ "epoch": 3.3456624075319437,
2864
+ "grad_norm": 1.1617560386657715,
2865
+ "learning_rate": 7.389981142291678e-07,
2866
+ "loss": 0.0255,
2867
+ "step": 19900
2868
+ },
2869
+ {
2870
+ "epoch": 3.3540685944855415,
2871
+ "grad_norm": 1.7898571491241455,
2872
+ "learning_rate": 7.380645643122537e-07,
2873
+ "loss": 0.0396,
2874
+ "step": 19950
2875
+ },
2876
+ {
2877
+ "epoch": 3.3624747814391394,
2878
+ "grad_norm": 0.30372563004493713,
2879
+ "learning_rate": 7.371310143953397e-07,
2880
+ "loss": 0.0273,
2881
+ "step": 20000
2882
+ },
2883
+ {
2884
+ "epoch": 3.370880968392737,
2885
+ "grad_norm": 0.03882971778512001,
2886
+ "learning_rate": 7.361974644784256e-07,
2887
+ "loss": 0.0311,
2888
+ "step": 20050
2889
+ },
2890
+ {
2891
+ "epoch": 3.379287155346335,
2892
+ "grad_norm": 0.027714738622307777,
2893
+ "learning_rate": 7.352639145615116e-07,
2894
+ "loss": 0.0233,
2895
+ "step": 20100
2896
+ },
2897
+ {
2898
+ "epoch": 3.387693342299933,
2899
+ "grad_norm": 0.06837920099496841,
2900
+ "learning_rate": 7.343303646445975e-07,
2901
+ "loss": 0.0258,
2902
+ "step": 20150
2903
+ },
2904
+ {
2905
+ "epoch": 3.3960995292535308,
2906
+ "grad_norm": 0.06077512726187706,
2907
+ "learning_rate": 7.333968147276835e-07,
2908
+ "loss": 0.0365,
2909
+ "step": 20200
2910
+ },
2911
+ {
2912
+ "epoch": 3.4045057162071286,
2913
+ "grad_norm": 3.514616012573242,
2914
+ "learning_rate": 7.324632648107694e-07,
2915
+ "loss": 0.0353,
2916
+ "step": 20250
2917
+ },
2918
+ {
2919
+ "epoch": 3.4129119031607265,
2920
+ "grad_norm": 0.13428473472595215,
2921
+ "learning_rate": 7.315297148938552e-07,
2922
+ "loss": 0.0283,
2923
+ "step": 20300
2924
+ },
2925
+ {
2926
+ "epoch": 3.4213180901143243,
2927
+ "grad_norm": 0.05296952277421951,
2928
+ "learning_rate": 7.305961649769413e-07,
2929
+ "loss": 0.0328,
2930
+ "step": 20350
2931
+ },
2932
+ {
2933
+ "epoch": 3.429724277067922,
2934
+ "grad_norm": 0.42664337158203125,
2935
+ "learning_rate": 7.296626150600272e-07,
2936
+ "loss": 0.0279,
2937
+ "step": 20400
2938
+ },
2939
+ {
2940
+ "epoch": 3.43813046402152,
2941
+ "grad_norm": 0.5672262907028198,
2942
+ "learning_rate": 7.287290651431132e-07,
2943
+ "loss": 0.0333,
2944
+ "step": 20450
2945
+ },
2946
+ {
2947
+ "epoch": 3.446536650975118,
2948
+ "grad_norm": 0.08966855704784393,
2949
+ "learning_rate": 7.277955152261991e-07,
2950
+ "loss": 0.0185,
2951
+ "step": 20500
2952
+ },
2953
+ {
2954
+ "epoch": 3.4549428379287157,
2955
+ "grad_norm": 0.08157237619161606,
2956
+ "learning_rate": 7.26861965309285e-07,
2957
+ "loss": 0.0249,
2958
+ "step": 20550
2959
+ },
2960
+ {
2961
+ "epoch": 3.4633490248823136,
2962
+ "grad_norm": 0.05281166359782219,
2963
+ "learning_rate": 7.25928415392371e-07,
2964
+ "loss": 0.0354,
2965
+ "step": 20600
2966
+ },
2967
+ {
2968
+ "epoch": 3.4717552118359114,
2969
+ "grad_norm": 0.03640764206647873,
2970
+ "learning_rate": 7.24994865475457e-07,
2971
+ "loss": 0.0238,
2972
+ "step": 20650
2973
+ },
2974
+ {
2975
+ "epoch": 3.4801613987895093,
2976
+ "grad_norm": 0.10095292329788208,
2977
+ "learning_rate": 7.240613155585429e-07,
2978
+ "loss": 0.0291,
2979
+ "step": 20700
2980
+ },
2981
+ {
2982
+ "epoch": 3.4835238735709484,
2983
+ "eval_accuracy": 0.9922779922779923,
2984
+ "eval_f1_macro": 0.9921853502169998,
2985
+ "eval_loss": 0.0231217909604311,
2986
+ "eval_precision": 0.9923553160253439,
2987
+ "eval_recall": 0.9920202883023748,
2988
+ "eval_runtime": 41.9983,
2989
+ "eval_samples_per_second": 92.504,
2990
+ "eval_steps_per_second": 11.572,
2991
+ "step": 20720
2992
+ }
2993
+ ],
2994
+ "logging_steps": 50,
2995
+ "max_steps": 59480,
2996
+ "num_input_tokens_seen": 0,
2997
+ "num_train_epochs": 10,
2998
+ "save_steps": 2960,
2999
+ "stateful_callbacks": {
3000
+ "TrainerControl": {
3001
+ "args": {
3002
+ "should_epoch_stop": false,
3003
+ "should_evaluate": false,
3004
+ "should_log": false,
3005
+ "should_save": true,
3006
+ "should_training_stop": false
3007
+ },
3008
+ "attributes": {}
3009
+ }
3010
+ },
3011
+ "total_flos": 3.433256024715952e+18,
3012
+ "train_batch_size": 32,
3013
+ "trial_name": null,
3014
+ "trial_params": null
3015
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e98d861bd236d02ed4822863a0e9472a59a0de4f23ce64912d62809ce6ccc8b1
+ size 5240