Adding the Open Portuguese LLM Leaderboard Evaluation Results

#3
Files changed (1)
  1. README.md +170 -13
README.md CHANGED
@@ -17,8 +17,7 @@ model-index:
17
  value: 76.49
18
  name: strict accuracy
19
  source:
20
- url: >-
21
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
22
  name: Open LLM Leaderboard
23
  - task:
24
  type: text-generation
@@ -33,8 +32,7 @@ model-index:
33
  value: 42.25
34
  name: normalized accuracy
35
  source:
36
- url: >-
37
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
38
  name: Open LLM Leaderboard
39
  - task:
40
  type: text-generation
@@ -49,8 +47,7 @@ model-index:
49
  value: 1.74
50
  name: exact match
51
  source:
52
- url: >-
53
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
54
  name: Open LLM Leaderboard
55
  - task:
56
  type: text-generation
@@ -65,8 +62,7 @@ model-index:
65
  value: 10.74
66
  name: acc_norm
67
  source:
68
- url: >-
69
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
70
  name: Open LLM Leaderboard
71
  - task:
72
  type: text-generation
@@ -81,8 +77,7 @@ model-index:
81
  value: 12.39
82
  name: acc_norm
83
  source:
84
- url: >-
85
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
86
  name: Open LLM Leaderboard
87
  - task:
88
  type: text-generation
@@ -99,9 +94,152 @@ model-index:
99
  value: 35.63
100
  name: accuracy
101
  source:
102
- url: >-
103
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
104
  name: Open LLM Leaderboard
105
  ---
106
 
107
  # Gemma-2-Ataraxy-Gemmasutra-9B-slerp
@@ -142,4 +280,23 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
142
  |MATH Lvl 5 (4-Shot)| 1.74|
143
  |GPQA (0-shot) |10.74|
144
  |MuSR (0-shot) |12.39|
145
- |MMLU-PRO (5-shot) |35.63|
 
17
  value: 76.49
18
  name: strict accuracy
19
  source:
20
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
 
21
  name: Open LLM Leaderboard
22
  - task:
23
  type: text-generation
 
32
  value: 42.25
33
  name: normalized accuracy
34
  source:
35
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
 
36
  name: Open LLM Leaderboard
37
  - task:
38
  type: text-generation
 
47
  value: 1.74
48
  name: exact match
49
  source:
50
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
 
51
  name: Open LLM Leaderboard
52
  - task:
53
  type: text-generation
 
62
  value: 10.74
63
  name: acc_norm
64
  source:
65
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
 
66
  name: Open LLM Leaderboard
67
  - task:
68
  type: text-generation
 
77
  value: 12.39
78
  name: acc_norm
79
  source:
80
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
 
81
  name: Open LLM Leaderboard
82
  - task:
83
  type: text-generation
 
94
  value: 35.63
95
  name: accuracy
96
  source:
97
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
 
98
  name: Open LLM Leaderboard
99
+ - task:
100
+ type: text-generation
101
+ name: Text Generation
102
+ dataset:
103
+ name: ENEM Challenge (No Images)
104
+ type: eduagarcia/enem_challenge
105
+ split: train
106
+ args:
107
+ num_few_shot: 3
108
+ metrics:
109
+ - type: acc
110
+ value: 75.65
111
+ name: accuracy
112
+ source:
113
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
114
+ name: Open Portuguese LLM Leaderboard
115
+ - task:
116
+ type: text-generation
117
+ name: Text Generation
118
+ dataset:
119
+ name: BLUEX (No Images)
120
+ type: eduagarcia-temp/BLUEX_without_images
121
+ split: train
122
+ args:
123
+ num_few_shot: 3
124
+ metrics:
125
+ - type: acc
126
+ value: 64.26
127
+ name: accuracy
128
+ source:
129
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
130
+ name: Open Portuguese LLM Leaderboard
131
+ - task:
132
+ type: text-generation
133
+ name: Text Generation
134
+ dataset:
135
+ name: OAB Exams
136
+ type: eduagarcia/oab_exams
137
+ split: train
138
+ args:
139
+ num_few_shot: 3
140
+ metrics:
141
+ - type: acc
142
+ value: 53.76
143
+ name: accuracy
144
+ source:
145
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
146
+ name: Open Portuguese LLM Leaderboard
147
+ - task:
148
+ type: text-generation
149
+ name: Text Generation
150
+ dataset:
151
+ name: Assin2 RTE
152
+ type: assin2
153
+ split: test
154
+ args:
155
+ num_few_shot: 15
156
+ metrics:
157
+ - type: f1_macro
158
+ value: 93.21
159
+ name: f1-macro
160
+ source:
161
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
162
+ name: Open Portuguese LLM Leaderboard
163
+ - task:
164
+ type: text-generation
165
+ name: Text Generation
166
+ dataset:
167
+ name: Assin2 STS
168
+ type: eduagarcia/portuguese_benchmark
169
+ split: test
170
+ args:
171
+ num_few_shot: 15
172
+ metrics:
173
+ - type: pearson
174
+ value: 80.91
175
+ name: pearson
176
+ source:
177
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
178
+ name: Open Portuguese LLM Leaderboard
179
+ - task:
180
+ type: text-generation
181
+ name: Text Generation
182
+ dataset:
183
+ name: FaQuAD NLI
184
+ type: ruanchaves/faquad-nli
185
+ split: test
186
+ args:
187
+ num_few_shot: 15
188
+ metrics:
189
+ - type: f1_macro
190
+ value: 77.39
191
+ name: f1-macro
192
+ source:
193
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
194
+ name: Open Portuguese LLM Leaderboard
195
+ - task:
196
+ type: text-generation
197
+ name: Text Generation
198
+ dataset:
199
+ name: HateBR Binary
200
+ type: ruanchaves/hatebr
201
+ split: test
202
+ args:
203
+ num_few_shot: 25
204
+ metrics:
205
+ - type: f1_macro
206
+ value: 87.61
207
+ name: f1-macro
208
+ source:
209
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
210
+ name: Open Portuguese LLM Leaderboard
211
+ - task:
212
+ type: text-generation
213
+ name: Text Generation
214
+ dataset:
215
+ name: PT Hate Speech Binary
216
+ type: hate_speech_portuguese
217
+ split: test
218
+ args:
219
+ num_few_shot: 25
220
+ metrics:
221
+ - type: f1_macro
222
+ value: 66.84
223
+ name: f1-macro
224
+ source:
225
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
226
+ name: Open Portuguese LLM Leaderboard
227
+ - task:
228
+ type: text-generation
229
+ name: Text Generation
230
+ dataset:
231
+ name: tweetSentBR
232
+ type: eduagarcia/tweetsentbr_fewshot
233
+ split: test
234
+ args:
235
+ num_few_shot: 25
236
+ metrics:
237
+ - type: f1_macro
238
+ value: 66.14
239
+ name: f1-macro
240
+ source:
241
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
242
+ name: Open Portuguese LLM Leaderboard
243
  ---
244
 
245
  # Gemma-2-Ataraxy-Gemmasutra-9B-slerp
 
280
  |MATH Lvl 5 (4-Shot)| 1.74|
281
  |GPQA (0-shot) |10.74|
282
  |MuSR (0-shot) |12.39|
283
+ |MMLU-PRO (5-shot) |35.63|
284
+
285
+
286
+ # Open Portuguese LLM Leaderboard Evaluation Results
287
+
288
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
289
+
290
+ | Metric | Value |
291
+ |--------------------------|---------|
292
+ |Average |**73.97**|
293
+ |ENEM Challenge (No Images)| 75.65|
294
+ |BLUEX (No Images) | 64.26|
295
+ |OAB Exams | 53.76|
296
+ |Assin2 RTE | 93.21|
297
+ |Assin2 STS | 80.91|
298
+ |FaQuAD NLI | 77.39|
299
+ |HateBR Binary | 87.61|
300
+ |PT Hate Speech Binary | 66.84|
301
+ |tweetSentBR | 66.14|
302
+
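
Reviewer note: the `Average` row in the table above can be sanity-checked against the nine per-task scores. The sketch below assumes the leaderboard average is a plain unweighted mean of those nine values; the diff does not state how the average is computed, so treat that as an assumption. The `scores` dictionary is only an illustrative name holding the values copied from the table.

```python
# Per-task scores copied from the Open Portuguese LLM Leaderboard table in this diff.
scores = {
    "ENEM Challenge (No Images)": 75.65,
    "BLUEX (No Images)": 64.26,
    "OAB Exams": 53.76,
    "Assin2 RTE": 93.21,
    "Assin2 STS": 80.91,
    "FaQuAD NLI": 77.39,
    "HateBR Binary": 87.61,
    "PT Hate Speech Binary": 66.84,
    "tweetSentBR": 66.14,
}

# Assumption: the reported average is the unweighted arithmetic mean,
# rounded to two decimal places.
average = round(sum(scores.values()) / len(scores), 2)

print(f"tasks: {len(scores)}, average: {average}")
```

Under that assumption the computed mean agrees with the 73.97 reported in the table, which suggests the added metadata values and the Markdown table are consistent with each other.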