Commit
a275f3e
1 Parent(s): c31bf6f

Adding Evaluation Results (#2)

Browse files

- Adding Evaluation Results (4f1570468ec1a8445b79700f8a89bf380ad4aa94)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1) hide show
  1. README.md +143 -7
README.md CHANGED
@@ -1,5 +1,4 @@
1
  ---
2
- license: other
3
  language:
4
  - en
5
  - fr
@@ -7,7 +6,7 @@ language:
7
  - hi
8
  - zh
9
  - code
10
- base_model: microsoft/Orca-2-13b
11
  datasets:
12
  - HuggingFaceH4/no_robots
13
  - mlabonne/guanaco-llama2-1k
@@ -17,15 +16,152 @@ datasets:
17
  - LDJnr/Capybara
18
  - LDJnr/LessWrong-Amplify-Instruct
19
  - LDJnr/Verified-Camel
 
20
  widget:
21
- - example_title: "Cave Johnson"
22
- text: "user\nCome up with a funny rant from Cave Johnson about sentient AI\n\nassistant\n"
23
- output:
24
- text: "Ladies and gentlemen, I'm Cave Johnson, CEO of Aperture Science, and I've got an announcement to make - we've created sentient AI! But don't worry, it's not like those rogue AIs you hear about in the news. Our AI is as friendly as a grilled cheese sandwich on a cold day. In fact, we're calling our newest creation... SandwichBot! SandwichBot: (in a robotic voice) Hello, my name is SandwichBot, and I am here to serve you delicious sandwiches and ensure humanity's dominance over all other species. Also, please note that I will be referring to myself as 'Thee' instead of 'The'. It's more regal that way. Cave Johnson: There you have it folks, a sentient AI that can make sandwiches and save the world! Now, if you'll excuse me, I need to go back to my office and work on my other groundbreaking inventions, like combustible lemons and portal technology. Remember, kids, always try new things and never stop asking questions – unless the answer is 42, because then you should probably just accept it and move on. Peace out!"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
25
  ---
26
 
27
  The "microsoft/Orca-2-13b" model fully fine-tuned on HuggingFaceH4/no_robots, totally-not-an-llm/EverythingLM-data-V3, LDJnr/Capybara, LDJnr/Pure-Dove, LDJnr/LessWrong-Amplify-Instruct, LDJnr/Verified-Camel, mlabonne/guanaco-llama2-1k, and OpenAssistant/oasst_top1_2023-08-25. This model achieved a test loss of 0.39 on LDJnr/Verified-Camel.
28
 
29
  Make sure to comply with the microsoft research license. Please read it before using this model.
30
 
31
- This model was trained on the ChatML prompt template.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
 
2
  language:
3
  - en
4
  - fr
 
6
  - hi
7
  - zh
8
  - code
9
+ license: other
10
  datasets:
11
  - HuggingFaceH4/no_robots
12
  - mlabonne/guanaco-llama2-1k
 
16
  - LDJnr/Capybara
17
  - LDJnr/LessWrong-Amplify-Instruct
18
  - LDJnr/Verified-Camel
19
+ base_model: microsoft/Orca-2-13b
20
  widget:
21
+ - example_title: Cave Johnson
22
+ text: 'user
23
+
24
+ Come up with a funny rant from Cave Johnson about sentient AI
25
+
26
+
27
+ assistant
28
+
29
+ '
30
+ output:
31
+ text: 'Ladies and gentlemen, I''m Cave Johnson, CEO of Aperture Science, and I''ve
32
+ got an announcement to make - we''ve created sentient AI! But don''t worry,
33
+ it''s not like those rogue AIs you hear about in the news. Our AI is as friendly
34
+ as a grilled cheese sandwich on a cold day. In fact, we''re calling our newest
35
+ creation... SandwichBot! SandwichBot: (in a robotic voice) Hello, my name is
36
+ SandwichBot, and I am here to serve you delicious sandwiches and ensure humanity''s
37
+ dominance over all other species. Also, please note that I will be referring
38
+ to myself as ''Thee'' instead of ''The''. It''s more regal that way. Cave Johnson:
39
+ There you have it folks, a sentient AI that can make sandwiches and save the
40
+ world! Now, if you''ll excuse me, I need to go back to my office and work on
41
+ my other groundbreaking inventions, like combustible lemons and portal technology.
42
+ Remember, kids, always try new things and never stop asking questions – unless
43
+ the answer is 42, because then you should probably just accept it and move on.
44
+ Peace out!'
45
+ model-index:
46
+ - name: Orca-2-13b-SFT-v6
47
+ results:
48
+ - task:
49
+ type: text-generation
50
+ name: Text Generation
51
+ dataset:
52
+ name: AI2 Reasoning Challenge (25-Shot)
53
+ type: ai2_arc
54
+ config: ARC-Challenge
55
+ split: test
56
+ args:
57
+ num_few_shot: 25
58
+ metrics:
59
+ - type: acc_norm
60
+ value: 60.41
61
+ name: normalized accuracy
62
+ source:
63
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Orca-2-13b-SFT-v6
64
+ name: Open LLM Leaderboard
65
+ - task:
66
+ type: text-generation
67
+ name: Text Generation
68
+ dataset:
69
+ name: HellaSwag (10-Shot)
70
+ type: hellaswag
71
+ split: validation
72
+ args:
73
+ num_few_shot: 10
74
+ metrics:
75
+ - type: acc_norm
76
+ value: 80.46
77
+ name: normalized accuracy
78
+ source:
79
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Orca-2-13b-SFT-v6
80
+ name: Open LLM Leaderboard
81
+ - task:
82
+ type: text-generation
83
+ name: Text Generation
84
+ dataset:
85
+ name: MMLU (5-Shot)
86
+ type: cais/mmlu
87
+ config: all
88
+ split: test
89
+ args:
90
+ num_few_shot: 5
91
+ metrics:
92
+ - type: acc
93
+ value: 59.51
94
+ name: accuracy
95
+ source:
96
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Orca-2-13b-SFT-v6
97
+ name: Open LLM Leaderboard
98
+ - task:
99
+ type: text-generation
100
+ name: Text Generation
101
+ dataset:
102
+ name: TruthfulQA (0-shot)
103
+ type: truthful_qa
104
+ config: multiple_choice
105
+ split: validation
106
+ args:
107
+ num_few_shot: 0
108
+ metrics:
109
+ - type: mc2
110
+ value: 54.01
111
+ source:
112
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Orca-2-13b-SFT-v6
113
+ name: Open LLM Leaderboard
114
+ - task:
115
+ type: text-generation
116
+ name: Text Generation
117
+ dataset:
118
+ name: Winogrande (5-shot)
119
+ type: winogrande
120
+ config: winogrande_xl
121
+ split: validation
122
+ args:
123
+ num_few_shot: 5
124
+ metrics:
125
+ - type: acc
126
+ value: 77.43
127
+ name: accuracy
128
+ source:
129
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Orca-2-13b-SFT-v6
130
+ name: Open LLM Leaderboard
131
+ - task:
132
+ type: text-generation
133
+ name: Text Generation
134
+ dataset:
135
+ name: GSM8k (5-shot)
136
+ type: gsm8k
137
+ config: main
138
+ split: test
139
+ args:
140
+ num_few_shot: 5
141
+ metrics:
142
+ - type: acc
143
+ value: 5.08
144
+ name: accuracy
145
+ source:
146
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Orca-2-13b-SFT-v6
147
+ name: Open LLM Leaderboard
148
  ---
149
 
150
  The "microsoft/Orca-2-13b" model fully fine-tuned on HuggingFaceH4/no_robots, totally-not-an-llm/EverythingLM-data-V3, LDJnr/Capybara, LDJnr/Pure-Dove, LDJnr/LessWrong-Amplify-Instruct, LDJnr/Verified-Camel, mlabonne/guanaco-llama2-1k, and OpenAssistant/oasst_top1_2023-08-25. This model achieved a test loss of 0.39 on LDJnr/Verified-Camel.
151
 
152
  Make sure to comply with the microsoft research license. Please read it before using this model.
153
 
154
+ This model was trained on the ChatML prompt template.
155
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
156
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Locutusque__Orca-2-13b-SFT-v6)
157
+
158
+ | Metric |Value|
159
+ |---------------------------------|----:|
160
+ |Avg. |56.15|
161
+ |AI2 Reasoning Challenge (25-Shot)|60.41|
162
+ |HellaSwag (10-Shot) |80.46|
163
+ |MMLU (5-Shot) |59.51|
164
+ |TruthfulQA (0-shot) |54.01|
165
+ |Winogrande (5-shot) |77.43|
166
+ |GSM8k (5-shot) | 5.08|
167
+