Files changed (1)
README.md CHANGED (+115 -7)
@@ -1,20 +1,114 @@
 ---
-license: mit
-license_link: https://huggingface.co/microsoft/Phi-3-medium-4k-instruct/resolve/main/LICENSE
-
 language:
 - multilingual
-pipeline_tag: text-generation
+license: mit
 tags:
 - nlp
 - code
+license_link: https://huggingface.co/microsoft/Phi-3-medium-4k-instruct/resolve/main/LICENSE
+pipeline_tag: text-generation
 inference:
   parameters:
     temperature: 0.7
 widget:
-- messages:
-  - role: user
-    content: Can you provide ways to eat combinations of bananas and dragonfruits?
+- messages:
+  - role: user
+    content: Can you provide ways to eat combinations of bananas and dragonfruits?
+model-index:
+- name: Phi-3-medium-4k-instruct-abliterated-v3
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 63.19
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=failspy/Phi-3-medium-4k-instruct-abliterated-v3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 46.73
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=failspy/Phi-3-medium-4k-instruct-abliterated-v3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 14.12
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=failspy/Phi-3-medium-4k-instruct-abliterated-v3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 8.95
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=failspy/Phi-3-medium-4k-instruct-abliterated-v3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 18.52
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=failspy/Phi-3-medium-4k-instruct-abliterated-v3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 37.78
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=failspy/Phi-3-medium-4k-instruct-abliterated-v3
+      name: Open LLM Leaderboard
 ---
 
 # Phi-3-medium-4k-instruct-abliterated-v3
@@ -77,3 +171,17 @@ This model may come with interesting quirks, with the methodology being so new.
 If you manage to develop further improvements, please share! This is really the most basic way to use ablation, but there are other possibilities that I believe are as-yet unexplored.
 
 Additionally, feel free to reach out in any way about this. I'm on the Cognitive Computations Discord, I'm watching the Community tab, reach out! I'd love to see this methodology used in other ways, and so would gladly support whoever whenever I can.
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_failspy__Phi-3-medium-4k-instruct-abliterated-v3)
+
+| Metric |Value|
+|-------------------|----:|
+|Avg. |31.55|
+|IFEval (0-Shot) |63.19|
+|BBH (3-Shot) |46.73|
+|MATH Lvl 5 (4-Shot)|14.12|
+|GPQA (0-shot) | 8.95|
+|MuSR (0-shot) |18.52|
+|MMLU-PRO (5-shot) |37.78|
+
+