brucethemoose committed ceeb49b (parent: b900cf0): Update README.md

Files changed (1): README.md (+100, -0)
---
license: other
license_name: yi-license
license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
---

**Dolphin-2.2-yi-34b-200k**, **Nous-Capybara-34B**, **Tess-M-v1.3**, **Airoboros-3_1-yi-34b-200k**, **PlatYi-34B-Q**, and **una-xaberius-34b-v1beta** merged with a new, experimental implementation of "dare ties" via mergekit. See:

> [Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch](https://github.com/yule-BUAA/MergeLM)

> https://github.com/cg123/mergekit/tree/dare

Merged with the following config, and the tokenizer from chargoddard's Yi-Llama:
```
models:
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    # no parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-34B-v1.4
    parameters:
      weight: 0.19
      density: 0.44
  - model: /home/alpha//Storage/Models/Raw/bhenrym14_airoboros-3_1-yi-34b-200k
    parameters:
      weight: 0.14
      density: 0.34
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.19
      density: 0.44
  - model: /home/alpha/Storage/Models/Raw/kyujinpy_PlatYi-34B-200K-Q
    parameters:
      weight: 0.14
      density: 0.34
  - model: /home/alpha/FastModels/ehartford_dolphin-2.2-yi-34b-200k
    parameters:
      weight: 0.19
      density: 0.44
  - model: /home/alpha/FastModels/fblgit_una-xaberius-34b-v1beta
    parameters:
      weight: 0.15
      density: 0.08
merge_method: dare_ties
base_model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
parameters:
  int8_mask: true
dtype: bfloat16
```
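To reproduce the merge, save the config above to a file and feed it to mergekit. Below is a rough sketch of the Python route, assuming a recent mergekit that exposes the documented `run_merge` API (the CLI equivalent is `mergekit-yaml`); `yi-dare-ties.yml` and `./merged` are placeholder paths:

```python
# Sketch only: drive mergekit from Python on the config above.
# Assumes a recent mergekit; "yi-dare-ties.yml" / "./merged" are placeholders.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("yi-dare-ties.yml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./merged",  # where the merged model and copied tokenizer land
    options=MergeOptions(cuda=True, copy_tokenizer=True, lazy_unpickle=True),
)
```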
Various densities were tested with perplexity and high-context test prompts. Relatively high densities seem to perform better, contrary to the findings of the Super Mario paper.

A total density of 1 seems to be optimal.

DARE ties also produces better merges than a regular ties merge (which was already excellent).
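For intuition, here is a minimal sketch of what the `weight` and `density` values do in a DARE-style merge, per the Super Mario paper linked above (the actual `dare_ties` method additionally applies a TIES-style sign election before summing, omitted here): each finetune's delta from the base is randomly dropped with probability 1 - density, the survivors are rescaled by 1 / density, and the rescaled deltas are added back onto the base with their weights.

```python
# Illustrative sketch of DARE drop-and-rescale on a single tensor.
# Not mergekit's code; names mirror the config parameters above.
import torch

def dare_delta(base: torch.Tensor, finetuned: torch.Tensor, density: float) -> torch.Tensor:
    """Drop (1 - density) of the delta at random, rescale the rest by 1/density."""
    delta = finetuned - base
    mask = torch.bernoulli(torch.full_like(delta, density))
    return delta * mask / density

def dare_merge(base: torch.Tensor, finetunes: list[torch.Tensor],
               weights: list[float], densities: list[float]) -> torch.Tensor:
    """Weighted sum of dropped-and-rescaled deltas added back onto the base."""
    merged = base.clone()
    for ft, w, d in zip(finetunes, weights, densities):
        merged += w * dare_delta(base, ft, d)
    return merged

# Toy usage: random tensors stand in for one weight matrix per model.
base = torch.randn(4, 4)
finetunes = [base + 0.1 * torch.randn(4, 4) for _ in range(3)]
print(dare_merge(base, finetunes, weights=[0.19, 0.14, 0.19], densities=[0.44, 0.34, 0.44]))
```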

Xaberius is not a 200K model, hence it was merged at a very low density to try and preserve Yi 200K's long-context performance while still inheriting some of Xaberius's training.

I chose not to include other finetunes because they aren't trained on the 200K base. If any other 200K finetunes pop up, let me know.

***

## Prompt template: Orca-Vicuna?

```
SYSTEM: {system_message}
USER: {prompt}
ASSISTANT:
```

It might recognize ChatML from Dolphin and Xaberius, and Llama-chat from Airoboros.

Since this is a Yi model, try disabling the BOS token and/or running a lower temperature with 0.05-0.13 MinP, a little repetition penalty, and no other samplers. Yi tends to run "hot" by default.

Sometimes the model "spells out" the stop token as `</s>` like Capybara, so you may need to add `</s>` as an additional stopping condition.
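For illustration, a minimal sketch of wiring the template, stop strings, and sampling suggestions together; the specific numbers are just picks within the ranges above, and the backend/loader is left up to you:

```python
# Illustrative only: build an Orca-Vicuna prompt plus the stopping and sampling
# settings suggested above, for whatever backend you plug them into.

def format_prompt(system_message: str, prompt: str) -> str:
    """Orca-Vicuna template from this card."""
    return f"SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"

stop_strings = ["</s>"]  # also stop on the spelled-out EOS string

sampler_settings = {
    "temperature": 0.7,          # illustrative pick; run cooler than default, Yi runs "hot"
    "min_p": 0.08,               # suggested range is 0.05-0.13
    "repetition_penalty": 1.05,  # "a little" repetition penalty
    "add_bos_token": False,      # try disabling the BOS token
}

print(format_prompt("You are a helpful assistant.", "Summarize this model card."))
```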

***

24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/), and recommend exl2 quantizations calibrated on data similar to the desired task.
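A minimal exllamav2 loading sketch along those lines, assuming a recent exllamav2 and a local exl2 quant of this merge at the placeholder path `./Yi-34B-200K-DARE-exl2`; the 8-bit cache and autosplit loading are what make long context fit in 24GB:

```python
# Sketch only; "./Yi-34B-200K-DARE-exl2" is a placeholder for a local exl2 quant.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache_8bit, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./Yi-34B-200K-DARE-exl2"
config.prepare()
config.max_seq_len = 49152  # ~48K; push toward 75K as VRAM allows

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)  # 8-bit cache keeps the KV cache small
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7  # illustrative; see the sampling notes above
settings.min_p = 0.08

print(generator.generate_simple("SYSTEM: You are a helpful assistant.\nUSER: Hi!\nASSISTANT:", settings, 200))
```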

***

Credits:

https://github.com/cg123/mergekit/tree/dare

https://huggingface.co/ehartford/dolphin-2.2-yi-34b-200k

https://huggingface.co/kyujinpy/PlatYi-34B-Q

https://huggingface.co/NousResearch/Nous-Capybara-34B/

https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k

https://huggingface.co/migtissera/Tess-M-v1.3

https://huggingface.co/fblgit/una-xaberius-34b-v1beta

https://huggingface.co/chargoddard/Yi-34B-200K-Llama

https://huggingface.co/01-ai/Yi-34B-200K