mlabonne committed on
Commit
91775a8
•
1 Parent(s): b9a4580

Update README.md

Files changed (1)
  1. README.md +23 -28
README.md CHANGED
@@ -17,14 +17,16 @@ base_model:
17
 
18
  ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/9XVgxKyuXTQVO5mO-EOd4.jpeg)
19
 
20
- # 🔮 Beyonder-4x7B-v3
21
 
22
- Beyonder-4x7B-v3 is an improvement over the popular [Beyonder-4x7B-v2](https://huggingface.co/mlabonne/Beyonder-4x7B-v2). It's a Mixture of Experts (MoE) made with the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
23
  * [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B)
24
  * [beowolx/CodeNinja-1.0-OpenChat-7B](https://huggingface.co/beowolx/CodeNinja-1.0-OpenChat-7B)
25
  * [SanjiWatsuki/Kunoichi-DPO-v2-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-DPO-v2-7B)
26
  * [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B)
27
 
 
 
28
  ## 🔍 Applications
29
 
30
  This model uses a context window of 8k. I recommend using it with the Mistral Instruct chat template (works perfectly with LM Studio).
@@ -35,10 +37,16 @@ Thanks to its four experts, it's a well-rounded model, capable of achieving most
35
 
36
  ## ⚡ Quantized models
37
 
 
 
38
  * **GGUF**: https://huggingface.co/mlabonne/Beyonder-4x7B-v3-GGUF
 
 
39
 
40
  ## 🏆 Evaluation
41
 
 
 
42
  ### Nous
43
 
44
  Beyonder-4x7B-v3 is one of the best models on Nous' benchmark suite (evaluation performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval)) and significantly outperforms the v2. See the entire leaderboard [here](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
@@ -48,11 +56,21 @@ Beyonder-4x7B-v3 is one of the best models on Nous' benchmark suite (evaluation
48
  | [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B) [📄](https://gist.github.com/mlabonne/1d33c86824b3a11d2308e36db1ba41c1) | 62.74 | 45.37 | 77.01 | 78.39 | 50.2 |
49
  | [**mlabonne/Beyonder-4x7B-v3**](https://huggingface.co/mlabonne/Beyonder-4x7B-v3) [📄](https://gist.github.com/mlabonne/3740020807e559f7057c32e85ce42d92) | **61.91** | **45.85** | **76.67** | **74.98** | **50.12** |
50
  | [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B) [📄](https://gist.github.com/mlabonne/cbeb077d1df71cb81c78f742f19f4155) | 59.39 | 45.23 | 76.2 | 67.61 | 48.52 |
 
51
  | [mlabonne/Beyonder-4x7B-v2](https://huggingface.co/mlabonne/Beyonder-4x7B-v2) [📄](https://gist.github.com/mlabonne/f73baa140a510a676242f8a4496d05ca) | 57.13 | 45.29 | 75.95 | 60.86 | 46.4 |
 
52
 
53
  ### Open LLM Leaderboard
54
 
55
- Running...
 
 
56
 
57
  ## 🧩 Configuration
58
 
@@ -89,29 +107,6 @@ experts:
89
  - "count"
90
  ```
91
 
92
- ## 💻 Usage
93
-
94
- ```python
95
- !pip install -qU transformers bitsandbytes accelerate
96
-
97
- from transformers import AutoTokenizer
98
- import transformers
99
- import torch
100
-
101
- model = "mlabonne/Beyonder-4x7B-v3"
102
-
103
- tokenizer = AutoTokenizer.from_pretrained(model)
104
- pipeline = transformers.pipeline(
105
- "text-generation",
106
- model=model,
107
- model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
108
- )
109
-
110
- messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
111
- prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
112
- outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
113
- print(outputs[0]["generated_text"])
114
- ```
115
- Output:
116
 
117
- > A Mixture of Experts (MoE) is a neural network architecture that tackles complex tasks by dividing them into simpler subtasks, delegating each to specialized expert modules. These experts learn to independently handle specific problem aspects. The MoE structure combines their outputs, leveraging their expertise for improved overall performance. This approach promotes modularity, adaptability, and scalability, allowing for better generalization in various applications.
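To make the quoted description concrete, here is a toy sketch (not part of the original card) of the gate-and-combine step an MoE layer performs. The `ToyMoE` module and its dimensions are made up for illustration, and real Mixtral-style layers route per token rather than per example:

```python
# Toy illustration of the MoE idea quoted above: a gate scores the experts,
# the top-k experts process the input, and their outputs are combined using
# the (softmaxed) gate weights. Hypothetical module, not the model's own code.
import torch
import torch.nn as nn


class ToyMoE(nn.Module):
    def __init__(self, dim: int = 16, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.gate(x)                                  # (batch, num_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)  # keep the top-k experts
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):
            for slot in range(self.top_k):
                expert = self.experts[int(idx[b, slot])]
                out[b] += weights[b, slot] * expert(x[b])
        return out


print(ToyMoE()(torch.randn(2, 16)).shape)  # torch.Size([2, 16])
```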
 
17
 
18
  ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/9XVgxKyuXTQVO5mO-EOd4.jpeg)
19
 
20
+ # 🔮 Beyonder-4x7B-v3 GGUF
21
 
22
+ [Beyonder-4x7B-v3](https://huggingface.co/mlabonne/Beyonder-4x7B-v3) is an improvement over the popular [Beyonder-4x7B-v2](https://huggingface.co/mlabonne/Beyonder-4x7B-v2). It's a Mixture of Experts (MoE) made with the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
23
  * [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B)
24
  * [beowolx/CodeNinja-1.0-OpenChat-7B](https://huggingface.co/beowolx/CodeNinja-1.0-OpenChat-7B)
25
  * [SanjiWatsuki/Kunoichi-DPO-v2-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-DPO-v2-7B)
26
  * [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B)
27
 
28
+ Special thanks to [beowolx](https://huggingface.co/beowolx) for making the best Mistral-based code model and to [SanjiWatsuki](https://huggingface.co/SanjiWatsuki) for creating one of the very best RP models.
29
+
30
  ## 🔍 Applications
31
 
32
  This model uses a context window of 8k. I recommend using it with the Mistral Instruct chat template (works perfectly with LM Studio).
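For reference, here is a minimal sketch (not part of the original card) of what a single-turn prompt looks like in the Mistral Instruct format; the helper name and the example question are made up, and the tokenizer's bundled chat template remains the authoritative source:

```python
# Minimal sketch of the Mistral Instruct format recommended above.
# A single user turn is wrapped in [INST] ... [/INST] after the BOS token.
def build_mistral_instruct_prompt(user_message: str) -> str:
    return f"<s>[INST] {user_message} [/INST]"


print(build_mistral_instruct_prompt("Explain Mixture of Experts in one sentence."))
# -> <s>[INST] Explain Mixture of Experts in one sentence. [/INST]
```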
 
37
 
38
  ## ⚡ Quantized models
39
 
40
+ Thanks [bartowski](https://huggingface.co/bartowski) for quantizing this model.
41
+
42
  * **GGUF**: https://huggingface.co/mlabonne/Beyonder-4x7B-v3-GGUF
43
+ * **More GGUF**: https://huggingface.co/bartowski/Beyonder-4x7B-v3-GGUF
44
+ * **ExLlamaV2**: https://huggingface.co/bartowski/Beyonder-4x7B-v3-exl2
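As a usage sketch (not part of the original card), one way to run a downloaded GGUF quant locally is through llama-cpp-python; the file name below is a placeholder for whichever quant you pick from the repositories above:

```python
# Minimal sketch: run one of the GGUF quants locally with llama-cpp-python.
# The model_path is a placeholder; point it at whichever quant file you
# downloaded from the GGUF repositories linked above.
from llama_cpp import Llama

llm = Llama(
    model_path="beyonder-4x7b-v3.Q4_K_M.gguf",  # hypothetical file name
    n_ctx=8192,       # matches the 8k context window mentioned earlier
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain what a Mixture of Experts is."}],
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```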
45
 
46
  ## 🏆 Evaluation
47
 
48
+ This model is not designed to excel at traditional benchmarks, since the code and role-play experts are rarely relevant to those tasks. Nonetheless, it performs remarkably well thanks to its strong general-purpose experts.
49
+
50
  ### Nous
51
 
52
  Beyonder-4x7B-v3 is one of the best models on Nous' benchmark suite (evaluation performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval)) and significantly outperforms the v2. See the entire leaderboard [here](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
 
56
  | [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B) [📄](https://gist.github.com/mlabonne/1d33c86824b3a11d2308e36db1ba41c1) | 62.74 | 45.37 | 77.01 | 78.39 | 50.2 |
57
  | [**mlabonne/Beyonder-4x7B-v3**](https://huggingface.co/mlabonne/Beyonder-4x7B-v3) [📄](https://gist.github.com/mlabonne/3740020807e559f7057c32e85ce42d92) | **61.91** | **45.85** | **76.67** | **74.98** | **50.12** |
58
  | [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B) [📄](https://gist.github.com/mlabonne/cbeb077d1df71cb81c78f742f19f4155) | 59.39 | 45.23 | 76.2 | 67.61 | 48.52 |
59
+ | [SanjiWatsuki/Kunoichi-DPO-v2-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-DPO-v2-7B) [📄](https://gist.github.com/mlabonne/895ff5171e998abfdf2a41a4f9c84450) | 58.29 | 44.79 | 75.05 | 65.68 | 47.65 |
60
  | [mlabonne/Beyonder-4x7B-v2](https://huggingface.co/mlabonne/Beyonder-4x7B-v2) [📄](https://gist.github.com/mlabonne/f73baa140a510a676242f8a4496d05ca) | 57.13 | 45.29 | 75.95 | 60.86 | 46.4 |
61
+ | [beowolx/CodeNinja-1.0-OpenChat-7B](https://huggingface.co/beowolx/CodeNinja-1.0-OpenChat-7B) [📄](https://gist.github.com/mlabonne/08b5280c221fbd7f98eb27561ae902a3) | 50.35 | 39.98 | 71.77 | 48.73 | 40.92 |
62
+
63
+ ### EQ-Bench
64
+
65
+ Beyonder-4x7B-v3 is the best 4x7B model on the EQ-Bench leaderboard, outperforming older versions of ChatGPT and Llama-2-70b-chat. It is very close to Mixtral-8x7B-Instruct-v0.1 and Gemini Pro. Thanks [Sam Paech](https://huggingface.co/sam-paech) for running the eval.
66
+
67
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/-OSHe2ImrxN8wAREnSZAZ.png)
68
 
69
  ### Open LLM Leaderboard
70
 
71
+ It's also a strong performer on the Open LLM Leaderboard, significantly outperforming the v2 model.
72
+
73
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/NFRYqzwuy9TB-s-Hy3gRy.png)
74
 
75
  ## 🧩 Configuration
76
 
 
107
  - "count"
108
  ```
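As a quick sanity check (not part of the original card), the merged model should expose a Mixtral-style configuration; here is a minimal sketch to inspect it, assuming the merge was uploaded with the standard `transformers` config fields:

```python
# Minimal sketch: inspect the MoE layout of the merged model.
# mergekit-moe produces a Mixtral-style architecture, so the config should
# report how many experts were merged and how many are routed per token.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mlabonne/Beyonder-4x7B-v3")
print(config.model_type)           # expected: "mixtral"
print(config.num_local_experts)    # total experts in the merge (four here)
print(config.num_experts_per_tok)  # experts activated per token
```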
109
 
110
+ ## 🌳 Model Family Tree
111
 
112
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/zQi5VgmdqJv6pFaGoQ2AL.png)