metadata
tags:
  - merge
  - mergekit
  - moe
  - frankenmoe
  - abacusai/Llama-3-Smaug-8B
  - cognitivecomputations/dolphin-2.9-llama3-8b
  - Weyaxi/Einstein-v6.1-Llama3-8B
  - dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
base_model:
  - abacusai/Llama-3-Smaug-8B
  - cognitivecomputations/dolphin-2.9-llama3-8b
  - Weyaxi/Einstein-v6.1-Llama3-8B
  - dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
license: apache-2.0

🚀 Skyro-4X8B

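Skyro-4X8B is a Mixture of Experts (MoE) made with the following models using Mergekit:

- abacusai/Llama-3-Smaug-8B
- cognitivecomputations/dolphin-2.9-llama3-8b
- Weyaxi/Einstein-v6.1-Llama3-8B
- dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2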

🧩 Configuration

base_model: meta-llama/Meta-Llama-3-8B
gate_mode: hidden
experts:
  - source_model: abacusai/Llama-3-Smaug-8B
    positive_prompts:
    - "chat"
    - "assistant"
    - "tell me"
    - "explain"
    - "I want"
  - source_model: cognitivecomputations/dolphin-2.9-llama3-8b
    positive_prompts:
    - "math"
    - "mathematics"
    - "code"
    - "engineering"
    - "solve"
    - "logic"
    - "rationality"
    - "puzzle"
    - "solve"
  - source_model: Weyaxi/Einstein-v6.1-Llama3-8B
    positive_prompts:
    - "science"
    - "medical"
    - "physics"
    - "engineering"
    - "math"
    - "logic"
    - "rationality"
    - "mathematics"
    - "solve"
  - source_model: dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
    positive_prompts:
    - "story"
    - "roleplay"
    - "role-play"
    - "storywriting"
    - "character"
    - "narrative"
    - "creative"
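
The prompt lists above overlap in places, which is easy to miss when reading the YAML. The following is a minimal sketch (not part of Mergekit) that copies the positive prompts from the configuration into a plain dictionary and reports repeated and shared entries. The helper names `repeated_prompts` and `shared_prompts` are hypothetical, and the suggestion that shared prompts may make two experts' gates harder to tell apart is an assumption, not something measured here.

```python
from collections import Counter
from itertools import combinations

# Positive prompts copied verbatim from the configuration above, keyed by expert.
POSITIVE_PROMPTS = {
    "abacusai/Llama-3-Smaug-8B": [
        "chat", "assistant", "tell me", "explain", "I want",
    ],
    "cognitivecomputations/dolphin-2.9-llama3-8b": [
        "math", "mathematics", "code", "engineering", "solve",
        "logic", "rationality", "puzzle", "solve",
    ],
    "Weyaxi/Einstein-v6.1-Llama3-8B": [
        "science", "medical", "physics", "engineering", "math",
        "logic", "rationality", "mathematics", "solve",
    ],
    "dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2": [
        "story", "roleplay", "role-play", "storywriting",
        "character", "narrative", "creative",
    ],
}


def repeated_prompts(prompts):
    """Return prompts that occur more than once within a single expert's list."""
    counts = Counter(prompts)
    return sorted(p for p, n in counts.items() if n > 1)


def shared_prompts(config):
    """Return {(expert_a, expert_b): [prompts]} for every pair sharing prompts."""
    overlaps = {}
    for (a, pa), (b, pb) in combinations(config.items(), 2):
        common = sorted(set(pa) & set(pb))
        if common:
            overlaps[(a, b)] = common
    return overlaps


if __name__ == "__main__":
    for expert, prompts in POSITIVE_PROMPTS.items():
        dupes = repeated_prompts(prompts)
        if dupes:
            print(f"{expert} repeats: {dupes}")
    for (a, b), common in shared_prompts(POSITIVE_PROMPTS).items():
        print(f"{a} and {b} share: {common}")
```

With the configuration as written, this reports that the dolphin expert lists "solve" twice and that the dolphin and Einstein experts share six prompts.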

Evaluation

| Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| 66.39 | 61.26 | 82.38 | 66.67 | 50.15 | 77.66 | 60.2 |
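
The reported average can be checked against the six benchmark scores. This is a quick sketch, assuming the average is the plain unweighted mean of the individual scores listed above:

```python
# Benchmark scores copied from the evaluation results above.
scores = {
    "ARC": 61.26,
    "HellaSwag": 82.38,
    "MMLU": 66.67,
    "TruthfulQA": 50.15,
    "Winogrande": 77.66,
    "GSM8K": 60.2,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 66.39
```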

πŸ’» Usage

!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "saucam/Skyro-4X8B"
messages = [{"role": "user", "content": "In a student council election, candidate A got 20% of the votes while candidate B got 50% more than candidate A's votes. The rest of the votes was given to candidate C. If there were 100 voters, how many votes did candidate C get?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
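
The `apply_chat_template` call above turns the message list into a single prompt string. Judging from the sample output in this README, the prompt uses ChatML-style `<|im_start|>` and `<|im_end|>` markers. The sketch below imitates that layout in plain Python so the prompt structure is visible. The function `render_chatml` is hypothetical and only an approximation; the authoritative template is the one shipped with the model's tokenizer.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate a ChatML-style prompt (assumed format, not the tokenizer's own code)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [{"role": "user", "content": "Hello"}]
    print(render_chatml(demo))
```

Because the text-generation pipeline returns the prompt together with the completion by default, the printed `generated_text` starts with this rendered prompt, as the sample output shows.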

Sample output

config.json: 100%|██████████| 878/878 [00:00<00:00, 4.18MB/s]
model.safetensors.index.json: 100%|██████████| 53.5k/53.5k [00:00<00:00, 101MB/s]
model-00001-of-00006.safetensors: 100%|██████████| 9.89G/9.89G [03:47<00:00, 43.4MB/s]
model-00002-of-00006.safetensors: 100%|██████████| 9.98G/9.98G [03:23<00:00, 49.0MB/s]
model-00003-of-00006.safetensors: 100%|██████████| 9.98G/9.98G [03:44<00:00, 44.5MB/s]
model-00004-of-00006.safetensors: 100%|██████████| 9.90G/9.90G [03:30<00:00, 46.9MB/s]
model-00005-of-00006.safetensors: 100%|██████████| 9.08G/9.08G [03:08<00:00, 48.1MB/s]
model-00006-of-00006.safetensors: 100%|██████████| 1.05G/1.05G [00:20<00:00, 51.3MB/s]
Downloading shards: 100%|██████████| 6/6 [17:58<00:00, 179.78s/it]
Loading checkpoint shards: 100%|██████████| 6/6 [01:27<00:00, 14.59s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

<|im_start|>user
In a student council election, candidate A got 20% of the votes while candidate B got 50% more than candidate A's votes. The rest of the votes was given to candidate C. If there were 100 voters, how many votes did candidate C get?<|im_end|>
<|im_start|>assistant
Let's denote the number of votes candidate A got as \( A \).

Candidate B got 50% more votes than candidate A, so candidate B got \( A + 0.5A = 1.5A \) votes.

Candidate C got the rest of the votes, which means \( C = 100 - (A + 1.5A) \).

We know that candidate A got 20% of the votes, so \( A = 20\% \times 100 = 20 \).

Now we can calculate candidate C's votes:
\( C = 100 - (20 + 1.5 \times 20) \)
\( C = 100 - (20 + 30) \)
\( C = 100 - 50 \)
\( C = 50 \).

Therefore, candidate C got 50 votes.<|im_end|>
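
The model's answer can be verified independently. This is a small sketch of the same arithmetic, using integer math and a hypothetical helper `votes_for_c`, whose parameter names are chosen here for illustration:

```python
def votes_for_c(total_votes, a_percent, b_extra_percent):
    """Return (A, B, C) vote counts for the election puzzle in the prompt."""
    votes_a = total_votes * a_percent // 100            # A gets a_percent of all votes
    votes_b = votes_a + votes_a * b_extra_percent // 100  # B gets b_extra_percent more than A
    votes_c = total_votes - votes_a - votes_b           # C gets the remainder
    return votes_a, votes_b, votes_c


if __name__ == "__main__":
    print(votes_for_c(100, 20, 50))  # (20, 30, 50)
```

The result matches the model's conclusion that candidate C received 50 votes.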