---
license: apache-2.0
base_model: upstage/SOLAR-10.7B-v1.0
tags:
- generated_from_trainer
model-index:
- name: yanolja/KoSOLAR-10.7B-v0.1
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

## Discord

If you're passionate about the field of Large Language Models and wish to exchange knowledge and insights, we warmly invite you to join our Discord server. Note that Korean is the primary language used on this server. The landscape of LLMs is evolving rapidly, and without active sharing, our collective knowledge risks becoming outdated quickly. Let's collaborate and drive greater impact together! Join us here: https://discord.gg/b27bAHg95m.

# yanolja/KoSOLAR-10.7B-v0.1

This model is a Korean vocabulary-extended version of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0), trained on various Korean web-crawled datasets that are publicly available on Hugging Face. The hypothesis was that the base model's vocabulary could be extended with new tokens, while preserving its original performance, by training the embeddings for the new tokens only. The evaluation results below suggest that both English and Korean performance were preserved.

## Model Description

Most parameters of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) were frozen; only the embed_tokens layer and the lm_head layer were trainable. Even within those layers, the embeddings for the existing tokens were kept frozen during training, and only the embeddings for the newly added tokens were tuned.

## Intended Uses & Limitations

No instruction tuning has been performed. You should train this model further for your specific purpose, and use it with caution.

## Training and Evaluation Data

The model was trained on various Korean web-crawled datasets that are publicly available on Hugging Face.

## Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1

### Training Results

#### upstage/SOLAR-10.7B-v1.0

| Groups     | Version | Filter     | n-shot | Metric      | Value  |   | Stderr |
|------------|---------|------------|--------|-------------|--------|---|--------|
| kmmlu      | N/A     | none       | 0      | acc         | 0.3004 | ± | 0.0528 |
| gsm8k      | Yaml    | get-answer | 5      | exact_match | 0.5625 | ± | 0.0137 |
| hellaswag  | Yaml    | none       | 0      | acc         | 0.6393 | ± | 0.0048 |
| mmlu       | N/A     | none       | 0      | acc         | 0.6305 | ± | 0.1452 |
| truthfulqa | N/A     | none       | 0      | acc         | 0.4096 | ± | 0.0467 |
| winogrande | Yaml    | none       | 0      | acc         | 0.7443 | ± | 0.0123 |

#### yanolja/KoSOLAR-10.7B-v0.1

| Groups     | Version | Filter     | n-shot | Metric      | Value  |   | Stderr |
|------------|---------|------------|--------|-------------|--------|---|--------|
| kmmlu      | N/A     | none       | 0      | acc         | 0.2948 | ± | 0.0537 |
| gsm8k      | Yaml    | get-answer | 5      | exact_match | 0.5527 | ± | 0.0137 |
| hellaswag  | Yaml    | none       | 0      | acc         | 0.6392 | ± | 0.0048 |
| mmlu       | N/A     | none       | 0      | acc         | 0.6303 | ± | 0.1411 |
| truthfulqa | N/A     | none       | 0      | acc         | 0.3618 | ± | 0.0472 |
| winogrande | Yaml    | none       | 0      | acc         | 0.7459 | ± | 0.0122 |

### Framework Versions

- Transformers 4.37.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0
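
This card does not include the exact Axolotl configuration used for the selective embedding training described in the Model Description section. The snippet below is a minimal Transformers/PyTorch sketch of that general idea, extending the vocabulary and then updating only the newly added embedding rows; the token list, variable names, and dtype are illustrative assumptions, not the actual training setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch only; not the actual KoSOLAR training configuration.
base_model_id = "upstage/SOLAR-10.7B-v1.0"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)

original_vocab_size = len(tokenizer)

# 1. Extend the vocabulary with new Korean tokens (placeholder list) and
#    resize embed_tokens / lm_head to match the new vocabulary size.
new_korean_tokens = ["안녕하세요", "감사합니다"]  # placeholder tokens
tokenizer.add_tokens(new_korean_tokens)
model.resize_token_embeddings(len(tokenizer))

# 2. Freeze every parameter except the input and output embedding matrices.
for name, param in model.named_parameters():
    param.requires_grad = ("embed_tokens" in name) or ("lm_head" in name)

# 3. Zero the gradient rows belonging to pre-existing tokens, so the optimizer
#    only updates the newly added rows and the original embeddings stay intact.
def _keep_new_rows_only(grad):
    grad = grad.clone()
    grad[:original_vocab_size] = 0
    return grad

model.get_input_embeddings().weight.register_hook(_keep_new_rows_only)
model.get_output_embeddings().weight.register_hook(_keep_new_rows_only)

# From here, train as usual; the frozen rows keep their original values,
# preserving the base model's behavior on the existing vocabulary.
```

Masking gradients for the pre-existing rows is one straightforward way to keep the original embeddings unchanged while still treating embed_tokens and lm_head as single trainable parameters.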
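
Since no instruction tuning has been performed, the model behaves as a plain base LM for text completion. The following is a minimal loading-and-generation sketch using the standard Transformers API; the prompt, dtype, and device_map values are illustrative choices, not recommended settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yanolja/KoSOLAR-10.7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; pick a dtype your hardware supports
    device_map="auto",
)

# Plain text completion, since the model is not instruction-tuned.
prompt = "대한민국의 수도는"  # "The capital of South Korea is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```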