---
license: apache-2.0
base_model: upstage/SOLAR-10.7B-v1.0
tags:
- generated_from_trainer
model-index:
- name: yanolja/KoSOLAR-10.7B-v0.1
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

## Discord

If you're passionate about the field of Large Language Models and wish to exchange knowledge and insights, we warmly invite you to join our Discord server. Note that Korean is the primary language used on this server. The landscape of LLMs is evolving rapidly, and without active sharing, our collective knowledge risks becoming outdated quickly. Let's collaborate and drive greater impact together! Join us here: https://discord.gg/b27bAHg95m.

# yanolja/KoSOLAR-10.7B-v0.1

This model is a Korean vocabulary-extended version of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0), trained on various Korean web-crawled datasets that are publicly available on Hugging Face. The hypothesis was that the base model's vocabulary could be extended with new tokens, while preserving its original performance, by training the embeddings for the new tokens only. The evaluation results below suggest that both English and Korean performance were preserved.

## Model Description

Most parameters of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) were frozen; only the embed_tokens layer and the lm_head layer were trainable. Even within those layers, the embeddings for the existing tokens were kept frozen during training, and only the embeddings for the newly added tokens were tuned.

## Intended Uses & Limitations

No instruction tuning has been performed. You should train this model further for your specific purpose, and use it with caution.

## Training and Evaluation Data

The model was trained on various Korean web-crawled datasets that are publicly available on Hugging Face.

## Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1

### Training Results

#### upstage/SOLAR-10.7B-v1.0

| Groups     | Version | Filter     | n-shot | Metric      | Value  |   | Stderr |
|------------|---------|------------|--------|-------------|--------|---|--------|
| kmmlu      | N/A     | none       | 0      | acc         | 0.3004 | ± | 0.0528 |
| gsm8k      | Yaml    | get-answer | 5      | exact_match | 0.5625 | ± | 0.0137 |
| hellaswag  | Yaml    | none       | 0      | acc         | 0.6393 | ± | 0.0048 |
| mmlu       | N/A     | none       | 0      | acc         | 0.6305 | ± | 0.1452 |
| truthfulqa | N/A     | none       | 0      | acc         | 0.4096 | ± | 0.0467 |
| winogrande | Yaml    | none       | 0      | acc         | 0.7443 | ± | 0.0123 |

#### yanolja/KoSOLAR-10.7B-v0.1

| Groups     | Version | Filter     | n-shot | Metric      | Value  |   | Stderr |
|------------|---------|------------|--------|-------------|--------|---|--------|
| kmmlu      | N/A     | none       | 0      | acc         | 0.2948 | ± | 0.0537 |
| gsm8k      | Yaml    | get-answer | 5      | exact_match | 0.5527 | ± | 0.0137 |
| hellaswag  | Yaml    | none       | 0      | acc         | 0.6392 | ± | 0.0048 |
| mmlu       | N/A     | none       | 0      | acc         | 0.6303 | ± | 0.1411 |
| truthfulqa | N/A     | none       | 0      | acc         | 0.3618 | ± | 0.0472 |
| winogrande | Yaml    | none       | 0      | acc         | 0.7459 | ± | 0.0122 |

### Framework Versions

- Transformers 4.37.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0
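
This card does not include the exact Axolotl configuration used for the selective embedding training described in the Model Description section. The snippet below is a minimal Transformers/PyTorch sketch of that general idea, extending the vocabulary and then updating only the newly added embedding rows; the token list, variable names, and dtype are illustrative assumptions, not the actual training setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch only; not the actual KoSOLAR training configuration.
base_model_id = "upstage/SOLAR-10.7B-v1.0"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)

original_vocab_size = len(tokenizer)

# 1. Extend the vocabulary with new Korean tokens (placeholder list) and
#    resize embed_tokens / lm_head to match the new vocabulary size.
new_korean_tokens = ["안녕하세요", "감사합니다"]  # placeholder tokens
tokenizer.add_tokens(new_korean_tokens)
model.resize_token_embeddings(len(tokenizer))

# 2. Freeze every parameter except the input and output embedding matrices.
for name, param in model.named_parameters():
    param.requires_grad = ("embed_tokens" in name) or ("lm_head" in name)

# 3. Zero the gradient rows belonging to pre-existing tokens, so the optimizer
#    only updates the newly added rows and the original embeddings stay intact.
def _keep_new_rows_only(grad):
    grad = grad.clone()
    grad[:original_vocab_size] = 0
    return grad

model.get_input_embeddings().weight.register_hook(_keep_new_rows_only)
model.get_output_embeddings().weight.register_hook(_keep_new_rows_only)

# From here, train as usual; the frozen rows keep their original values,
# preserving the base model's behavior on the existing vocabulary.
```

Masking gradients for the pre-existing rows is one straightforward way to keep the original embeddings unchanged while still treating embed_tokens and lm_head as single trainable parameters.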
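
Since no instruction tuning has been performed, the model behaves as a plain base LM for text completion. The following is a minimal loading-and-generation sketch using the standard Transformers API; the prompt, dtype, and device_map values are illustrative choices, not recommended settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yanolja/KoSOLAR-10.7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; pick a dtype your hardware supports
    device_map="auto",
)

# Plain text completion, since the model is not instruction-tuned.
prompt = "대한민국의 수도는"  # "The capital of South Korea is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```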