---
license: apache-2.0
datasets:
- adamo1139/rawrr_v1
tags:
- dpo
- qlora
- unsloth
---

Another QLoRA DPO training of Yi-34B-200K, this time with sequence length 500, lora_r 16 and lora_alpha 32. I was able to squeeze that in using Unsloth; the script I used is in this repo. It definitely has a much stronger effect than my previous run, which used lora_r 4, lora_alpha 8 and sequence length 200, but I am not sure whether I overcooked it. I will now try to train this on AEZAKMI v2.

Credits to mlabonne (I used pieces of his Mistral fine-tuning script for dataset preparation) and to Daniel Han and Michael Han (Unsloth AI team).

[made with Unsloth](https://github.com/unslothai/unsloth)
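
As a rough illustration of how the two runs described above differ, here is a minimal sketch that records the quoted hyperparameters and compares them. It is not the training script from this repo; the `LoraRun` container and the `relative_adapter_size` helper are hypothetical names made up for this example, and the scaling formula assumes standard LoRA (alpha / r), not rank-stabilized LoRA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraRun:
    """Hyperparameters quoted in this card (hypothetical container, not from the training script)."""
    name: str
    lora_r: int
    lora_alpha: int
    max_seq_length: int

    @property
    def scaling(self) -> float:
        # Assumption: standard LoRA, where the low-rank update is scaled by alpha / r.
        return self.lora_alpha / self.lora_r


# Values taken from the text above.
previous = LoraRun("previous", lora_r=4, lora_alpha=8, max_seq_length=200)
current = LoraRun("current", lora_r=16, lora_alpha=32, max_seq_length=500)


def relative_adapter_size(a: LoraRun, b: LoraRun) -> float:
    # For the same target modules, the LoRA parameter count grows linearly with r,
    # so the ratio of ranks is the ratio of trainable adapter parameters.
    return a.lora_r / b.lora_r


if __name__ == "__main__":
    for run in (previous, current):
        print(f"{run.name}: r={run.lora_r}, alpha={run.lora_alpha}, "
              f"seq_len={run.max_seq_length}, scaling={run.scaling}")
    print("adapter size ratio:", relative_adapter_size(current, previous))
```

Under these assumptions both runs share the same alpha / r scaling, so the differences are the higher rank (four times as many trainable adapter parameters for the same target modules) and the longer sequences, which may be part of why the effect is stronger.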