---
license: other
license_name: yi-license
license_link: LICENSE
datasets:
- adamo1139/rawrr_v1
tags:
- dpo
- qlora
- unsloth
---
Another QLoRA DPO training of Yi-34B-200K.

This time with sequence length 500, lora_r 16 and lora_alpha 32. I was able to squeeze that in using Unsloth; the script I used is in this repo. It definitely has a much stronger effect than my previous run, which used lora_r 4, lora_alpha 8 and sequence length 200, but I am not sure whether I overcooked it. I will try to train this on AEZAKMI v2 next.
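The hyperparameter change between the two runs can be sketched as a minimal comparison. Field names below follow common PEFT/Unsloth conventions and are illustrative only; the actual training script shipped in this repo is authoritative.

```python
# Hypothetical config comparison of the two QLoRA DPO runs described above.
previous_run = {"lora_r": 4, "lora_alpha": 8, "max_seq_length": 200}
current_run = {"lora_r": 16, "lora_alpha": 32, "max_seq_length": 500}

def lora_scaling(cfg):
    # PEFT scales the LoRA update by lora_alpha / lora_r.
    return cfg["lora_alpha"] / cfg["lora_r"]

# Both runs keep the same alpha/r scaling of 2.0, so the stronger effect
# comes from the 4x higher rank and the 2.5x longer sequence length.
print(lora_scaling(previous_run), lora_scaling(current_run))
```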

Credits to mlabonne (I used pieces of his Mistral fine-tuning script for dataset preparation) and to Daniel Han and Michael Han of the Unsloth AI team.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" alt="made with Unsloth" width="400" height="64"/>](https://github.com/unslothai/unsloth)