
Another QLoRA DPO training of Yi-34B-200K, this time with sequence length 500, lora_r 16 and lora_alpha 32. I was able to squeeze that in using Unsloth; the script I used is in this repo. It has a noticeably stronger effect than my previous run (lora_r 4, lora_alpha 8, sequence length 200), but I am not sure whether I overcooked it. I will try to train this on AEZAKMI v2 next.
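The exact script I used is in this repo; for orientation only, below is a minimal sketch of what a QLoRA DPO setup with these hyperparameters looks like in Unsloth + TRL (written against TRL versions where DPOTrainer still takes beta/max_length directly). The dataset placeholder, beta, learning rate and other optimizer settings are illustrative assumptions, not the exact values from my script.

```python
# Minimal sketch of a QLoRA DPO run with Unsloth + TRL, using the
# hyperparameters named above (max_seq_length 500, lora_r 16, lora_alpha 32).
# Dataset name, beta and optimizer settings are placeholders, not the
# values from the actual training script in this repo.
from unsloth import FastLanguageModel, PatchDPOTrainer
PatchDPOTrainer()  # patch TRL's DPOTrainer to use Unsloth's memory savings

from datasets import load_dataset
from transformers import TrainingArguments
from trl import DPOTrainer

max_seq_length = 500

# Load the base model in 4-bit (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "01-ai/Yi-34B-200K",
    max_seq_length = max_seq_length,
    dtype = None,        # autodetect bf16/fp16
    load_in_4bit = True, # QLoRA: 4-bit quantized base weights
)

# Attach LoRA adapters with r=16, alpha=32
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 32,
    lora_dropout = 0,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 42,
)

# A DPO dataset needs "prompt", "chosen" and "rejected" columns.
dataset = load_dataset("your_dpo_dataset", split = "train")  # placeholder

trainer = DPOTrainer(
    model = model,
    ref_model = None,  # with PEFT adapters, the frozen base acts as reference
    args = TrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 16,
        learning_rate = 5e-5,
        num_train_epochs = 1,
        optim = "adamw_8bit",
        output_dir = "outputs",
    ),
    beta = 0.1,  # illustrative; not necessarily what was used here
    train_dataset = dataset,
    tokenizer = tokenizer,
    max_length = max_seq_length,
    max_prompt_length = max_seq_length // 2,
)
trainer.train()
```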

Credits to mlabonne (I used pieces of his Mistral fine-tuning script for dataset preparation), and to Daniel Han and Michael Han of the Unsloth AI team.

made with Unsloth

