---
license: apache-2.0
---

**Base Model**: BLIP2-t5, pretrained version

**Finetune data**: LLaVA 150k (for multi-round conversations, one instruction-answer pair is sampled per conversation)

**Hyper-parameters**:

v0:
* lr = 2e-5 --> 0.0 with a cosine lr scheduler
* gbs = 32
* image size = 480
* weight decay = 0.05

v1 (same as LLaVA):
* lr = 2e-5
* gbs = 32
* image size = 480
* weight decay = 0.0
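The v0 schedule (lr decaying from 2e-5 to 0.0 along a cosine curve) can be sketched in PyTorch as below. This is an illustrative sketch, not the card's actual training script: the optimizer choice (AdamW), the dummy parameter, and `total_steps` are assumptions, since the card does not state the optimizer or step count.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Dummy parameter standing in for the model weights (illustration only).
param = torch.nn.Parameter(torch.zeros(1))

# v0 settings from the card: lr = 2e-5, weight decay = 0.05.
optimizer = AdamW([param], lr=2e-5, weight_decay=0.05)

# Assumption: total step count is not given in the card; 1000 is a placeholder.
total_steps = 1000

# Cosine decay from the base lr down to eta_min = 0.0 over total_steps.
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps, eta_min=0.0)

for step in range(total_steps):
    optimizer.step()   # real training would compute a loss and backprop here
    scheduler.step()

# By the end of the schedule the lr has decayed to (near) 0.0.
print(optimizer.param_groups[0]["lr"])
```

For v1, the same setup applies without the scheduler (constant lr = 2e-5) and with `weight_decay=0.0`.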