The result is not as good as SigLIP

by lucasjin - opened Jun 5

Jun 5

After fine-tuning with LLaVA, the results are worse than with SigLIP, which is unexpected. What's more, the model does not gain any Chinese OCR ability, even when trained with Chinese TextVQA data.
Why?

czczup

OpenGVLab org Aug 22

Hello, thank you for your feedback. It might require a larger amount of data to demonstrate its advantages over SigLIP, as the few hundred thousand samples in LLaVA may be insufficient. Additionally, although ViT has learned to extract features of Chinese characters, performing well in Chinese OCR still requires the use of a large Chinese OCR dataset during the SFT stage.

czczup changed discussion status to closed Aug 22
