The result is not as good as SigLIP
#1 by lucasjin - opened
After fine-tuning with LLaVA, the results are worse than with SigLIP, which is unexpected. What's more, the model does not gain any Chinese OCR ability, even with Chinese TextVQA data.
Why?
Hello, thank you for your feedback. The model might require a larger amount of data to demonstrate its advantages over SigLIP, as the few hundred thousand samples in LLaVA may be insufficient. Additionally, although the ViT has learned to extract features of Chinese characters, performing well on Chinese OCR still requires a large Chinese OCR dataset during the SFT stage.
czczup changed discussion status to closed