OCR on image

#28
by glitchyordis - opened

Obtaining key information is quite straightforward, but is there a way to obtain bbox locations for the detected text?

glitchyordis changed discussion title from OCR text to OCR on image

You can prompt the model to return bbox locations (see here: https://huggingface.co/spaces/maxiw/Qwen2-VL-Detection). I also tried "detect all texts" but the results are not super precise.
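In case it helps, here is a rough sketch of how the boxes could be pulled out of the model's text reply and mapped back to pixels. It assumes the reply contains boxes written as `(x1,y1),(x2,y2)` on a 0-1000 normalized grid, which is how I understand Qwen2-VL grounding output to look; please check this against what your model actually returns. The `parse_boxes` helper and the sample reply string are made up for illustration.

```python
import re

# Assumed box format: "(x1,y1),(x2,y2)" with integer coordinates.
BOX_PATTERN = re.compile(r"\((\d+),\s*(\d+)\),\s*\((\d+),\s*(\d+)\)")


def parse_boxes(response, image_width, image_height, scale=1000):
    """Extract boxes from a text reply and rescale them to pixel coordinates.

    `scale` is the assumed size of the normalized grid (0-1000 here).
    Returns a list of (x1, y1, x2, y2) tuples in pixels.
    """
    boxes = []
    for match in BOX_PATTERN.finditer(response):
        x1, y1, x2, y2 = (int(v) for v in match.groups())
        boxes.append((
            round(x1 * image_width / scale),
            round(y1 * image_height / scale),
            round(x2 * image_width / scale),
            round(y2 * image_height / scale),
        ))
    return boxes


# Hypothetical reply, written by hand for this example (not real model output).
reply = (
    "<|object_ref_start|>Hello<|object_ref_end|>"
    "<|box_start|>(100,200),(500,250)<|box_end|>"
)
print(parse_boxes(reply, image_width=1280, image_height=720))
```

The pixel boxes can then be drawn on the original image to check how closely they match the text regions.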

I tried OCR on a screenshot with not-very-clear text, and it worked nearly perfectly. But the model doesn't seem to be good at recognizing warped text, e.g. words on a bottle.
