How to postprocess the coordinates coming from "Point to <something>"

#2
by hadim - opened

The output is an XML string similar to <points x1="9.9" y1="83.8" x2="20.9" y2="2.7" alt="something">something</points>, but when I try to plot those coordinates on the original input image, the scaling and the position seem to be very off.

I tried scaling up using the default model image size (336, 336) but it does not work. Any idea?

You need to scale using the original size of your input image.

So the original image size is (3008, 2000) and the model default input image size is (336, 336) (according to config.vision_backbone["image_default_input_size"]).

So I tried

x_factor = image.size[0] / input_size[0]  # ~8.95
y_factor = image.size[1] / input_size[1]  # ~5.95 

but the scaling factors are still too small. I found manually that the correct ones are x_factor=30 and y_factor=19.5.

Am I missing something? Can you provide a snippet that computes the scaling factor?

Maybe @sanghol can chime in here?

Hi, our model generates pointing outputs to be easily rendered on images in HTML, e.g. in the format of <div class="dot" style="left: {x}%; top: {y}%;"></div>.
You need to divide x and y coordinates by 100 before multiplying by image width and height.
Thus, the actual pixel locations of the points would be (x1, y1) = (297.792, 1676) and (x2, y2) = (628.672, 54) (I assumed that w=3008 and h=2000).
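
For anyone who wants a snippet, here is a minimal sketch of that conversion. It assumes the output always uses numbered attributes (x1/y1, x2/y2, ...) as in the example above; the helper name points_to_pixels is hypothetical and not part of the model's API.

```python
import re


def points_to_pixels(xml_text, width, height):
    """Convert percentage coordinates in a <points ...> string to pixel coordinates.

    Assumes numbered attributes such as x1="9.9" y1="83.8" (values in percent,
    0-100), as in the example output shown in this thread.
    """
    # Collect the x and y values keyed by their index (1, 2, ...).
    xs = {int(i): float(v) for i, v in re.findall(r'\bx(\d+)="([\d.]+)"', xml_text)}
    ys = {int(i): float(v) for i, v in re.findall(r'\by(\d+)="([\d.]+)"', xml_text)}

    # Percent -> fraction -> pixels, pairing x and y that share an index.
    return [
        (xs[i] / 100 * width, ys[i] / 100 * height)
        for i in sorted(xs)
        if i in ys
    ]


output = '<points x1="9.9" y1="83.8" x2="20.9" y2="2.7" alt="something">something</points>'
# (3008, 2000) is the original image size mentioned earlier in this thread,
# i.e. image.size when the image is loaded with PIL.
pixels = points_to_pixels(output, 3008, 2000)
print(pixels)
```

With the 3008x2000 image from this thread, this gives approximately (297.79, 1676.0) and (628.67, 54.0), matching the values above. It also explains the manually found factors: the effective scale is width / 100 = 30.08 and height / 100 = 20, not image size divided by the 336 model input size.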

Thanks, that works like a charm (maybe you should document that somewhere!).

hadim changed discussion status to closed
