How to postprocess the coordinates coming from "Point to <something>"

by hadim - opened 14 days ago

14 days ago

The output is an XML string similar to <points x1="9.9" y1="83.8" x2="20.9" y2="2.7" alt="something">something</points>, but when I try to plot those coordinates on the original input image, the scale and position are far off.

I tried scaling up using the default model image size (336, 336), but that does not work. Any idea?

Muennighoff

Ai2 org 14 days ago

You need to scale using the original size of your input image.

hadim

14 days ago

So the original image size is (3008, 2000) and the model default input image size is (336, 336) (according to config.vision_backbone["image_default_input_size"]).

So I tried

x_factor = image.size[0] / input_size[0]  # ~8.95
y_factor = image.size[1] / input_size[1]  # ~5.95

but the scaling factors are still too small. I found manually that the correct ones are x_factor=30 and y_factor=19.5.

Am I missing something? Can you provide a snippet that computes the scaling factor?

Muennighoff

Ai2 org 14 days ago

Maybe @sanghol can chime in here?

sanghol

Ai2 org 14 days ago

Hi, our model generates pointing outputs to be easily rendered on images in HTML, e.g. in the format of <div class="dot" style="left: {x}%; top: {y}%;"></div>.
The x and y coordinates are therefore percentages of the image width and height: divide them by 100, then multiply x by the image width and y by the image height.
Thus, the actual locations of the points would be (x1, y1) = (297.792, 1676), (x2, y2) = (628.672, 54) (I assumed that w=3008 and h=2000).
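
For reference, here is a minimal sketch of that conversion in Python. It assumes the output always uses numbered attributes like x1/y1, x2/y2 as in the example above; the helper name points_to_pixels is hypothetical and not part of any model API.

```python
import re


def points_to_pixels(points_xml, width, height):
    """Convert percentage coordinates in a <points ...> string to pixel coordinates.

    Hypothetical helper: assumes numbered attributes (x1, y1, x2, y2, ...)
    whose values are percentages of the image width and height.
    """
    # Map each numeric suffix to its raw string value, e.g. {"1": "9.9", "2": "20.9"}.
    xs = dict(re.findall(r'\bx(\d+)="([\d.]+)"', points_xml))
    ys = dict(re.findall(r'\by(\d+)="([\d.]+)"', points_xml))

    pixels = []
    for idx in sorted(xs, key=int):
        if idx in ys:  # skip any x that has no matching y
            x_px = float(xs[idx]) / 100 * width
            y_px = float(ys[idx]) / 100 * height
            pixels.append((x_px, y_px))
    return pixels


output = '<points x1="9.9" y1="83.8" x2="20.9" y2="2.7" alt="something">something</points>'
# Width and height are those of the original image (3008 x 2000 in this thread).
pts = points_to_pixels(output, 3008, 2000)
# pts is approximately [(297.792, 1676.0), (628.672, 54.0)]
```

With a PIL image, image.size is (width, height), so the call would be points_to_pixels(output, *image.size). The effective scaling factors are width / 100 and height / 100, not the ratio to the model's 336 x 336 input size.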

hadim

14 days ago

Thanks, that works like a charm (maybe you should document that somewhere!).

hadim changed discussion status to closed 14 days ago
