Question

I have a question about the scale-invariant translation and log-space translation concepts in the R-CNN paper.

As mentioned in the paper:

Our goal is to learn a transformation that maps a proposed box P to a ground-truth box G.

So the regressor predicts four functions,
[dx(P), dy(P), dw(P), dh(P)],
which parameterize the transformation. The predicted ground-truth box Ĝ is then computed by applying the transformations below:

Ĝx = Pw * dx(P) + Px         (1)
Ĝy = Ph * dy(P) + Py         (2)
Ĝw = Pw * exp(dw(P))         (3)
Ĝh = Ph * exp(dh(P))         (4)
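
For reference, equations (1) to (4) can be written out as a small Python sketch. This is only an illustration under assumptions: boxes are taken as (center x, center y, width, height) tuples, the regressor outputs are passed in as plain numbers, and the function name apply_deltas is made up for this post, not taken from the paper or any library.

```python
import math

def apply_deltas(box, deltas):
    """Apply equations (1)-(4) to one proposal box.

    box    -- (Px, Py, Pw, Ph): center x, center y, width, height (assumed layout)
    deltas -- (dx, dy, dw, dh): regressor outputs for this proposal
    """
    px, py, pw, ph = box
    dx, dy, dw, dh = deltas
    gx = pw * dx + px          # (1) center shift measured in units of box width
    gy = ph * dy + py          # (2) center shift measured in units of box height
    gw = pw * math.exp(dw)     # (3) width is rescaled by exp(dw)
    gh = ph * math.exp(dh)     # (4) height is rescaled by exp(dh)
    return (gx, gy, gw, gh)

# Same deltas applied to a small and a large proposal with the same center.
deltas = (0.5, 0.5, 0.0, 0.0)
small = apply_deltas((50.0, 50.0, 10.0, 20.0), deltas)
large = apply_deltas((50.0, 50.0, 100.0, 200.0), deltas)
print(small)
print(large)
```

In this sketch the same dx = 0.5 moves the center by 5 pixels for the 10-pixel-wide box and by 50 pixels for the 100-pixel-wide box, while dw = dh = 0 leaves the width and height unchanged because exp(0) = 1.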

Question: I don't understand equations (1) and (2). Why are dx(P) and dy(P) multiplied by Pw and Ph? And why are Px and Py not used in equations (3) and (4)?
In summary, I don't understand the roles of Pw and Ph in equations (1) and (2).
Please guide me.

r-cnn Bounding-box regression
