Change of Perspective

The generic equation that relates the image points between two generic viewpoints can be written as

\begin{displaymath}
\begin{bmatrix}
u_2 \\ v_2 \\ 1
\end{bmatrix} \equiv
\mathb...
...n{bmatrix}
u_1 \\ v_1 \\ 1
\end{bmatrix} + \mathbf{t} \right)
\end{displaymath} (8.31)

where $\mathbf{t} = \mathbf{t}_1 - \mathbf{t}_2$ is the vector connecting the two pin-holes and $\mathbf{R}$ is the relative orientation between the two views as indicated in section 1.9. A more detailed discussion is provided in chapter 9 on stereoscopy.

In general, a view generated by one camera cannot be transformed into the view generated by another, because the mapping depends on the unknown depth of each point. The transformation is feasible only in two cases: when the points to remap lie on a specific plane, or when the two cameras share the same pin-hole.

The second case is discussed in the next section. In the first case, points can be remapped from one view to the other by composing an Inverse Perspective Mapping with a Perspective Mapping, under the assumption that the observed scene consists solely of a plane (for example, the ground). The image points of camera 1 are projected into world coordinates on the plane and then reprojected into the image coordinates of a second camera 2 with different intrinsic and extrinsic parameters. Since only points on a plane are reprojected, the composite transformation is still a homography:

\begin{displaymath}
\mathbf{H} = \mathbf{H}_{2} \cdot \mathbf{H}^{-1}_{1}
\end{displaymath} (8.32)

Homographic transformations indeed combine through simple matrix multiplication. Expanding equation (8.32) with (8.27) yields:
\begin{displaymath}
\mathbf{H} = \mathbf{K}_2 \cdot {\mathbf{R}_{Z}}_{2} \cdot {\mathbf{R}_{Z}}^{-1}_{1} \cdot \mathbf{K}^{-1}_{1}
\end{displaymath} (8.33)

In theory, the constraint of a plane at constant $z$ matters only when the translation vector changes between the two views. In that case, points that do not belong to the chosen plane are remapped incorrectly, because they no longer obey the homographic transformation. For this reason, transformation (8.32) can also be used to identify vertical obstacles in techniques such as Ground Plane Stereo and Motion Stereo.
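The composition (8.32) and its behaviour on points outside the plane can be checked numerically. The following is a minimal sketch in pure Python, not the author's implementation. It assumes the common convention $\mathbf{p} \equiv \mathbf{K} (\mathbf{R} \mathbf{X} + \mathbf{t})$ with the plane at $Z = 0$, so that each plane-to-image homography is $\mathbf{K} [\mathbf{r}_1 \; \mathbf{r}_2 \; \mathbf{t}]$; this may differ in detail from the form used in (8.27). The camera parameters and all helper names (`plane_to_image`, `project`, `remap`, `pixel_error`) are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def inv3(M):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

def rot_x(angle):
    """Rotation about the x axis (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def plane_to_image(K, R, t):
    """Assumed convention: H = K [r1 r2 t] maps (X, Y, 1) on Z = 0 to pixels."""
    return matmul(K, [[R[r][0], R[r][1], t[r]] for r in range(3)])

def project(K, R, t, X):
    """Pinhole projection of a world point X, assuming p ~ K (R X + t)."""
    p = matvec(K, [a + b for a, b in zip(matvec(R, X), t)])
    return [p[0] / p[2], p[1] / p[2]]

def remap(H, uv):
    """Apply a homography to a pixel and normalise the result."""
    q = matvec(H, [uv[0], uv[1], 1.0])
    return [q[0] / q[2], q[1] / q[2]]

def pixel_error(H, cam1, cam2, X):
    """Distance between H applied to the view of X in camera 1
    and the true view of X in camera 2."""
    predicted = remap(H, project(*cam1, X))
    actual = project(*cam2, X)
    return math.hypot(predicted[0] - actual[0], predicted[1] - actual[1])

# Two hypothetical cameras (K, R, t) looking down at the plane Z = 0.
cam1 = ([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]],
        rot_x(math.pi), [0.0, 0.0, 5.0])
cam2 = ([[600.0, 0.0, 300.0], [0.0, 600.0, 200.0], [0.0, 0.0, 1.0]],
        rot_x(math.pi + 0.3), [-1.0, 0.5, 4.0])

# Equation (8.32): image 1 -> plane -> image 2.
H = matmul(plane_to_image(*cam2), inv3(plane_to_image(*cam1)))

err_on_plane = pixel_error(H, cam1, cam2, [1.0, 2.0, 0.0])   # point on Z = 0
err_off_plane = pixel_error(H, cam1, cam2, [1.0, 2.0, 1.0])  # point off the plane
print(err_on_plane, err_off_plane)
```

With these values the point on the plane is transferred with an error at the level of floating-point rounding, while the point raised above the plane lands several pixels away from its true position in the second view. That discrepancy is the cue that obstacle-detection schemes of this kind rely on.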

This homography matrix can be written in a more general form when the transformation between the two views $(\mathbf{R}, \mathbf{t})$ and the equation of the plane $(\hat{\mathbf{n}}, d)$ are known, where $\hat{\mathbf{n}}$ is the unit normal to the plane and $d$ is the distance of the plane from the first camera.

In this case, a point $\mathbf{x}_1$ from the first view that lies on the plane satisfies the equation

\begin{displaymath}
\hat{\mathbf{n}}^{\top} \mathbf{x}_1 = d
\end{displaymath} (8.34)

This point is related to the same point as seen from the second camera by

\begin{displaymath}
\mathbf{x}_2 = \mathbf{R} \mathbf{x}_1 + \mathbf{t}
\end{displaymath} (8.35)

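Combining these two equations gives the homographic constraint. For a point on the plane, $\frac{1}{d} \hat{\mathbf{n}}^{\top} \mathbf{x}_1 = 1$, so the translation in (8.35) can be multiplied by this factor to obtain $\mathbf{x}_2 = \left( \mathbf{R} + \frac{1}{d} \mathbf{t} \hat{\mathbf{n}}^{\top} \right) \mathbf{x}_1$. Expressing both points in pixel coordinates through the intrinsic matrices $\mathbf{K}_1$ and $\mathbf{K}_2$ yields: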

\begin{displaymath}
\mathbf{H} = \mathbf{K}_2 \left( \mathbf{R} + \frac{1}{d} \mathbf{t} \hat{\mathbf{n}}^{\top} \right) \mathbf{K}^{-1}_{1}
\end{displaymath} (8.36)
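As a numerical check of (8.36), the sketch below builds the plane-induced homography and compares its prediction with the true projection obtained from (8.35). It is only an illustration under stated assumptions: points are expressed in the coordinates of the first camera, the intrinsic matrices, rotation, translation and plane are hypothetical values, and the helper names (`plane_induced_homography`, `to_pixel`, `transfer_error`) are not taken from the text.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def inv3(M):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

def rot_y(angle):
    """Rotation about the y axis (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def plane_induced_homography(K1, K2, R, t, n, d):
    """H = K2 (R + t n^T / d) K1^{-1}, following equation (8.36)."""
    M = [[R[r][c] + t[r] * n[c] / d for c in range(3)] for r in range(3)]
    return matmul(matmul(K2, M), inv3(K1))

def to_pixel(K, x):
    """Project a point given in camera coordinates to pixel coordinates."""
    p = matvec(K, x)
    return [p[0] / p[2], p[1] / p[2]]

def transfer_error(H, K1, K2, R, t, x1):
    """Pixel distance between H applied to the view-1 pixel of x1
    and the true view-2 pixel of the same point."""
    u1, v1 = to_pixel(K1, x1)
    q = matvec(H, [u1, v1, 1.0])
    x2 = [a + b for a, b in zip(matvec(R, x1), t)]  # equation (8.35)
    u2, v2 = to_pixel(K2, x2)
    return math.hypot(q[0] / q[2] - u2, q[1] / q[2] - v2)

# Hypothetical intrinsics, relative pose and plane (camera-1 coordinates).
K1 = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
K2 = [[600.0, 0.0, 300.0], [0.0, 600.0, 200.0], [0.0, 0.0, 1.0]]
R = rot_y(0.1)
t = [0.5, 0.0, 0.2]
n, d = [0.0, 0.0, 1.0], 5.0      # plane n^T x = d, equation (8.34)

on_plane = [1.0, -2.0, 5.0]      # satisfies n^T x1 = d
off_plane = [1.0, -2.0, 4.0]     # violates n^T x1 = d

H = plane_induced_homography(K1, K2, R, t, n, d)
err_on_plane = transfer_error(H, K1, K2, R, t, on_plane)
err_off_plane = transfer_error(H, K1, K2, R, t, off_plane)

# Same pin-hole (t = 0): the homography reduces to K2 R K1^{-1}.
zero = [0.0, 0.0, 0.0]
H_rot = plane_induced_homography(K1, K2, R, zero, n, d)
err_off_plane_no_translation = transfer_error(H_rot, K1, K2, R, zero, off_plane)
print(err_on_plane, err_off_plane, err_off_plane_no_translation)
```

The first error is at rounding level, the second is several pixels, and the third is again at rounding level. With a zero translation the term $\frac{1}{d} \mathbf{t} \hat{\mathbf{n}}^{\top}$ vanishes, so the plane no longer matters and every point is transferred correctly, which matches the shared pin-hole case mentioned earlier.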


Conversely, a homography of this form can always be decomposed into $\left[ \mathbf{R}, \frac{1}{d} \mathbf{t}, \hat{\mathbf{n}} \right]$. There are four possible decompositions, and the one consistent with the input points must be chosen.

Paolo Medici
2025-10-22