Three-Dimensional Reconstruction and Homography

Equation (9.21) can easily be expressed in homogeneous form. The matrix that reconstructs the three-dimensional coordinates of a point, directly in the camera reference frame, from its image-disparity coordinates is

\begin{displaymath}
\begin{bmatrix}
\tilde{x} \\ \tilde{y} \\ \tilde{z} \\ 1
\end{bmatrix} = \mathbf{Q} \begin{bmatrix}
u \\ v \\ d \\ 1
\end{bmatrix}\end{displaymath} (9.22)

while its inverse
\begin{displaymath}
\begin{bmatrix}
u \\ v \\ d \\ 1
\end{bmatrix} =
\mathbf{Q}^{-1} \begin{bmatrix}
\tilde{x} \\ \tilde{y} \\ \tilde{z} \\ 1
\end{bmatrix}\end{displaymath} (9.23)

is the matrix that projects a point from camera coordinates to image-disparity coordinates. Both matrices are known only up to a multiplicative factor, hence they can be written in several equivalent forms. The three-dimensional reconstruction of an image-disparity point in the world reference frame, given by equation (9.17), can be expressed in the same way. The matrix $\mathbf{Q}$ is referred to as the reprojection matrix (FK08).
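As an illustration of the pair $\mathbf{Q}$, $\mathbf{Q}^{-1}$, the following minimal sketch reprojects one image-disparity point into camera coordinates and projects it back. The explicit entries of the matrices are an assumption: they correspond to a common rectified stereo model with focal lengths `ku`, `kv`, principal point `(u0, v0)` and baseline `b`, and are only one of the equivalent scalings of $\mathbf{Q}$. The helper names (`make_q`, `make_q_inv`, `apply_h`) and the numeric values are hypothetical.

```python
# Assumed rectified pinhole stereo model (an assumption, not taken from the text):
#   u = u0 + ku * x / z,   v = v0 + kv * y / z,   d = ku * b / z
# ku, kv: focal lengths in pixels; (u0, v0): principal point; b: baseline.


def make_q(ku, kv, u0, v0, b):
    """Reprojection matrix Q for the assumed model (one possible scaling)."""
    return [
        [1.0 / ku, 0.0, 0.0, -u0 / ku],
        [0.0, 1.0 / kv, 0.0, -v0 / kv],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0 / (ku * b), 0.0],
    ]


def make_q_inv(ku, kv, u0, v0, b):
    """Inverse of Q: projects camera coordinates to (u, v, d)."""
    return [
        [ku, 0.0, u0, 0.0],
        [0.0, kv, v0, 0.0],
        [0.0, 0.0, 0.0, ku * b],
        [0.0, 0.0, 1.0, 0.0],
    ]


def apply_h(m, p):
    """Apply a 4x4 homogeneous matrix to a 3-component point and normalise."""
    v = list(p) + [1.0]
    h = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    return [h[0] / h[3], h[1] / h[3], h[2] / h[3]]


# Hypothetical calibration values, chosen only for the example.
params = dict(ku=800.0, kv=800.0, u0=320.0, v0=240.0, b=0.25)
Q = make_q(**params)
Q_inv = make_q_inv(**params)

point_cam = apply_h(Q, (420.0, 200.0, 20.0))  # (u, v, d) -> (x, y, z)
point_img = apply_h(Q_inv, point_cam)         # (x, y, z) -> (u, v, d)
print(point_cam)
print(point_img)
```

With these values the reconstructed point is approximately (1.25, -0.5, 10), and projecting it back returns the starting triple (420, 200, 20) up to floating-point rounding. The division by the fourth component in `apply_h` is what makes the overall scale of the matrices irrelevant.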

In real conditions the camera is rotated and translated with respect to the ideal configuration. It is then sufficient to left-multiply the matrix $\mathbf{Q}$ by the $4 \times 4$ matrix that represents the transformation from camera coordinates to world coordinates: the resulting matrix converts disparity coordinates into world coordinates, and its inverse performs the opposite conversion.
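The composition with the camera-to-world transformation can be sketched as follows. This is only an example under stated assumptions: the entries of `Q` and `Q_inv` follow a common rectified stereo model (focal lengths `ku`, `kv`, principal point `(u0, v0)`, baseline `b`), and the pose `(R_wc, t_wc)` describes a hypothetical camera mounted 1.5 units above the ground, with the camera z axis mapped to the world x axis. All names and numbers are illustrative.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply_h(m, p):
    """Apply a 4x4 homogeneous matrix to a 3-component point and normalise."""
    v = list(p) + [1.0]
    h = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    return [h[0] / h[3], h[1] / h[3], h[2] / h[3]]


def rigid(r, t):
    """Build the 4x4 matrix [R t; 0 1] from a 3x3 rotation and a translation."""
    return [r[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def rigid_inv(r, t):
    """Inverse of [R t; 0 1], that is [R^T  -R^T t; 0 1]."""
    rt = [[r[j][i] for j in range(3)] for i in range(3)]
    ti = [-sum(rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return rigid(rt, ti)


# Assumed calibration (hypothetical values) and the corresponding Q, Q^-1.
ku, kv, u0, v0, b = 800.0, 800.0, 320.0, 240.0, 0.25
Q = [[1.0 / ku, 0.0, 0.0, -u0 / ku],
     [0.0, 1.0 / kv, 0.0, -v0 / kv],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0 / (ku * b), 0.0]]
Q_inv = [[ku, 0.0, u0, 0.0],
         [0.0, kv, v0, 0.0],
         [0.0, 0.0, 0.0, ku * b],
         [0.0, 0.0, 1.0, 0.0]]

# Assumed camera-to-world pose: world x = camera z, world y = -camera x,
# world z = -camera y, camera centre 1.5 units above the world origin.
R_wc = [[0.0, 0.0, 1.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]
t_wc = [0.0, 0.0, 1.5]

M = mat_mul(rigid(R_wc, t_wc), Q)              # disparity -> world
M_inv = mat_mul(Q_inv, rigid_inv(R_wc, t_wc))  # world -> disparity

point_world = apply_h(M, (420.0, 200.0, 20.0))
point_img = apply_h(M_inv, point_world)
print(point_world)
print(point_img)
```

With these values the world point is approximately (10, -1.25, 2), and `M_inv` maps it back to (420, 200, 20). The order of the products matters: the rigid transformation is applied after $\mathbf{Q}$ in the forward direction and before $\mathbf{Q}^{-1}$ in the inverse one.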

This formalism also makes it possible to transform disparity points acquired by pairs of cameras placed at different viewpoints (for example, a stereo pair that moves over time, or two stereo pairs rigidly connected to each other). In this case, the relationship between the disparity points acquired from the two viewpoints is again represented by a $4 \times 4$ matrix:


\begin{displaymath}
\mathbf{H}_{2,1} = \mathbf{Q}_1^{-1} \begin{bmatrix}
\mathbf{R}_{2,1} & {}^{1}\mathbf{t}_{2,1} \\
0 & 1
\end{bmatrix} \mathbf{Q}_2
\end{displaymath} (9.24)

which transforms $(u_2,v_2,d_2)$ into $(u_1,v_1,d_1)$ (this is a homographic transformation in 4 dimensions, quite similar to those in 3 dimensions discussed so far). Note that the pose $\left( \mathbf{R}, \mathbf{t} \right)$ follows the convention of equation (1.64): it expresses a point of reference frame 2 in reference frame 1. Since all the points involved are expressed in camera coordinates, if the transformations between sensors are given in world coordinates, as is typically the case, the change of reference frame must be included as well. This class of transformations is commonly referred to as 3D homographies.
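The chain $\mathbf{Q}_2$, rigid motion, $\mathbf{Q}_1^{-1}$ can be sketched as follows for a stereo pair that moves forward between two acquisitions. The example rests on several assumptions: both viewpoints share the same calibration, the matrices follow a common rectified stereo model (focal lengths `ku`, `kv`, principal point `(u0, v0)`, baseline `b`), and viewpoint 2 lies 2 units ahead of viewpoint 1 along the optical axis with no rotation. The function and variable names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply_h(m, p):
    """Apply a 4x4 homogeneous matrix to a 3-component point and normalise."""
    v = list(p) + [1.0]
    h = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    return [h[0] / h[3], h[1] / h[3], h[2] / h[3]]


# Assumed calibration, identical for both viewpoints (hypothetical values).
ku, kv, u0, v0, b = 800.0, 800.0, 320.0, 240.0, 0.25
Q2 = [[1.0 / ku, 0.0, 0.0, -u0 / ku],
      [0.0, 1.0 / kv, 0.0, -v0 / kv],
      [0.0, 0.0, 0.0, 1.0],
      [0.0, 0.0, 1.0 / (ku * b), 0.0]]
Q1_inv = [[ku, 0.0, u0, 0.0],
          [0.0, kv, v0, 0.0],
          [0.0, 0.0, 0.0, ku * b],
          [0.0, 0.0, 1.0, 0.0]]

# Assumed pose of frame 2 in frame 1: identity rotation, translation of
# 2 units along z, so that x_1 = R x_2 + t.
T_21 = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 2.0],
        [0.0, 0.0, 0.0, 1.0]]

# H_{2,1} = Q_1^{-1} [R t; 0 1] Q_2
H_21 = mat_mul(Q1_inv, mat_mul(T_21, Q2))

uvd_2 = (420.0, 200.0, 20.0)
uvd_1 = apply_h(H_21, uvd_2)  # (u_2, v_2, d_2) -> (u_1, v_1, d_1)
print(uvd_1)
```

The point, 10 units away from viewpoint 2, is 12 units away from viewpoint 1, so in the first image it moves towards the principal point and its disparity decreases: the result is approximately (403.33, 206.67, 16.67). No three-dimensional point is computed explicitly, since the whole mapping is contained in the single matrix `H_21`.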

Paolo Medici
2025-10-22