Aligned Cameras

When the cameras are perfectly aligned with the coordinate axes and share identical intrinsic parameters (same focal lengths and same principal point), the equations for three-dimensional reconstruction simplify significantly.

Under these conditions, the perspective projection equations reduce to

\begin{displaymath}
\begin{array}{l}
u_i = - k_{u} \dfrac{ y - y_i }{ x - x_i } + u_{0} \\
v_i = - k_{v} \dfrac{ z - z_i }{ x - x_i } + v_{0} \\
\end{array}\end{displaymath} (9.15)

where $(x,y,z)$ is a point in "world" coordinates (see the next section), $(x_i,y_i,z_i)$ is the position of the pinhole of the i-th camera, $k_u$ and $k_v$ are the focal lengths in pixels along the two image axes, and $(u_i,v_i)$ are the coordinates of the point projected onto the i-th image. The point $(u_0,v_0)$ is the principal point, which must be the same for all the cameras involved.
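As a minimal sketch of equation (9.15), the projection onto an axis-aligned camera can be written in a few lines of Python. The function name `project_aligned` and the numeric values are hypothetical and chosen only for illustration. The check on the depth is an added assumption: the point must lie in front of the camera along the $x$ axis.

```python
def project_aligned(point, pinhole, k_u, k_v, u_0, v_0):
    """Project a world point onto an axis-aligned camera, as in equation (9.15)."""
    x, y, z = point
    x_i, y_i, z_i = pinhole
    depth = x - x_i  # distance from the pinhole along the optical (x) axis
    if depth <= 0:
        raise ValueError("the point is not in front of the camera")
    u = -k_u * (y - y_i) / depth + u_0
    v = -k_v * (z - z_i) / depth + v_0
    return u, v


# Illustrative values only: 500 px focal lengths, principal point (320, 240).
u, v = project_aligned((10.0, 1.0, 0.5), (0.0, 0.0, 0.0), 500.0, 500.0, 320.0, 240.0)
print(u, v)  # 270.0 215.0
```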

Let us now focus solely on the stereoscopic case: for simplicity, we denote the left camera with the subscript 1 and the right camera with the subscript 2. The alignment constraints impose $x_1=x_2=0$, $y_1=b$, $y_2=0$, and $z_1=z_2=0$, having placed, without loss of generality, the right camera at the origin of the reference system. The quantity $b = y_1 - y_2$ is called the baseline.

The difference $d = u_1 - u_2$ between the horizontal coordinates of the projections of the same point in the two images of the stereo pair is called the disparity. Its value is obtained by inserting the alignment constraints into equation (9.15), which gives

\begin{displaymath}
u_1 - u_2 = d = k_u \frac{ b }{ x }
\end{displaymath} (9.16)
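
As a short check of this result, the intermediate steps can be written out explicitly. Substituting $x_1 = x_2 = 0$, $y_1 = b$ and $y_2 = 0$ into the first line of equation (9.15) gives

\begin{displaymath}
\begin{array}{l}
u_1 = - k_u \dfrac{ y - b }{ x } + u_0 \\
u_2 = - k_u \dfrac{ y }{ x } + u_0 \\
u_1 - u_2 = - k_u \dfrac{ y - b }{ x } + k_u \dfrac{ y }{ x } = k_u \dfrac{ b }{ x } \\
\end{array}\end{displaymath}

so both the principal point $u_0$ and the lateral coordinate $y$ cancel, and the disparity depends only on the depth $x$.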

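By inverting this simple relation and substituting it into equation (9.15), it is possible to derive the world coordinates $(x,y,z)$ corresponding to a point $(u_2,v_2)$ in the right camera with disparity $d$. Since $z_1 = z_2$, the vertical coordinate is the same in both images, $v_1 = v_2 = v$, so the subscript is dropped: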

\begin{displaymath}
\begin{array}{l}
x = k_u \dfrac{b}{d} \\
y = - (u_2 - u_0) \dfrac{b}{d} \\
z = - (v - v_0) \dfrac{k_u}{k_v} \dfrac{b}{d} \\
\end{array}\end{displaymath} (9.17)

Clearly, $d \geq 0$ must hold for world points located in front of the stereo pair, with $d = 0$ corresponding to points at infinity.

As can be observed, each coordinate is proportional to the baseline $b$, which sets the true scale of the reconstruction, and to the inverse of the disparity $1/d$.
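
The reconstruction in equation (9.17) can be sketched in Python as follows. The function name `reconstruct_aligned` and the numeric values are hypothetical. Rejecting $d \leq 0$ is an assumption of this sketch, made to avoid the division by zero for points at infinity.

```python
def reconstruct_aligned(u_2, v, d, b, k_u, k_v, u_0, v_0):
    """Recover sensor coordinates from a right-image point and its disparity (9.17)."""
    if d <= 0:
        raise ValueError("disparity must be strictly positive for a finite point")
    x = k_u * b / d
    y = -(u_2 - u_0) * b / d
    z = -(v - v_0) * (k_u / k_v) * b / d
    return x, y, z


# Illustrative values only: baseline 0.5, focal lengths 500 px,
# principal point (320, 240), right-image point (270, 215), disparity 25 px.
print(reconstruct_aligned(270.0, 215.0, 25.0, 0.5, 500.0, 500.0, 320.0, 240.0))
```

With these values the sketch recovers the point $(10, 1, 0.5)$: doubling the baseline at equal disparity doubles every coordinate, while doubling the disparity halves them.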

World Coordinate Triangulation

The coordinates $(x,y,z)$ obtained above are sensor coordinates: they refer to a specific stereoscopic configuration whose orientation and position coincide with the axes of the reference system. To move from sensor coordinates, denoted here $(x',y',z')$, to world coordinates in the general case of arbitrarily oriented cameras, a rigid transformation must be applied, namely the rotation matrix $\prescript{w}{}{\mathbf{R}}_{b}$ and the translation $(x_i,y_i,z_i)^{\top}$ given by the pinhole position, so that

\begin{displaymath}
\begin{bmatrix}
x \\ y \\ z
\end{bmatrix} = \prescript{w}{}{\mathbf{R}}_{b} \begin{bmatrix}
x' \\ y' \\ z'
\end{bmatrix} + \begin{bmatrix}
x_i \\ y_i \\ z_i
\end{bmatrix}\end{displaymath} (9.18)

By combining equation (9.17) with equation (9.18), it is possible to define a matrix $\mathbf{M}$ such that the conversion from an image point with its disparity, $(u_i,v,d)$, to the world coordinates $(x,y,z)$ can be expressed in a very compact form as

\begin{displaymath}
\begin{bmatrix}
x \\ y \\ z
\end{bmatrix} = \frac{1}{d} \mathbf{M} \begin{bmatrix}
u_i - u_0 \\ v - v_0 \\ 1
\end{bmatrix} + \begin{bmatrix}
x_i \\ y_i \\ z_i
\end{bmatrix}\end{displaymath} (9.19)

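where $i$ can denote either the left or the right camera interchangeably, provided that the image coordinate $u_i$ and the pinhole position $(x_i,y_i,z_i)$ refer to the same camera.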
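
The following Python sketch shows one way to build a matrix $\mathbf{M}$ satisfying this relation. It rests on an assumption not stated explicitly in the text: the vector multiplying $\mathbf{M}$ is taken to be $(u_i - u_0, v - v_0, 1)^{\top}$, so that $\mathbf{M} = b \, \prescript{w}{}{\mathbf{R}}_{b} \, \mathbf{P}$, where $\mathbf{P}$ is a constant matrix read off equation (9.17). The names `build_M` and `triangulate` are hypothetical, and plain lists are used in place of a linear algebra library to keep the example self-contained.

```python
def build_M(R, b, k_u, k_v):
    """Return M = b * R * P for a 3x3 rotation R given as nested lists."""
    # P maps (u - u_0, v - v_0, 1) to the sensor coordinates of (9.17) times d / b.
    P = [[0.0, 0.0, k_u],
         [-1.0, 0.0, 0.0],
         [0.0, -k_u / k_v, 0.0]]
    return [[b * sum(R[r][k] * P[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def triangulate(M, pinhole, u_i, v, d, u_0, v_0):
    """World point from image point (u_i, v) and disparity d, following (9.19)."""
    if d <= 0:
        raise ValueError("disparity must be strictly positive for a finite point")
    vec = (u_i - u_0, v - v_0, 1.0)
    return tuple(sum(M[r][c] * vec[c] for c in range(3)) / d + pinhole[r]
                 for r in range(3))


# Illustrative values only: identity rotation, baseline 0.5, focal lengths 500 px.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
M = build_M(identity, 0.5, 500.0, 500.0)
# Right camera (pinhole at the origin) and left camera (pinhole at y = b).
print(triangulate(M, (0.0, 0.0, 0.0), 270.0, 215.0, 25.0, 320.0, 240.0))
print(triangulate(M, (0.0, 0.5, 0.0), 295.0, 215.0, 25.0, 320.0, 240.0))
```

Both calls should return the same world point, approximately $(10, 1, 0.5)$, which illustrates why the index $i$ can refer to either camera: the larger $u_1 = u_2 + d$ of the left image is compensated exactly by the offset $b$ of the left pinhole.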

Paolo Medici
2025-10-22