Camera Coordinate Transformation

When discussing stereoscopic coordinates, a brief introduction to coordinate transformations between cameras is essential. This section continues the general discussion of sensors presented in section 1.9.

When the sensors are video sensors, the rotation matrices in the camera equations convert from world coordinates to camera coordinates, rather than to sensor coordinates as indicated previously.

In the case of the pin-hole camera, the generic world point $\mathbf {x}$ is rotated and translated to the point $\prescript{i}{}{\mathbf{m}}$, expressed in the camera coordinates of the i-th sensor, through the relationship:

\begin{displaymath}
\prescript{i}{}{\mathbf{m}} = \prescript{i}{}{\mathbf{R}} \left( \mathbf{x} - \mathbf{t}_i \right) = \prescript{i}{}{\mathbf{R}} \mathbf{x} - \prescript{i}{}{\mathbf{R}} \mathbf{t}_i
\end{displaymath} (9.1)

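which involves the rotation and permutation matrix $\prescript{i}{}{\mathbf{R}}$ of the pin-hole model, that is, a matrix that converts from world coordinates to camera coordinates, and the position $\mathbf{t}_i$ of the optical center of the $i$-th camera in world coordinates. Since $\prescript{i}{}{\mathbf{R}}$ is orthonormal, the inverse of this transformation always exists, with $\prescript{i}{}{\mathbf{R}}^{-1} = \prescript{i}{}{\mathbf{R}}^{\top}$, and is given by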
\begin{displaymath}
\mathbf{x} = \prescript{i}{}{\mathbf{R}}^{-1} \prescript{i}{}{\mathbf{m}} + \mathbf{t}_i
\end{displaymath} (9.2)
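As a minimal sketch of (9.1) and (9.2), the following dependency-free Python uses plain nested lists for matrices; all function names are hypothetical. It relies on the fact that a rotation-and-permutation matrix is orthonormal, so $\prescript{i}{}{\mathbf{R}}^{-1}$ can be computed as its transpose. The numeric example assumes a world frame with x forward, y left, z up and a camera frame with x right, y down, z along the optical axis; this convention is an assumption chosen only to show why the pin-hole matrix is a (signed) permutation.

```python
# Minimal sketch of equations (9.1) and (9.2) using plain nested lists.
#   (9.1)  m = R (x - t)      world -> camera
#   (9.2)  x = R^{-1} m + t   camera -> world, with R^{-1} = R^T (R orthonormal)

def transpose(R):
    """Transpose of a 3x3 matrix (equal to its inverse when R is orthonormal)."""
    return [[R[c][r] for c in range(3)] for r in range(3)]

def mat_vec(R, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]

def world_to_camera(R, t, x):
    """Equation (9.1): R maps world axes to camera axes, t is the camera center in world coordinates."""
    return mat_vec(R, [x[k] - t[k] for k in range(3)])

def camera_to_world(R, t, m):
    """Equation (9.2), using R^T in place of R^{-1}."""
    Rm = mat_vec(transpose(R), m)
    return [Rm[k] + t[k] for k in range(3)]

# Hypothetical example: world x forward, y left, z up; camera x right,
# y down, z forward (optical axis). R is then a pure signed permutation.
R = [
    [0, -1, 0],   # camera x = -world y
    [0, 0, -1],   # camera y = -world z
    [1, 0, 0],    # camera z =  world x
]
t = [0.0, 0.0, 1.5]   # hypothetical camera center, 1.5 m above the world origin

x = [10.0, 2.0, 0.0]  # ground point 10 m ahead and 2 m to the left
m = world_to_camera(R, t, x)
print(m)                          # [-2.0, 1.5, 10.0]
print(camera_to_world(R, t, m))   # [10.0, 2.0, 0.0]
```

Mapping the point into the camera frame and back returns the original world coordinates, as (9.2) requires; the point to the left of the camera gets a negative camera x, and the ground below it a positive camera y.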

Therefore, given a point in camera coordinates $\prescript{1}{}{\mathbf{m}}$ observed in the reference frame of the first video sensor, this point is represented in the reference frame of the second camera as

\begin{displaymath}
\begin{array}{rl}
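\prescript{2}{}{\mathbf{m}}&= \prescript{2}{}{\mathbf{R}} \left( \prescript{1}{}{\mathbf{R}}^{-1} \prescript{1}{}{\mathbf{m}} + \mathbf{t}_1 - \mathbf{t}_2 \right) \\
&= \prescript{2}{}{\mathbf{R}}_1 \prescript{1}{}{\mathbf{m}} + \prescript{2}{}{\mathbf{t}}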
\end{array}\end{displaymath} (9.3)

having defined
\begin{displaymath}
\begin{array}{l}
\prescript{2}{}{\mathbf{R}}_1 = \prescript{2}{}{\mathbf{R}} \prescript{1}{}{\mathbf{R}}^{-1} \\
\prescript{2}{}{\mathbf{t}} = -\prescript{2}{}{\mathbf{R}} (\mathbf{t}_2 - \mathbf{t}_1)
\end{array}\end{displaymath} (9.4)

with the matrices $\prescript{1}{}{\mathbf{R}}$ and $\prescript{2}{}{\mathbf{R}}$ still defined as in the pin-hole model. Here $\prescript{2}{}{\mathbf{R}}_1$ converts a point from the camera coordinates of the first sensor to the camera coordinates of the second, while $\prescript{2}{}{\mathbf{t}}$ is the optical center of the first camera expressed in the coordinates of the second.
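Equations (9.3) and (9.4) can be sketched as a function that builds the relative pose once and then applies it to any point in the first camera's frame. This is a minimal Python sketch, not a reference implementation: the helper names and the stereo-rig numbers (two cameras with identical orientation, 0.3 m apart, under an assumed convention of world x forward, y left, z up and camera x right, y down, z forward) are hypothetical.

```python
# Minimal sketch of equations (9.3)-(9.4):
#   2R1 = 2R 1R^{-1},   2t = -2R (t2 - t1),   2m = 2R1 1m + 2t

def transpose(R):
    return [[R[c][r] for c in range(3)] for r in range(3)]

def mat_mul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)] for r in range(3)]

def mat_vec(R, v):
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]

def relative_pose(R1, t1, R2, t2):
    """Relative pose (R21, t21) mapping camera-1 coordinates to camera-2 coordinates."""
    R21 = mat_mul(R2, transpose(R1))                      # 1R^{-1} = 1R^T
    t21 = mat_vec(R2, [t1[k] - t2[k] for k in range(3)])  # = -2R (t2 - t1)
    return R21, t21

def cam1_to_cam2(R21, t21, m1):
    """Equation (9.3): express a camera-1 point in camera-2 coordinates."""
    Rm = mat_vec(R21, m1)
    return [Rm[k] + t21[k] for k in range(3)]

# Hypothetical stereo rig: identical orientation, second camera 0.3 m to the
# left of the first (world y axis points left).
R_cam = [[0, -1, 0], [0, 0, -1], [1, 0, 0]]
t1 = [0.0, 0.0, 1.5]
t2 = [0.0, 0.3, 1.5]

R21, t21 = relative_pose(R_cam, t1, R_cam, t2)
print(R21)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(t21)  # [0.3, 0.0, 0.0]

m2 = cam1_to_cam2(R21, t21, [-2.0, 1.5, 10.0])
```

With identical orientations, $\prescript{2}{}{\mathbf{R}}_1$ reduces to the identity and $\prescript{2}{}{\mathbf{t}}$ to a pure shift along the camera x axis, so a point seen at $x = -2$ m by the first camera is seen at about $x = -1.7$ m by the second.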

The relationship between points observed by two sensors depends only on their relative pose, namely $\prescript{2}{}{\mathbf{R}}_1$ and $\prescript{2}{}{\mathbf{t}}$, and not on the observed point. Consequently, the coordinates $\prescript{1}{}{\mathbf{m}}$ and $\prescript{2}{}{\mathbf{m}}$ of the same world point $\mathbf {x}$, observed by the two video sensors, always satisfy equation (9.3).
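This can be checked numerically. In the following minimal Python sketch the two camera poses, yaw angles, and world points are arbitrary hypothetical values, and the axis permutation `P` (world x forward, y left, z up to camera x right, y down, z forward) is an assumed convention. The relative pose is computed once from (9.4), without reference to any point; for every world point, projecting directly into camera 2 with (9.1) and going through camera 1 with (9.3) should agree up to floating-point error.

```python
import math

def transpose(R):
    return [[R[c][r] for c in range(3)] for r in range(3)]

def mat_mul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)] for r in range(3)]

def mat_vec(R, v):
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]

def world_to_camera(R, t, x):
    """Equation (9.1): m = R (x - t)."""
    return mat_vec(R, [x[k] - t[k] for k in range(3)])

def yaw(theta):
    """Rotation by theta radians about the world z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Assumed axis permutation from world (x fwd, y left, z up) to camera (x right, y down, z fwd).
P = [[0, -1, 0], [0, 0, -1], [1, 0, 0]]

# Two hypothetical camera poses: world->camera matrix = P * yaw(theta)^T.
R1, t1 = mat_mul(P, transpose(yaw(0.05))), [0.0, 0.0, 1.5]
R2, t2 = mat_mul(P, transpose(yaw(-0.02))), [0.1, 0.3, 1.4]

# Relative pose, equation (9.4): computed once, independent of any world point.
R21 = mat_mul(R2, transpose(R1))
t21 = mat_vec(R2, [t1[k] - t2[k] for k in range(3)])

max_err = 0.0
for x in ([10.0, 2.0, 0.0], [5.0, -1.0, 0.5], [25.0, 4.0, 2.0]):
    m1 = world_to_camera(R1, t1, x)                            # (9.1), camera 1
    m2 = world_to_camera(R2, t2, x)                            # (9.1), camera 2
    m2_rel = [a + b for a, b in zip(mat_vec(R21, m1), t21)]    # (9.3)
    max_err = max(max_err, max(abs(a - b) for a, b in zip(m2, m2_rel)))

print(max_err < 1e-9)  # True
```

The residual stays at rounding-error level for every point, consistent with (9.3) depending only on the relative pose of the two sensors.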



Paolo Medici
2025-10-22