Implicit Calibration

The fundamental idea of the Direct Linear Transformation (DLT), proposed by Abdel-Aziz and Karara (AAK71), is to compute the coefficients of the matrices (8.47), (8.50), or of the matrix (8.15) directly, completely disregarding the parameters and the structure of the perspective transformation model. The same paper also presents an approach for solving overdetermined problems using the pseudoinverse technique.

Given the system (8.15), an implicit calibration of the system requires deriving the 12 entries of the projection matrix $\mathbf{P}$, while the internal parameters (from 9 to 11, depending on the model) that generated those entries remain unknown. This representation of the pin-hole camera is, of course, ideal (the model contains no non-linearities).

The perspective function written in implicit form is

\begin{displaymath}
\begin{pmatrix}
u_i \\
v_i \\
1
\end{pmatrix} = \begin{bmatrix}
p_0 & p_1 & p_2 & p_3 \\
p_4 & p_5 & p_6 & p_7 \\
p_8 & p_9 & p_{10} & p_{11}
\end{bmatrix} \begin{pmatrix}
x_i \\
y_i \\
z_i \\
1
\end{pmatrix}\end{displaymath} (8.43)

where the equality holds up to a scale factor and the elements $p_{0} \ldots p_{11}$ of $\mathbf{P}$ are arranged in row-major order. The system (8.43) can be rearranged to obtain a pair of linear constraints for each point whose image and world coordinates are both known:
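\begin{displaymath}
\left[ \begin{array}{cccccccccccc}
x_i & y_i & z_i & 1 & 0 & 0 & 0 & 0 & -u_i x_i & -u_i y_i & -u_i z_i & -u_i \\
0 & 0 & 0 & 0 & x_i & y_i & z_i & 1 & -v_i x_i & -v_i y_i & -v_i z_i & -v_i
\end{array} \right]
\begin{pmatrix}
p_0 \\
\vdots \\
p_{11}
\end{pmatrix} = 0
\end{displaymath} (8.44)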

This technique is called DLT (direct linear transformation). Since each point provides 2 constraints, at least 6 points in general position, meaning that they do not all lie on the same plane or on the same line, are required to obtain these 12 parameters.
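
To make the rearrangement explicit, the third row of the projection can be used to eliminate the scale factor. The following is a sketch of that intermediate step, under the assumption that the denominator $p_8 x_i + p_9 y_i + p_{10} z_i + p_{11}$ is non-zero:

\begin{displaymath}
u_i = \frac{p_0 x_i + p_1 y_i + p_2 z_i + p_3}{p_8 x_i + p_9 y_i + p_{10} z_i + p_{11}}
\qquad
v_i = \frac{p_4 x_i + p_5 y_i + p_6 z_i + p_7}{p_8 x_i + p_9 y_i + p_{10} z_i + p_{11}}
\end{displaymath}

Multiplying each ratio by its denominator gives two equations that are linear in the unknowns $p_0 \ldots p_{11}$:

\begin{displaymath}
\begin{array}{rl}
p_0 x_i + p_1 y_i + p_2 z_i + p_3 - u_i \left( p_8 x_i + p_9 y_i + p_{10} z_i + p_{11} \right) & = 0 \\
p_4 x_i + p_5 y_i + p_6 z_i + p_7 - v_i \left( p_8 x_i + p_9 y_i + p_{10} z_i + p_{11} \right) & = 0 \\
\end{array}\end{displaymath}

With 2 equations per point, 5 points give only 10 equations, which is why 6 points are the minimum.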

Since the system is homogeneous, its solution is a subspace of $\mathbb{R}^{12}$: the null space (kernel) of the coefficient matrix. For this reason, the matrix $\mathbf{P}$ is known only up to a multiplicative factor and has just 11 degrees of freedom (even fewer when considering that a modern camera typically has only 3-4 intrinsic parameters and 6 extrinsic parameters).

Because the system has been rearranged, noise on the points no longer propagates linearly, and the solution does not satisfy the maximum likelihood criterion. Although the matrix $\mathbf{P}$ obtained through this procedure conceals the internal structure of the sensor, it allows a point to be projected from world coordinates to image coordinates and, conversely, allows the line in the world that projects onto a given image point to be derived.
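
As an illustration of the second use, the line associated with a pixel can be recovered from $\mathbf{P}$ alone. The sketch below is written in Python with NumPy (a choice made here, not something the text prescribes) and assumes that the left $3 \times 3$ block of $\mathbf{P}$ is invertible. The function name `backproject_ray` and the sample matrix are hypothetical.

```python
import numpy as np

def backproject_ray(P, u, v):
    """Return (centre, unit direction) of the world line that projects onto pixel (u, v)."""
    M, p4 = P[:, :3], P[:, 3]
    # Camera centre: the world point C with P @ [C, 1] = 0 (assumes M is invertible).
    centre = -np.linalg.solve(M, p4)
    # Every point centre + t * direction, t != 0, projects onto (u, v).
    direction = np.linalg.solve(M, np.array([u, v, 1.0]))
    return centre, direction / np.linalg.norm(direction)

# Hypothetical projection matrix, used only to exercise the function.
P = np.array([[800.0, 0.0, 320.0, 10.0],
              [0.0, 800.0, 240.0, 20.0],
              [0.0, 0.0, 1.0, 5.0]])
centre, direction = backproject_ray(P, 400.0, 300.0)

# Re-project a point taken on the line: it should land on the same pixel.
point_on_ray = centre + 2.0 * direction
uvw = P @ np.append(point_on_ray, 1.0)
pixel = uvw[:2] / uvw[2]
```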

The result is generally unstable when only 6 points are used; the estimation is therefore typically performed with more points than the minimum required, and techniques such as the pseudoinverse are employed to find the solution that minimizes the error in the least-squares sense.
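
A minimal sketch of the overdetermined estimate follows, in Python with NumPy. It stacks two rows per correspondence and takes the kernel of the coefficient matrix from an SVD, which is one common way to solve the homogeneous least-squares problem (used here in place of an explicit pseudoinverse). The function names, the matrix `P_true`, and the synthetic points are assumptions made for the example.

```python
import numpy as np

def dlt_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 matrix P, up to scale, from N >= 6 correspondences."""
    rows = []
    for (x, y, z), (u, v) in zip(world_pts, image_pts):
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z, -u])
        rows.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z, -v])
    A = np.array(rows, dtype=float)
    # The right singular vector of the smallest singular value is the
    # unit-norm vector p that minimises |A p|.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)   # row-major p_0 ... p_11

def project(P, world_pts):
    """Project (N, 3) world points to (N, 2) image points."""
    X = np.hstack([np.asarray(world_pts, dtype=float), np.ones((len(world_pts), 1))])
    uvw = X @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

# Synthetic, noise-free check with a hypothetical camera.
P_true = np.array([[800.0, 0.0, 320.0, 10.0],
                   [0.0, 800.0, 240.0, 20.0],
                   [0.0, 0.0, 1.0, 5.0]])
rng = np.random.default_rng(0)
world = rng.uniform(-1.0, 1.0, size=(20, 3))   # non-coplanar points
image = project(P_true, world)
P_est = dlt_projection_matrix(world, image)
```

Since `P_est` is returned with unit norm, it has to be rescaled (for instance so that one entry matches) before being compared entry by entry with another projection matrix.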

Generalization in the Case of Homogeneous Coordinates

Equation (8.43) can be generalized to the case of an "image" point in homogeneous coordinates $(u_i, v_i, w_i)$:
\begin{displaymath}
\begin{pmatrix}
u_i \\
v_i \\
w_i
\end{pmatrix} \equiv \mathbf{P} \begin{pmatrix}
x_i \\
y_i \\
z_i \\
1
\end{pmatrix}\end{displaymath} (8.45)

The problem is the same as before: a homogeneous solution exists, and the homogeneous system (8.44) generalizes to

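\begin{displaymath}
{\tiny\left[ \begin{array}{cccccccccccc}
w_i x_i & w_i y_i & w_i z_i & w_i & 0 & 0 & 0 & 0 & -u_i x_i & -u_i y_i & -u_i z_i & -u_i \\
0 & 0 & 0 & 0 & w_i x_i & w_i y_i & w_i z_i & w_i & -v_i x_i & -v_i y_i & -v_i z_i & -v_i
\end{array} \right]
\begin{pmatrix}
p_0 \\
\vdots \\
p_{11}
\end{pmatrix} = 0}
\end{displaymath} (8.46)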

for every $i$.

This formulation is useful when the projective model does not adhere to the pinhole model, but it is still possible to derive the "camera" coordinates of the optical rays corresponding to the pixel, which are therefore available in homogeneous format.
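
As a small illustration, the two constraints for one correspondence could be assembled as follows when the image point is available as a homogeneous vector, for example the direction of an optical ray. This is a sketch in Python with NumPy; the function name `dlt_rows_homogeneous` and the numeric values are hypothetical, and the rows are obtained by eliminating the unknown scale between the first (or second) and the third component.

```python
import numpy as np

def dlt_rows_homogeneous(X, m):
    """Two linear constraints on p_0 ... p_11 for world point X and homogeneous image point m."""
    x, y, z = X
    u, v, w = m
    return np.array([
        [w * x, w * y, w * z, w, 0, 0, 0, 0, -u * x, -u * y, -u * z, -u],
        [0, 0, 0, 0, w * x, w * y, w * z, w, -v * x, -v * y, -v * z, -v],
    ], dtype=float)

# Hypothetical data: the image vector is P @ [X, 1] times an arbitrary scale.
P = np.array([[800.0, 0.0, 320.0, 10.0],
              [0.0, 800.0, 240.0, 20.0],
              [0.0, 0.0, 1.0, 5.0]])
X = np.array([0.3, -0.2, 1.5])
m = 3.7 * (P @ np.append(X, 1.0))
residual = dlt_rows_homogeneous(X, m) @ P.ravel()   # vanishes for consistent data
```

Setting `w = 1` gives back the rows used for ordinary pixel coordinates.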

DLT Calculation of the Homography

Typically, to reduce the number of elements in the matrix $\mathbf{P}$, one can impose the constraint that all points involved in the calibration process lie on a specific plane (for example, the ground). This means setting $z_{i}=0$ $\forall i$, which eliminates one column (the one related to the $z$ axis) from the matrix. The matrix thus reduces to size $3 \times 3$, becomes invertible, and is called a homography (see section 1.10).

We therefore define the matrix $\mathbf{H} = \mathbf{P}_Z$ (see (8.27)) as

\begin{displaymath}
\lambda
\begin{pmatrix}
u_{i} \\
v_{i} \\
1
\end{pmatrix} = \mathbf{H} \begin{pmatrix}
x_{i} \\
y_{i} \\
1
\end{pmatrix}\end{displaymath} (8.47)

As discussed in section 8.3, this matrix is very useful because, among other things, it allows the perspective to be removed from the image, synthesizing a fronto-parallel view of the plane through a transformation known as orthogonal rectification, bird's-eye view, or inverse perspective mapping. The transformation is applicable whether one aims to eliminate perspective (perspective mapping or inverse perspective mapping), to reproject a plane between two images (ground plane stereo), or to generate an image with different parameters (rectification, panoramic images) by means of a virtual plane.
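
At the level of single points, these mappings amount to a matrix product followed by a division by the scale factor $\lambda$. The sketch below (Python with NumPy) maps points from the plane to the image and back through the inverse matrix; `apply_homography` and the sample matrix `H` are hypothetical names and values chosen for the example.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of points through the 3x3 homography H."""
    pts = np.asarray(pts, dtype=float)
    q = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return q[:, :2] / q[:, 2:3]   # divide by the scale factor lambda

# Hypothetical homography from plane coordinates to image coordinates.
H = np.array([[1.2, 0.1, 5.0],
              [-0.2, 0.9, 3.0],
              [0.001, 0.002, 1.0]])
plane_pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
image_pts = apply_homography(H, plane_pts)

# The inverse matrix maps image points back onto the plane.
recovered = apply_homography(np.linalg.inv(H), image_pts)
```

A full rectified image would be produced by evaluating such a mapping for every pixel of the synthesized view and sampling the source image at the resulting location.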

As in the previous case, it is possible to transform the nonlinear relationship (8.47) in order to obtain linear constraints:

\begin{displaymath}
\left[
\begin{array}{ccccccccc}
x_i & y_i & 1 & 0 & 0 & 0 & -u_i x_i & -u_i y_i & -u_i \\
0 & 0 & 0 & x_i & y_i & 1 & -v_i x_i & -v_i y_i & -v_i
\end{array}
\right] \begin{pmatrix}
h_0 \\
\vdots \\
h_8
\end{pmatrix} = 0
\end{displaymath} (8.48)

Since this matrix is also defined up to a multiplicative factor, it has only 8 degrees of freedom, and therefore an additional constraint can be imposed.

With a linear solver that computes the kernel of the coefficient matrix through an orthogonal factorization (QR factorization or SVD), the additional constraint $\vert\mathbf{H}\vert=1$ is satisfied automatically, because the returned basis vector has unit norm.
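
The following sketch shows this route in Python with NumPy: the kernel is read from the last right singular vector of the SVD, so the returned matrix already has unit norm. The function name `homography_dlt`, the matrix `H_true`, and the sample points are assumptions made for the example.

```python
import numpy as np

def homography_dlt(plane_pts, image_pts):
    """Estimate H, up to scale, from N >= 4 plane/image correspondences."""
    rows = []
    for (x, y), (u, v) in zip(plane_pts, image_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # Kernel of the coefficient matrix: last row of V^T, which has unit norm.
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 3)   # row-major h_0 ... h_8

# Synthetic, noise-free check with a hypothetical homography.
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.2, 0.9, 3.0],
                   [0.001, 0.002, 1.0]])
plane_pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.2]])
q = np.hstack([plane_pts, np.ones((len(plane_pts), 1))]) @ H_true.T
image_pts = q[:, :2] / q[:, 2:3]

H_est = homography_dlt(plane_pts, image_pts)
H_normalized = H_est / H_est[2, 2]   # rescale only to compare with H_true
```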

Another simpler and more intuitive method consists of imposing an additional constraint $h_{8}=1$: in this way, instead of solving a homogeneous system, one can solve a traditional linear problem. The system (8.47) can also be rearranged to obtain linear constraints in the form:

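\begin{displaymath}
\left[
\begin{array}{cccccccc}
x_{i} & y_{i} & 1 & 0 & 0 & 0 & -u_{i} x_{i} & -u_{i} y_{i} \\
0 & 0 & 0 & x_{i} & y_{i} & 1 & -v_{i} x_{i} & -v_{i} y_{i}
\end{array}
\right] \begin{pmatrix}
h_{0} \\
\vdots \\
h_{7}
\end{pmatrix}=
\begin{pmatrix}
u_{i} \\
v_{i}
\end{pmatrix}\end{displaymath} (8.49)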

This is a (non-homogeneous) system of two equations in the 8 unknowns $h_0 \ldots h_7$; each point for which both the position on the world plane and the position in the image are known provides 2 constraints.

However, imposing $h_{8}=1$ implies that the point $(0,0)$ cannot be a singularity of the image (e.g., the horizon line), and in general, it is not an optimal choice in terms of solution accuracy, as previously discussed.

It is important to note that the solution depends heavily on the chosen normalization. The choice $\vert\mathbf{H}\vert=c$ can be referred to as standard least-squares.

In both cases, at least 4 points are required to obtain a homography $\mathbf{H}$, and each additional point helps reduce the error of the solution. When overdetermined, these systems can be solved using the pseudoinverse method.
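
For the non-homogeneous variant with $h_{8}=1$, the pseudoinverse solution can be sketched as follows in Python with NumPy, using `numpy.linalg.pinv`. The function name `homography_h8_fixed` and the test data are hypothetical, and the example is noise-free, so it only checks that the constraints are assembled consistently.

```python
import numpy as np

def homography_h8_fixed(plane_pts, image_pts):
    """Estimate H with h_8 fixed to 1 from N >= 4 correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(plane_pts, image_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    # Least-squares solution of A h = b through the Moore-Penrose pseudoinverse.
    h = np.linalg.pinv(np.array(A, dtype=float)) @ np.array(b, dtype=float)
    return np.append(h, 1.0).reshape(3, 3)

# Hypothetical homography whose last entry is already 1.
H_true = np.array([[0.9, -0.3, 12.0],
                   [0.4, 1.1, -7.0],
                   [0.002, -0.001, 1.0]])
plane_pts = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 3.0], [0.0, 3.0], [1.0, 1.5], [0.5, 2.5]])
q = np.hstack([plane_pts, np.ones((len(plane_pts), 1))]) @ H_true.T
image_pts = q[:, :2] / q[:, 2:3]

H_est = homography_h8_fixed(plane_pts, image_pts)
```

This route fails by construction when the true $h_{8}$ is zero or close to it, which is the limitation noted above.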

The matrix $\mathbf{H}$ is defined by 4 intrinsic parameters and 6 extrinsic parameters. The separation of intrinsic parameters from extrinsic parameters suggests that these parameters should be extracted independently to strengthen the calibration process. After all, intrinsic parameters can be determined with a certain degree of accuracy offline and are applicable to all possible camera placements (see also 8.5.4).

Let us define the matrix $\mathbf{R}_{Z}$ (see (8.28)) as

\begin{displaymath}
\lambda
\begin{pmatrix}
\tilde{u_{i}} \\
\tilde{v_{i}} \\
1
\end{pmatrix} = \mathbf{R}_{Z}
\begin{pmatrix}
x_{i} \\
y_{i} \\
1
\end{pmatrix}\end{displaymath} (8.50)

where $(\tilde{u_{i}},\tilde{v_{i}})$ are the so-called normalized image coordinates (the homogeneous coordinates of the point $(\tilde{x}_i,\tilde{y}_i,\tilde{z}_i)^{\top}$ expressed in camera coordinates).

The matrix $\mathbf{H}$ is defined up to a scaling factor, while $\mathbf{R}_{Z}$ allows for the definition of the scale since it still has two orthonormal columns. The knowledge of the two columns of the rotation matrix enables the derivation of the third column, and therefore this calibration becomes valid for points even outside the plane $z=0$.

As before, the non-linear system of 3 homogeneous equations, appropriately rearranged, yields two linear constraints per point (Abdel-Aziz and Karara (AAK71)). It is therefore possible to build a system of $2 \times N$ equations from all $N$ control points and solve for the 9 unknowns. The matrix is again defined up to a multiplicative factor, but in this case the internal structure of $\mathbf{R}_{Z}$ helps in deriving the extrinsic parameters (see section 8.5.3). In fact, its first two columns must be orthonormal:

\begin{displaymath}
\begin{array}{rl}
r^{2}_{0} + r^{2}_{3} + r^{2}_{6} & = 1 \\
r^{2}_{1} + r^{2}_{4} + r^{2}_{7} & = 1 \\
r_{0}r_{1} + r_{3}r_{4} + r_{6}r_{7} & = 0 \\
\end{array}\end{displaymath} (8.51)

These additional nonlinear constraints arise from the fact that such a matrix is explicitly defined by only 6 parameters (3 for the rotation and 3 for the translation).
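
A possible way to exploit these constraints after the linear solution is sketched below in Python with NumPy. It assumes that $\mathbf{R}_{Z}$ has the form $[\mathbf{r}_1 \; \mathbf{r}_2 \; \mathbf{t}]$ (two rotation columns followed by the translation), that the estimate is known only up to scale and sign, and that the plane origin lies in front of the camera ($t_z > 0$). The function name `extrinsics_from_rz` and the test values are hypothetical.

```python
import numpy as np

def extrinsics_from_rz(G):
    """Recover (R, t) from a 3x3 matrix G proportional to [r1 r2 t]."""
    G = np.asarray(G, dtype=float)
    # Assumption: the plane origin is in front of the camera, so t_z > 0.
    if G[2, 2] < 0:
        G = -G
    # Fix the scale so that the first two columns have, on average, unit norm.
    scale = 0.5 * (np.linalg.norm(G[:, 0]) + np.linalg.norm(G[:, 1]))
    G = G / scale
    r1, r2, t = G[:, 0], G[:, 1], G[:, 2]
    # The third rotation column follows from the first two.
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    # With noisy data R is only approximately orthonormal:
    # replace it with the closest rotation matrix.
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t

# Synthetic check: a known pose, scaled by an arbitrary negative factor.
a, b = 0.3, 0.2
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
R_true = Rz @ Rx
t_true = np.array([0.1, -0.2, 3.0])
G = -2.5 * np.column_stack([R_true[:, 0], R_true[:, 1], t_true])

R_est, t_est = extrinsics_from_rz(G)
```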

Geometric Representation

The equations (8.44) and (8.48) can also be derived from purely geometric considerations, since the image and camera vectors must be parallel (the factor $\lambda_i$ is purely multiplicative: it changes the length of the vector but not its direction):

\begin{displaymath}
\mathbf{p} \times \mathbf{P} \mathbf{x} = \mathbf{0}
\qquad
\mathbf{m}' \times \mathbf{H} \mathbf{m} = \mathbf{0}
\end{displaymath} (8.52)

This compact formulation is what is commonly referred to as DLT (HZ04). It applies to every linear transformation known up to a multiplicative factor, turning the estimation problem into a homogeneous one.
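
The cross-product form can be turned into rows of a linear system mechanically. The sketch below (Python with NumPy) writes $\mathbf{p} \times \mathbf{P}\mathbf{x}$ as a matrix acting on the row-major vector of unknowns, using the skew-symmetric matrix of $\mathbf{p}$ and a Kronecker product. The helper names `skew` and `dlt_block` are hypothetical. The cross product yields 3 equations per point, of which only 2 are linearly independent.

```python
import numpy as np

def skew(p):
    """Skew-symmetric matrix such that skew(p) @ q == np.cross(p, q)."""
    a, b, c = p
    return np.array([[0.0, -c, b], [c, 0.0, -a], [-b, a, 0.0]])

def dlt_block(p, x):
    """Matrix A such that A @ T.ravel() == np.cross(p, T @ x) for any 3 x len(x) matrix T."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    # (I_3 kron x^T) @ T.ravel() equals T @ x when T is stored in row-major order.
    return skew(p) @ np.kron(np.eye(3), x)

# Hypothetical projection matrix and point.
P = np.array([[800.0, 0.0, 320.0, 10.0],
              [0.0, 800.0, 240.0, 20.0],
              [0.0, 0.0, 1.0, 5.0]])
x = np.array([0.3, -0.2, 1.5, 1.0])
p = P @ x                              # consistent image vector
A = dlt_block(p, x)                    # 3 x 12 block of constraints
residual = A @ P.ravel()               # vanishes because p is parallel to P x
```

The same helper applies to a homography by passing a 3-element `x`, in which case the block is $3 \times 9$.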

Paolo Medici
2025-10-22