Calibration using the Sturm-Maybank-Zhang method

Zhang (Zha99) and, independently in the same period, Sturm and Maybank (SM99) propose a method that yields a linear system for the camera parameters. It updates the calibration techniques developed in the 1980s primarily by Tsai (Tsa87) and others (WM94), which remain valid but are now somewhat dated.

This technique uses several homography matrices $\mathbf{H}$, each obtained by observing a plane (for example, a calibration grid with equidistant markers), and seeks to derive the intrinsic parameters of the camera explicitly from them. As previously discussed, the matrix $\mathbf{H}$, the homographic transformation of a plane, has 8 degrees of freedom, so a single homography is not enough to recover the 10 parameters (4 intrinsic, neglecting the skew, and 6 extrinsic) that generated it. Methods for estimating the homography matrix from correspondences between image points and points on the plane are discussed in section 8.5.1.

The matrix $\mathbf{H}$, and in particular equation (8.27), can be written as

\begin{displaymath}
\mathbf{H} = \begin{bmatrix}
\mathbf{h}_1 & \mathbf{h}_2 & \mathbf{h}_3
\end{bmatrix} = \lambda \mathbf{K} \begin{bmatrix}
\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{bmatrix}\end{displaymath} (8.64)

where $\lambda$ makes explicit the unknown multiplicative factor with which the homography matrix is estimated. Consider the first two columns of the rotation matrix, $\mathbf{r}_1$ and $\mathbf{r}_2$, which are orthonormal.

Despite the unknown factor $\lambda$, the orthonormality of $\mathbf{r}_1$ and $\mathbf{r}_2$ can still be exploited. Since $\mathbf{r}_i = \lambda^{-1} \mathbf{K}^{-1} \mathbf{h}_i$, the conditions $\mathbf{r}_1^{\top} \mathbf{r}_2 = 0$ and $\mathbf{r}_1^{\top} \mathbf{r}_1 = \mathbf{r}_2^{\top} \mathbf{r}_2$ become, after cancelling the common factor $\lambda^{-2}$, the following two constraints:

\begin{displaymath}
\begin{array}{l}
\mathbf{h}_1^{\top} \mathbf{W} \mathbf{h}_2 = 0 \\
\mathbf{h}_1^{\top} \mathbf{W} \mathbf{h}_1 = \mathbf{h}_2^{\top} \mathbf{W} \mathbf{h}_2
\end{array}\end{displaymath} (8.65)

having defined $\mathbf{W}$, neglecting the skew for simplicity, as
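\begin{displaymath}
\mathbf{W} = (\mathbf{K}^{-1})^{\top} \mathbf{K}^{-1} = \begin{bmatrix}
\dfrac{1}{k_u^2} & 0 & -\dfrac{u_0}{k_u^2} \\
0 & \dfrac{1}{k_v^2} & -\dfrac{v_0}{k_v^2} \\
-\dfrac{u_0}{k_u^2} & -\dfrac{v_0}{k_v^2} & \dfrac{u_0^2}{k_u^2} + \dfrac{v_0^2}{k_v^2} + 1\\
\end{bmatrix}\end{displaymath} (8.66)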

a symmetric matrix. This matrix defines a conic, and it is in fact the image of the "absolute conic" (LF97).
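The two constraints (8.65) can be checked numerically. The following is a minimal sketch, assuming NumPy; the intrinsic values, the pose, the scale factor and the helper `rotation` are hypothetical and chosen only for illustration. It builds $\mathbf{W}$ from a zero-skew $\mathbf{K}$ as in (8.66), synthesizes a homography as in (8.64), and evaluates both residuals.

```python
import numpy as np

# Hypothetical intrinsic parameters (zero skew), for illustration only.
k_u, k_v, u_0, v_0 = 800.0, 820.0, 320.0, 240.0
K = np.array([[k_u, 0.0, u_0],
              [0.0, k_v, v_0],
              [0.0, 0.0, 1.0]])

K_inv = np.linalg.inv(K)
W = K_inv.T @ K_inv  # equation (8.66)


def rotation(ax, ay, az):
    """Rotation matrix Rz @ Ry @ Rx from three angles in radians."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx


# Hypothetical pose of the plane and arbitrary unknown scale factor.
R = rotation(0.3, -0.2, 0.1)
t = np.array([0.1, -0.05, 2.0])
lam = 3.7

# Equation (8.64): H = lambda * K * [r1 r2 t]
H = lam * (K @ np.column_stack((R[:, 0], R[:, 1], t)))
h1, h2 = H[:, 0], H[:, 1]

residual_orthogonality = h1 @ W @ h2             # first constraint of (8.65)
residual_equal_norm = h1 @ W @ h1 - h2 @ W @ h2  # second constraint of (8.65)

# Both residuals vanish up to floating-point rounding, whatever lam is.
print(residual_orthogonality, residual_equal_norm)
```

Changing `lam` rescales both sides of the second constraint by the same amount, which is why the unknown factor does not prevent the constraints from being used.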

Each homography supplies the 2 constraints (8.65). The 4 unknowns of $\mathbf{W}$ (5 if the skew is not neglected) can therefore be determined from at least 2 (or 3) different planes, that is, from matrices $\mathbf{H}$ whose columns are not linearly dependent on each other.

Once the matrix $\mathbf{W}$ is obtained, the matrix $\mathbf{K}$ can be recovered using the Cholesky decomposition. Alternatively, Zhang provides closed-form equations that give the intrinsic parameters of the camera directly from $\mathbf{W}$. To estimate $\mathbf{W}$, each product can be rewritten as $\mathbf{h}_i^{\top} \mathbf{W} \mathbf{h}_j = \mathbf{v}_{ij}^{\top} \mathbf{w}$, where $\mathbf{w}$ is the unknown vector collecting the non-zero entries of the upper triangular part of $\mathbf{W}$ and $\mathbf{v}_{ij}$ is built from the components of $\mathbf{h}_i$ and $\mathbf{h}_j$. In this way, the constraints (8.65) become a homogeneous linear system in $\mathbf{w}$, which is determined up to a scale factor.
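The linear step can be sketched as follows, assuming NumPy and zero skew, so that $\mathbf{w} = (W_{11}, W_{13}, W_{22}, W_{23}, W_{33})$. The function names `v_ij`, `estimate_K` and `rotation`, the ordering of $\mathbf{w}$, and the synthetic poses are assumptions of this sketch and do not come from the cited papers. The column scaling before the SVD is a numerical convenience added here. $\mathbf{K}$ is then recovered through the Cholesky route rather than Zhang's closed-form expressions.

```python
import numpy as np


def v_ij(H, i, j):
    """Row such that h_i^T W h_j = v_ij . w, with w = (W11, W13, W22, W23, W33)."""
    hi, hj = H[:, i], H[:, j]
    return np.array([
        hi[0] * hj[0],
        hi[0] * hj[2] + hi[2] * hj[0],
        hi[1] * hj[1],
        hi[1] * hj[2] + hi[2] * hj[1],
        hi[2] * hj[2],
    ])


def estimate_K(homographies):
    """Estimate a zero-skew K from at least 2 plane homographies."""
    rows = []
    for H in homographies:
        rows.append(v_ij(H, 0, 1))                  # h1^T W h2 = 0
        rows.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))  # h1^T W h1 = h2^T W h2
    A = np.array(rows)
    d = np.linalg.norm(A, axis=0)    # column scaling to improve conditioning
    _, _, Vt = np.linalg.svd(A / d)
    w = Vt[-1] / d                   # null vector, mapped back to the original scale
    if w[0] < 0:                     # W is positive definite, so W11 > 0
        w = -w
    W = np.array([[w[0], 0.0, w[1]],
                  [0.0, w[2], w[3]],
                  [w[1], w[3], w[4]]])
    L = np.linalg.cholesky(W)        # W = L L^T, with L proportional to K^{-T}
    K = np.linalg.inv(L.T)
    return K / K[2, 2]               # remove the unknown scale


def rotation(ax, ay, az):
    """Rotation matrix Rz @ Ry @ Rx from three angles in radians."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx


# Synthetic, noise-free test data (hypothetical values).
K_true = np.array([[800.0, 0.0, 320.0],
                   [0.0, 820.0, 240.0],
                   [0.0, 0.0, 1.0]])
poses = [((0.3, -0.2, 0.1), (0.1, -0.05, 2.0)),
         ((-0.4, 0.25, -0.3), (-0.2, 0.1, 2.5)),
         ((0.1, 0.5, 0.6), (0.0, 0.2, 3.0))]
homographies = []
for k, (angles, t) in enumerate(poses):
    R = rotation(*angles)
    scale = 1.0 + k  # each homography carries its own unknown scale
    homographies.append(scale * (K_true @ np.column_stack((R[:, 0], R[:, 1], t))))

K_est = estimate_K(homographies)
print(np.round(K_est, 3))
```

With noisy homographies the estimated $\mathbf{W}$ may fail to be positive definite, in which case `np.linalg.cholesky` raises `LinAlgError`; a practical implementation would need to handle that case.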

Once the intrinsic parameters, and hence the matrix $\mathbf{K}$, are determined, the rotation and translation can be estimated for each homography matrix $\mathbf{H}$ used in the previous step:

\begin{displaymath}
\begin{bmatrix}\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{bmatrix} = \dfrac{1}{\lambda} \mathbf{K}^{-1}\mathbf{H}
\end{displaymath} (8.67)

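The scale factor $\lambda$ is fixed by requiring $\Vert \mathbf{r}_1 \Vert = 1$. The columns $\mathbf{r}_1$ and $\mathbf{r}_2$ are sufficient to derive the rotation angles, since the third column is $\mathbf{r}_3 = \mathbf{r}_1 \times \mathbf{r}_2$. In this way, all the extrinsic parameters can be derived for each grid and the reprojection error can be measured.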
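The extrinsic step can be sketched as follows, assuming NumPy. The function names, the synthetic data, and the sign convention (the plane is assumed to lie in front of the camera, $t_z > 0$) are assumptions of this sketch. The final SVD projection onto the nearest rotation matrix is an addition that matters only with noisy data. The reprojection check compares points mapped by $\mathbf{H}$ with points mapped by the recovered parameters; with real data the reference would be the observed image points.

```python
import numpy as np


def extrinsics_from_homography(K, H):
    """Recover (R, t) from H ~ K [r1 r2 t], assuming t_z > 0."""
    M = np.linalg.inv(K) @ H               # proportional to [r1 r2 t]
    scale = 1.0 / np.linalg.norm(M[:, 0])  # enforce ||r1|| = 1
    if M[2, 2] < 0:                        # resolve the sign so that t_z > 0
        scale = -scale
    r1, r2, t = scale * M[:, 0], scale * M[:, 1], scale * M[:, 2]
    R_approx = np.column_stack((r1, r2, np.cross(r1, r2)))
    U, _, Vt = np.linalg.svd(R_approx)     # nearest orthogonal matrix
    return U @ Vt, t


def reprojection_error(K, R, t, H, grid_xy):
    """Largest pixel distance between points mapped by H and by K [r1 r2 t]."""
    pts = np.column_stack((grid_xy, np.ones(len(grid_xy))))  # rows (x, y, 1)
    P = K @ np.column_stack((R[:, 0], R[:, 1], t))
    a = pts @ H.T
    b = pts @ P.T
    a = a[:, :2] / a[:, 2:3]
    b = b[:, :2] / b[:, 2:3]
    return np.max(np.linalg.norm(a - b, axis=1))


def rotation(ax, ay, az):
    """Rotation matrix Rz @ Ry @ Rx from three angles in radians."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx


# Synthetic, noise-free example (hypothetical values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 820.0, 240.0],
              [0.0, 0.0, 1.0]])
R_true = rotation(0.3, -0.2, 0.1)
t_true = np.array([0.1, -0.05, 2.0])
# A negative scale exercises the sign correction.
H = -3.7 * (K @ np.column_stack((R_true[:, 0], R_true[:, 1], t_true)))

R_est, t_est = extrinsics_from_homography(K, H)
grid_xy = 0.05 * np.array([[x, y] for x in range(4) for y in range(3)], dtype=float)
err = reprojection_error(K, R_est, t_est, H, grid_xy)
print(err)
```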

The system as a whole is still ill-conditioned, and repeated trials do not easily yield a stable solution. Nevertheless, the values obtained with this linear technique serve as the starting point for a Maximum Likelihood Estimation stage that minimizes the reprojection error (section 8.5.6).

One note: in his article Zhang assumes that the principal point coincides with the distortion center, which is generally not accurate.

Paolo Medici
2025-10-22