Maximum Likelihood Estimation

When the SVD is used to enforce the internal constraints, the resulting matrix fully satisfies the requirements of a Fundamental (or Essential) matrix. However, it is merely the matrix closest, under a particular norm (in this case the Frobenius norm), to the one obtained from the linear system.

Therefore, this solution is not optimal either, as it does not account for how the error propagates from the input points through the transformation: it is still fundamentally an algebraic solution rather than a geometric one.

A preliminary technique for minimizing a geometric error exploits the distance between the points and the epipolar lines generated through the Fundamental matrix (epipolar distance).

Even intuitively, the distance between a point $\mathbf {p}_2$ and the epipolar line $\mathbf{F}\mathbf{p}_1$ can be used as a metric to estimate the geometric error:

\begin{displaymath}
d \left( \mathbf{p}_{2}, \mathbf{F} \mathbf{p}_{1} \right) = \frac{ \left\vert \mathbf{p}_{2}^{\top} \mathbf{F} \mathbf{p}_{1} \right\vert }{ \sqrt{ \left( \mathbf{F}\mathbf{p}_1 \right)_1^{2} + \left( \mathbf{F}\mathbf{p}_1 \right)_2^{2} } }
\end{displaymath} (9.62)

where $(.)_i$ denotes the $i$-th component of the vector (see section 1.5.3 for the point-line distance equation). The smaller this distance, the better the matrix $\mathbf{F}$ relates the corresponding points.
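As a concrete illustration of equation (9.62), the following minimal sketch (Python with NumPy, not part of the original text; the name epipolar_distance and the homogeneous-coordinate convention are illustrative assumptions) evaluates the epipolar distance for one pair of points:

\begin{verbatim}
import numpy as np

def epipolar_distance(p2, F, p1):
    """Distance (9.62) between the point p2 and the epipolar line F p1.

    p1, p2: homogeneous image points of shape (3,), third component equal to 1.
    F: 3x3 Fundamental (or Essential) matrix.
    """
    line = F @ p1                     # epipolar line in the second image
    num = abs(p2 @ line)              # |p2^T F p1|
    den = np.hypot(line[0], line[1])  # sqrt((F p1)_1^2 + (F p1)_2^2)
    return num / den
\end{verbatim}

The distance in the first image, $d(\mathbf{p}_1, \mathbf{F}^{\top}\mathbf{p}_2)$, is obtained with the same function by swapping the points and transposing $\mathbf{F}$.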

Since this error can be computed in both the first and the second image, it is appropriate to minimize the two contributions together. Through this metric it is possible to define a cost function that minimizes the error symmetrically between the two images (symmetric transfer error):

\begin{displaymath}
\min_\mathbf{F} \sum_i \left( d \left( \mathbf{p}_{1,i} , \mathbf{F}^{\top} \mathbf{p}_{2,i} \right)^{2} + d \left( \mathbf{p}_{2,i} , \mathbf{F} \mathbf{p}_{1,i} \right)^{2} \right)
\end{displaymath} (9.63)

In this case as well, one can seek a solution with 8 unknowns, but to find a robust solution, it is necessary to constrain $\mathbf{F}$ to have rank 2.
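A possible sketch of the cost (9.63), reusing the hypothetical epipolar_distance above, is shown below; enforcing the rank-2 constraint (for example by SVD projection at each iteration, or through a suitable parameterization of $\mathbf{F}$) is left to the chosen optimizer.

\begin{verbatim}
def symmetric_transfer_error(F, pts1, pts2):
    """Cost (9.63): squared epipolar distances summed over both images.

    pts1, pts2: arrays of shape (n, 3) of corresponding homogeneous points.
    """
    return sum(epipolar_distance(p1, F.T, p2) ** 2 +
               epipolar_distance(p2, F, p1) ** 2
               for p1, p2 in zip(pts1, pts2))
\end{verbatim}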

As an alternative to the Symmetric Transfer Error, the literature often uses a first-order approximation of the distance between the points and the variety (Sampson error, section 3.3.7). An approximate distance between the homologous image points $\left( \mathbf{p}_{1}, \mathbf{p}_{2} \right)$ and the variety $\hat{\mathbf{p}}_{2}^{\top}\mathbf{F} \hat{\mathbf{p}}_{1}=0$ can be defined through the metric

\begin{displaymath}
r\left( \mathbf{p}_{1}, \mathbf{p}_{2}, \mathbf{F} \right) = \frac{ \mathbf{p}_{2}^{\top} \mathbf{F} \mathbf{p}_{1} }{ \sqrt{ (\mathbf{F} \mathbf{p}_1)_1^2 + (\mathbf{F} \mathbf{p}_1)_2^2 + (\mathbf{F}^{\top} \mathbf{p}_2)_1^2 + (\mathbf{F}^{\top} \mathbf{p}_2)_2^2 } }
\end{displaymath} (9.64)

where $(.)_i$ again denotes the $i$-th component of the vector. Using this approximate metric, while still enforcing the additional constraint $\det \mathbf{F} = 0$, it is possible to minimize
\begin{displaymath}
\min_{\mathbf{F}} \sum_{i=1}^{n} r\left( \mathbf{p}_{1,i}, \mathbf{p}_{2,i}, \mathbf{F} \right)^{2}
\end{displaymath} (9.65)
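The Sampson metric (9.64) translates directly into code; again this is only a sketch with NumPy and illustrative names, complementing the functions above.

\begin{verbatim}
def sampson_error(F, p1, p2):
    """First-order (Sampson) approximation (9.64) of the geometric error."""
    Fp1 = F @ p1        # epipolar line in the second image
    Ftp2 = F.T @ p2     # epipolar line in the first image
    num = p2 @ Fp1      # algebraic residual p2^T F p1
    den = np.sqrt(Fp1[0]**2 + Fp1[1]**2 + Ftp2[0]**2 + Ftp2[1]**2)
    return num / den
\end{verbatim}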

Both the Symmetric Transfer Error and the Sampson distance, although superior metrics compared to the algebraic estimate, do not yield the optimal estimator. The Maximum Likelihood Estimation (MLE) for the Fundamental matrix would be obtained by using a cost function of the form

\begin{displaymath}
\min_\mathbf{F} \sum_i \left( \Vert \mathbf{p}_{1,i} - \hat{\mathbf{p}}_{1,i} \Vert^2 + \Vert \mathbf{p}_{2,i} - \hat{\mathbf{p}}_{2,i} \Vert^2 \right)
\end{displaymath} (9.66)

denoting by $\hat{\mathbf{p}}_{1,i}$ and $\hat{\mathbf{p}}_{2,i}$ the exact points and by $\mathbf{p}_{1,i}$, $\mathbf{p}_{2,i}$ the corresponding measured points affected by zero-mean white Gaussian noise. The cost function (9.66) needs to be minimized under the constraint
\begin{displaymath}
\hat{\mathbf{p}}^{\top}_{2,i} \mathbf{ F } \hat{\mathbf{p}}_{1,i} = 0
\end{displaymath} (9.67)

and with additional constraints due to the nature of $\mathbf{F}$. In this case, the exact points $\hat{\mathbf{p}}_{1,i}$ and $\hat{\mathbf{p}}_{2,i}$ become part of the problem (auxiliary, or subsidiary, variables). However, introducing the points $\hat{\mathbf{p}}_{1,i}$ and $\hat{\mathbf{p}}_{2,i}$ directly as unknowns makes the problem unsolvable, as there would always be more unknowns than constraints.

To solve this problem, it is necessary to combine the computation of the Essential or Fundamental matrix with that of three-dimensional reconstruction, taking as auxiliary variable the three-dimensional coordinate of the observed point $\hat{\mathbf{x}}_{i}$ rather than its projections.

The Essential matrix can be obtained given the knowledge of the intrinsic parameters of the two sensors. In this case, it is indeed possible to exploit the nonlinear system that projects the auxiliary variable $\hat{\mathbf{x}}_{i}$ onto the respective observations from the two sensors:

\begin{displaymath}
\begin{array}{l}
\hat{\mathbf{p}}_{1,i} \equiv \mathbf{K}_{1} \hat{\mathbf{x}}_{i} \\
\hat{\mathbf{p}}_{2,i} \equiv \mathbf{K}_{2} \left( \mathbf{R} \hat{\mathbf{x}}_{i} + \mathbf{t} \right) \\
\end{array}\end{displaymath} (9.68)

where the matrix $\mathbf{R}$ can be expressed through a 3-parameter representation (see section A), while the vector $\mathbf{t}$ must be represented with only 2 parameters, since the overall scale remains an unknown factor. By inserting the constraints (9.68) into equation (9.66), the objective of deriving the Essential matrix is transformed into that of directly obtaining the parameters relating the two sensors. Finally, if required, once the relative pose between the sensors is obtained, the Essential matrix can be derived by directly applying the definition (9.41).
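As a rough sketch of this formulation (assuming SciPy is available and that the definition (9.41) corresponds to the usual $\mathbf{E} = [\mathbf{t}]_{\times}\mathbf{R}$), the rotation can be parameterized by a 3-vector, the translation by two angles on the unit sphere, and the points $\hat{\mathbf{x}}_{i}$ appended as auxiliary unknowns; all names below are illustrative, not part of the original text.

\begin{verbatim}
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, K1, K2, pts1, pts2):
    """Residuals of the cost (9.66) under the projection model (9.68).

    params = [rotvec (3), theta, phi (2), x_1 ... x_n (3 each)].
    pts1, pts2: (n, 2) measured pixel coordinates in the two images.
    """
    n = pts1.shape[0]
    R = Rotation.from_rotvec(params[0:3]).as_matrix()   # 3-parameter rotation
    theta, phi = params[3], params[4]
    t = np.array([np.sin(theta) * np.cos(phi),          # unit-norm translation:
                  np.sin(theta) * np.sin(phi),          # the scale is unobservable,
                  np.cos(theta)])                       # hence only 2 parameters
    X = params[5:].reshape(n, 3)                        # auxiliary 3D points x_hat

    q1 = (K1 @ X.T).T                                   # p1_hat ~ K1 x_hat
    q2 = (K2 @ (R @ X.T + t[:, None])).T                # p2_hat ~ K2 (R x_hat + t)
    q1 = q1[:, :2] / q1[:, 2:3]
    q2 = q2[:, :2] / q2[:, 2:3]
    return np.concatenate([(q1 - pts1).ravel(), (q2 - pts2).ravel()])

# Starting from an initial guess x0 (e.g. eight-point algorithm followed by
# triangulation), a nonlinear solver refines all the unknowns jointly:
# result = least_squares(reprojection_residuals, x0, args=(K1, K2, pts1, pts2))
# E is then recovered as [t]_x R from the optimized parameters.
\end{verbatim}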

When the intrinsic parameters are not available, as in the estimation of the Fundamental matrix, a true three-dimensional reconstruction of the scene is not possible. However, it is possible to exploit fictitious perspective projections by setting $\mathbf{K}_1=\mathbf{I}$, obtaining constraints of the form:

\begin{displaymath}
\begin{array}{l}
\hat{\mathbf{p}}_{1,i} \equiv \hat{\mathbf{x}}_{i} \\
\hat{\mathbf{p}}_{2,i} \equiv \mathbf{P} \hat{\mathbf{x}}_{i} \\
\end{array}\end{displaymath} (9.69)

using the auxiliary variable $\hat{\mathbf{x}}_{i}$ directly, a coordinate that will therefore be known up to an affine transformation $\mathbf{K}_1^{-1}$, namely the intrinsic parameters of camera 1.

By incorporating the constraints (9.69) into equation (9.66), the objective of deriving the Fundamental matrix is once again transformed into that of extracting the parameters of the projection matrix $\mathbf{P}$. From this fictitious camera matrix $\mathbf{P}=\left[ \mathbf{R}' \vert \mathbf{t}' \right]$, it is finally possible to derive $\mathbf{F}$ by directly applying the definition (9.41), where, however, the matrix $\mathbf{R}'$ is not a rotation matrix.
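Assuming, as above, that the definition (9.41) reduces to $\mathbf{F} = [\mathbf{t}']_{\times}\mathbf{R}'$ when the first camera is the canonical $[\mathbf{I} \vert \mathbf{0}]$, this final step can be sketched as follows (illustrative code):

\begin{verbatim}
def fundamental_from_P(Rp, tp):
    """F from the fictitious camera P = [R'|t'], first camera [I|0].

    Assumes (9.41) reduces to F = [t']_x R' in this normalization;
    R' is in general not a rotation matrix.
    """
    tx = np.array([[0.0, -tp[2], tp[1]],
                   [tp[2], 0.0, -tp[0]],
                   [-tp[1], tp[0], 0.0]])   # skew-symmetric matrix [t']_x
    return tx @ Rp
\end{verbatim}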

The maximum likelihood estimation of the Fundamental matrix, although correct from a probabilistic standpoint, requires a substantial amount of resources: in addition to the 12 global unknowns necessary to estimate $\mathbf{P}$ (compared to the 5 of the Essential matrix), 3 additional unknowns enter the problem for each pair of corresponding points.

Finally, as a word of caution, for the optimal estimation of these matrices in the presence of potential outliers in the scene, techniques such as RANSAC are widely employed (see section 3.12).
