In 1981, Christopher Longuet-Higgins (Lon81) was the first to observe that a generic point in the world, its projections expressed in the camera coordinates of the two sensors, and the two pin-holes must be coplanar. The geometric derivation of this relationship is omitted here; the analytical presentation is given directly.
It has been stated repeatedly that a point in an image subtends a line in the world; projected onto a second image, captured from a different viewpoint, this line becomes the epipolar line on which the point corresponding to the first image must lie. This relationship, which maps points in one image to lines in the other, can be expressed in matrix form.
To follow Longuet-Higgins' reasoning, the intrinsic parameter matrix will be left implicit, and the coordinates used will be those of the normalized camera.
Without loss of generality, consider a system consisting of two cameras, the first positioned and oriented with respect to the second with projection matrix $P_1 = [\,R^\top \mid -R^\top\mathbf{t}\,]$, while the second is placed at the origin of the reference system and aligned with its axes, that is, with projection matrix $P_2 = [\,I \mid \mathbf{0}\,]$: one can arrive at the same result starting from two generic calibrated cameras, arbitrarily oriented and positioned with respect to a third system, through the relations $R = R_2 R_1^\top$ and $\mathbf{t} = \mathbf{t}_2 - R\,\mathbf{t}_1$, which represent the pose $(R, \mathbf{t})$ of camera 1 with respect to system 2, namely $\mathbf{M}_2 = R\,\mathbf{M}_1 + \mathbf{t}$.
A generic point $M$ has coordinates $\mathbf{M}_1$ and $\mathbf{M}_2$ in the two different reference systems and is projected onto sensors 1 and 2 at the points with normalized camera coordinates $\mathbf{m}_1$ and $\mathbf{m}_2$, respectively.
These image points are known to span a subspace of $\mathbb{R}^3$: for instance, $\mathbf{m}_2$ defines the line passing through the pin-hole of the second sensor (here set to be at the origin $\mathbf{0}$), namely

$$\mathbf{M}_2 = \lambda_2\,\mathbf{m}_2, \qquad \lambda_2 \in \mathbb{R}. \tag{9.36}$$
A generic point $\mathbf{M}_1 = \lambda_1\,\mathbf{m}_1$ expressed in coordinates of sensor 1 and observed by that sensor can be projected into coordinates of sensor 2 according to the equation

$$\mathbf{M}_2 = \lambda_1\, R\,\mathbf{m}_1 + \mathbf{t}. \tag{9.37}$$
The equation of the epipolar line, a line in camera coordinates of the second sensor and the locus of points where $\mathbf{m}_2$ must lie, associated with the point $\mathbf{m}_1$ (observed and therefore expressed in camera coordinates of the first sensor), is given by

$$\lambda_2\,\mathbf{m}_2 = \lambda_1\, R\,\mathbf{m}_1 + \mathbf{t}. \tag{9.38}$$
However, there exists a relationship that connects the points of the two cameras by eliminating the unknown parameters $\lambda_1$ and $\lambda_2$; more importantly, it allows for the reverse reasoning, that is, to derive the relative pose $(R, \mathbf{t})$ between the two cameras given a list of corresponding points.
If both sides of equation (9.38) are first multiplied vectorially by $\mathbf{t}$, and then scalarly by $\mathbf{m}_2$, one obtains

$$\mathbf{m}_2 \cdot \left(\mathbf{t} \times R\,\mathbf{m}_1\right) = 0, \tag{9.39}$$

since $\mathbf{t}\times\mathbf{t} = \mathbf{0}$ and $\mathbf{m}_2 \cdot (\mathbf{t}\times\mathbf{m}_2) = 0$.
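The elimination of the two scale factors can be verified numerically. The following is a minimal sketch assuming the pose convention $\mathbf{M}_2 = R\,\mathbf{M}_1 + \mathbf{t}$; the rotation, baseline, and world point are arbitrary illustrative values:

```python
import numpy as np

# Assumed relative pose of camera 1 w.r.t. system 2:
# a small rotation about the z axis and a mostly horizontal baseline.
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
t = np.array([0.5, 0.0, 0.0])

# A world point expressed in system 2, in front of both cameras.
M2 = np.array([0.3, -0.2, 4.0])
M1 = R.T @ (M2 - t)          # the same point in system 1

# Normalized camera coordinates: divide by the depth (third component).
m1 = M1 / M1[2]
m2 = M2 / M2[2]

# The projection relation: lambda2*m2 = lambda1*R@m1 + t, lambda = depth.
lhs = M2[2] * m2
rhs = M1[2] * (R @ m1) + t
print(np.allclose(lhs, rhs))          # True: the relation holds

# Cross product by t, then dot product by m2, eliminates both lambdas.
residual = m2 @ np.cross(t, R @ m1)
print(abs(residual) < 1e-12)          # True: the coplanarity constraint
```

Scaling the depths $\lambda_1, \lambda_2$ drops out entirely, which is exactly why the constraint involves only image measurements.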
This passage has a physical significance: it introduces the coplanarity constraints (all expressed, for example, in reference system 2) among the points $\mathbf{0}$ (the pinhole of camera 2), $\mathbf{m}_2$, $\mathbf{M}_2$, $R\,\mathbf{m}_1 + \mathbf{t}$ (the point $\mathbf{m}_1$ brought into system 2), and $\mathbf{t}$ (the pinhole of camera 1 in system 2), along with the fact that the body is rigid.
Through this formula, it is possible to express the relationships between the corresponding points $\mathbf{m}_1$ and $\mathbf{m}_2$, represented in the form of homogeneous camera coordinates, in a very compact form

$$\mathbf{m}_2^\top \left(\mathbf{t} \times R\,\mathbf{m}_1\right) = 0. \tag{9.40}$$
Finally, denoting with $[\mathbf{t}]_\times$ the antisymmetric matrix that expresses the vector product in matrix form (see section 1.7), it is possible to collect the various contributions in the form of a matrix

$$E = [\mathbf{t}]_\times R, \tag{9.41}$$

called the Essential matrix, so that the coplanarity constraint becomes

$$\mathbf{m}_2^\top E\,\mathbf{m}_1 = 0. \tag{9.42}$$
One must pay close attention to the indices because there is no unique convention for indicating points 1 and 2: assuming the convention in (9.41) is satisfied, what needs to be remembered is that the matrix $E$ encodes the relative pose of the camera of the points on the right (in our case $\mathbf{m}_1$) with respect to the camera of the points on the left (in our case $\mathbf{m}_2$) of the matrix.
The matrix $E$, relating homogeneous points, is itself homogeneous and therefore defined up to a multiplicative factor.
The Essential matrix has the following properties:
The Essential matrix establishes relationships in camera coordinates and, therefore, to utilize it from a practical standpoint, it is necessary to have points expressed in this particular reference system. In other words, it is essential to know the intrinsic parameters of the cameras involved.
The equation

$$\det E = 0 \tag{9.43}$$

follows from the fact that $[\mathbf{t}]_\times$ is singular: $E$ has rank 2.
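The algebraic structure of the Essential matrix can be checked numerically. A small sketch, where the pose values are arbitrary assumptions and `skew` is a helper defined here:

```python
import numpy as np

def skew(t):
    """Antisymmetric matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0,    -t[2],  t[1]],
                     [t[2],  0,    -t[0]],
                     [-t[1], t[0],  0]])

# Assumed pose of camera 1 w.r.t. camera 2 (illustrative values).
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
t = np.array([0.5, 0.0, 0.1])

E = skew(t) @ R   # the Essential matrix

# Rank 2: the determinant vanishes because [t]_x is singular.
print(abs(np.linalg.det(E)) < 1e-12)        # True

# Singular values: two equal (both ||t||) and one null.
s = np.linalg.svd(E, compute_uv=False)
print(np.isclose(s[0], s[1]), np.isclose(s[2], 0.0))   # True True
```

The two equal singular values (here both $\|\mathbf{t}\|$) and the null one are the classical characterization of Essential matrices, often exploited when enforcing the constraint on a noisy estimate.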
It is indeed possible to introduce an additional relationship between the points of the images, completely disregarding the intrinsic parameters of the cameras themselves.
If we apply the definition of homogeneous camera coordinates $\mathbf{m}_i = K_i^{-1}\,\mathbf{p}_i$ in relation (9.42), we obtain

$$\mathbf{p}_2^\top K_2^{-\top} E\, K_1^{-1}\,\mathbf{p}_1 = 0. \tag{9.44}$$
The Fundamental matrix is defined (Faugeras and Hartley, 1992) as $F = K_2^{-\top} E\, K_1^{-1}$, leading to the relation

$$\mathbf{p}_2^\top F\,\mathbf{p}_1 = 0. \tag{9.45}$$
If two points on the two images of the stereoscopic pair represent the same point in the world, equation (9.45) must be satisfied.
The Fundamental matrix allows us to narrow down the search range for correspondences between the two images: thanks to the point-line duality, relation (9.45) specifies the line in the second image along which the correspondent of a point in the first image must be sought.
Indeed, the equation of the line on which the point $\mathbf{p}_2$, corresponding to $\mathbf{p}_1$, must lie is described by

$$\mathbf{l}_2 = F\,\mathbf{p}_1. \tag{9.46}$$
The relationship between the Fundamental matrix and the Essential matrix follows directly from equation (9.44):

$$E = K_2^\top\, F\, K_1. \tag{9.47}$$
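The chain Essential matrix → Fundamental matrix → epipolar line can be exercised end to end. In this sketch the intrinsic matrices and the pose are invented illustrative values, not taken from the text:

```python
import numpy as np

# Assumed intrinsic matrices for the two cameras (illustrative values).
K1 = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
K2 = np.array([[750., 0., 310.], [0., 750., 250.], [0., 0., 1.]])

# Essential matrix from an assumed pose, E = [t]_x R.
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
t = np.array([0.5, 0.0, 0.1])
tx = np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])
E = tx @ R

# Fundamental matrix relating pixel coordinates.
F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

# Project a world point into both images (homogeneous pixel coordinates).
M2 = np.array([0.3, -0.2, 4.0])
M1 = R.T @ (M2 - t)
p1 = K1 @ (M1 / M1[2])
p2 = K2 @ (M2 / M2[2])

# Epipolar constraint in pixels, and the epipolar line l2 = F @ p1.
print(abs(p2 @ F @ p1) < 1e-9)          # True
l2 = F @ p1
print(abs(l2 @ p2) < 1e-9)              # True: p2 lies on the line

# The Essential matrix is recovered from F and the intrinsics.
print(np.allclose(K2.T @ F @ K1, E))    # True
```

Note that the constraint is evaluated directly on pixel coordinates: no knowledge of the intrinsics is needed to *use* $F$, only to convert it back to $E$.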
The Essential matrix encodes only the relative pose between the cameras, while the Fundamental matrix conceals both the intrinsic parameters and the relative pose.
The Essential matrix introduces constraints that are equivalent to those of the Fundamental matrix; although it was historically introduced before the Fundamental matrix, it can be regarded as a special case of the latter, since it expresses the relationships in camera coordinates rather than pixel coordinates.
The Fundamental matrix has the following properties: it is a $3\times 3$ matrix of rank 2, and it can be determined with just 7 points, as its degrees of freedom amount to exactly 7 (the multiplicative factor and the zero determinant reduce the dimensionality of the problem). The relationship that connects the Fundamental matrix to its 7 degrees of freedom is nonlinear and not easily expressible through an algebraic representation. With (at least) 8 points, however, it is possible to obtain a linear estimate of the matrix, as described in the following section.
The Fundamental and Essential matrices can be used to narrow down the search space for corresponding points between two images and/or filter out potential outliers (for example, in RANSAC). The Essential matrix, when decomposed, allows for the extraction of the relative pose between the two cameras and, as such, provides an approximate idea of the motion experienced by a camera moving through the world (motion stereo) or the relative pose of two cameras in a stereoscopic pair (Auto-Calibration).
The use of the Essential matrix allows for the derivation of the relative pose between two views. However, it is not possible to determine the length of the baseline connecting the two pin-holes, but only its direction. Nevertheless, with the Essential matrix at hand, it is always possible to perform a three-dimensional reconstruction of the observed scene up to a multiplicative factor: the ratios between distances are known, but not their absolute values.
This, however, enables a coherent three-dimensional reconstruction when observing the same scene from more than two different views, where the unknown multiplicative factor remains consistent across all views, thus allowing the merging of all individual reconstructions into a single reconstruction known up to the same scale factor.
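The scale ambiguity can be made concrete: scaling the whole scene and the baseline by the same factor produces exactly the same image measurements and an Essential matrix that differs only by that factor, so the epipolar constraint cannot distinguish the two reconstructions. A sketch with arbitrary assumed values:

```python
import numpy as np

# Assumed pose of camera 1 w.r.t. system 2 (illustrative values).
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
t = np.array([0.5, 0.0, 0.1])
tx = lambda v: np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

# Two scenes differing only by a global scale factor s.
s = 3.0
M2 = np.array([0.3, -0.2, 4.0])           # a point in system 2
M2_scaled = s * M2                         # the scaled scene ...
t_scaled = s * t                           # ... with a scaled baseline

# Both scenes produce exactly the same normalized image coordinates.
M1 = R.T @ (M2 - t)
m1 = M1 / M1[2]
m2 = M2 / M2[2]
M1s = R.T @ (M2_scaled - t_scaled)
print(np.allclose(M1s / M1s[2], m1))       # True: image 1 is identical
print(np.allclose(M2_scaled / M2_scaled[2], m2))   # True: image 2 too

# The two Essential matrices differ only by the factor s.
E, Es = tx(t) @ R, tx(t_scaled) @ R
print(np.allclose(Es, s * E))              # True
print(abs(m2 @ E @ m1) < 1e-12, abs(m2 @ Es @ m1) < 1e-12)
```

Since $E$ is homogeneous, $E$ and $sE$ are the same matrix for the purposes of the constraint: only the direction of $\mathbf{t}$, never its length, is observable from image correspondences alone.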