In 1981, Christopher Longuet-Higgins (Lon81) was the first to observe that a generic point in the world, its projections expressed in the camera coordinates of the two sensors, and the two pin-holes must be coplanar. The geometric derivation of this relationship is omitted here; the analytical presentation is given directly.
It has been stated repeatedly that a point in an image subtends a line in the world; projected onto a second image, captured from a different viewpoint, this line becomes the epipolar line on which the point corresponding to the first image must lie. This relationship, which maps points in one image to lines in the other, can be expressed in matrix form.
To follow Longuet-Higgins' reasoning, the intrinsic parameter matrix will be left implicit, and the coordinates used will be those of the normalized camera.
Without loss of generality, consider a system consisting of two cameras, the first positioned and oriented with respect to the second with projection matrix $P_1 = [\,R^\top \mid -R^\top\mathbf{t}\,]$, while the second is placed at the origin of the reference system and aligned with its axes, that is, with projection matrix $P_2 = [\,I \mid \mathbf{0}\,]$: one can arrive at the same result starting from two generic calibrated cameras, arbitrarily oriented and positioned with respect to a third system, through the relations $R = R_2 R_1^\top$ and $\mathbf{t} = \mathbf{t}_2 - R\,\mathbf{t}_1$, which represent the pose $(R, \mathbf{t})$ of camera 1 with respect to system 2, namely $\mathbf{M}_2 = R\,\mathbf{M}_1 + \mathbf{t}$.
A generic point $M$ has coordinates $\mathbf{M}_1$ and $\mathbf{M}_2$ in the two different reference systems and is projected onto sensors 1 and 2 at the points with normalized camera coordinates $\mathbf{m}_1$ and $\mathbf{m}_2$, respectively.
These image points are known to span a subspace of $\mathbb{R}^3$: for instance, $\mathbf{m}_2$ defines the line passing through the pin-hole of the second sensor (here set to be at the origin $\mathbf{0}$), namely

$$\mathbf{M}_2 = \lambda_2\,\mathbf{m}_2, \qquad \lambda_2 \in \mathbb{R}. \tag{9.36}$$
A generic point $\mathbf{M}_1 = \lambda_1\,\mathbf{m}_1$ expressed in coordinates of sensor 1 and observed by that sensor can be projected into coordinates of sensor 2 according to the equation

$$\mathbf{M}_2 = \lambda_1\, R\,\mathbf{m}_1 + \mathbf{t}. \tag{9.37}$$
The equation of the epipolar line, a line in camera coordinates of the second sensor and the locus of points where $\mathbf{m}_2$ must lie, associated with the point $\mathbf{m}_1$ (observed and therefore expressed in camera coordinates of the first sensor), is given by

$$\lambda_2\,\mathbf{m}_2 = \lambda_1\, R\,\mathbf{m}_1 + \mathbf{t}. \tag{9.38}$$
However, there exists a relationship that connects the points of the two cameras by eliminating the unknown parameters $\lambda_1$ and $\lambda_2$; more importantly, it allows for the reverse reasoning, that is, to derive the relative pose $(R, \mathbf{t})$ between the two cameras given a list of corresponding points.
If both sides of equation (9.38) are first multiplied vectorially by $\mathbf{t}$, and then scalarly by $\mathbf{m}_2$, one obtains

$$\mathbf{m}_2 \cdot \left(\mathbf{t} \times R\,\mathbf{m}_1\right) = 0, \tag{9.39}$$

since $\mathbf{t}\times\mathbf{t} = \mathbf{0}$ and $\mathbf{m}_2 \cdot (\mathbf{t}\times\mathbf{m}_2) = 0$.
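The elimination of the two scale factors can be verified numerically. The following is a minimal sketch assuming the pose convention $\mathbf{M}_2 = R\,\mathbf{M}_1 + \mathbf{t}$; the rotation, baseline, and world point are arbitrary illustrative values:

```python
import numpy as np

# Assumed relative pose of camera 1 w.r.t. system 2:
# a small rotation about the z axis and a mostly horizontal baseline.
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
t = np.array([0.5, 0.0, 0.0])

# A world point expressed in system 2, in front of both cameras.
M2 = np.array([0.3, -0.2, 4.0])
M1 = R.T @ (M2 - t)          # the same point in system 1

# Normalized camera coordinates: divide by the depth (third component).
m1 = M1 / M1[2]
m2 = M2 / M2[2]

# The projection relation: lambda2*m2 = lambda1*R@m1 + t, lambda = depth.
lhs = M2[2] * m2
rhs = M1[2] * (R @ m1) + t
print(np.allclose(lhs, rhs))          # True: the relation holds

# Cross product by t, then dot product by m2, eliminates both lambdas.
residual = m2 @ np.cross(t, R @ m1)
print(abs(residual) < 1e-12)          # True: the coplanarity constraint
```

Scaling the depths $\lambda_1, \lambda_2$ drops out entirely, which is exactly why the constraint involves only image measurements.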
This passage has a physical significance: it introduces the coplanarity constraints (all expressed, for example, in reference system 2) among the points $\mathbf{0}$ (the pinhole of camera 2), $\mathbf{m}_2$, $\mathbf{M}_2$, $R\,\mathbf{m}_1 + \mathbf{t}$ (the point $\mathbf{m}_1$ brought into system 2), and $\mathbf{t}$ (the pinhole of camera 1 in system 2), along with the fact that the body is rigid.
Through this formula, it is possible to express the relationships between the corresponding points $\mathbf{m}_1$ and $\mathbf{m}_2$, represented in the form of homogeneous camera coordinates, in a very compact form

$$\mathbf{m}_2^\top \left(\mathbf{t} \times R\,\mathbf{m}_1\right) = 0. \tag{9.40}$$
Finally, denoting with $[\mathbf{t}]_\times$ the antisymmetric matrix that expresses the vector product in matrix form (see section 1.7), it is possible to collect the various contributions in the form of a matrix

$$E = [\mathbf{t}]_\times R, \tag{9.41}$$

called the Essential matrix, so that the coplanarity constraint becomes

$$\mathbf{m}_2^\top E\,\mathbf{m}_1 = 0. \tag{9.42}$$
One must pay close attention to the indices because there is no unique convention for indicating points 1 and 2: assuming the convention in (9.41) is satisfied, what needs to be remembered is that the matrix $E$ encodes the relative pose of the camera of the points on the right (in our case $\mathbf{m}_1$) with respect to the camera of the points on the left (in our case $\mathbf{m}_2$) of the matrix.
The matrix $E$, relating homogeneous points, is itself homogeneous and therefore defined up to a multiplicative factor.
The Essential matrix has the following properties:
The Essential matrix establishes relationships in camera coordinates and, therefore, to utilize it from a practical standpoint, it is necessary to have points expressed in this particular reference system. In other words, it is essential to know the intrinsic parameters of the cameras involved.
The equation

$$\det E = 0 \tag{9.43}$$

follows from the fact that $[\mathbf{t}]_\times$ is singular: $E$ has rank 2.
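The algebraic structure of the Essential matrix can be checked numerically. A small sketch, where the pose values are arbitrary assumptions and `skew` is a helper defined here:

```python
import numpy as np

def skew(t):
    """Antisymmetric matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0,    -t[2],  t[1]],
                     [t[2],  0,    -t[0]],
                     [-t[1], t[0],  0]])

# Assumed pose of camera 1 w.r.t. camera 2 (illustrative values).
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
t = np.array([0.5, 0.0, 0.1])

E = skew(t) @ R   # the Essential matrix

# Rank 2: the determinant vanishes because [t]_x is singular.
print(abs(np.linalg.det(E)) < 1e-12)        # True

# Singular values: two equal (both ||t||) and one null.
s = np.linalg.svd(E, compute_uv=False)
print(np.isclose(s[0], s[1]), np.isclose(s[2], 0.0))   # True True
```

The two equal singular values (here both $\|\mathbf{t}\|$) and the null one are the classical characterization of Essential matrices, often exploited when enforcing the constraint on a noisy estimate.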
It is indeed possible to introduce an additional relationship between the points of the images, completely disregarding the intrinsic parameters of the cameras themselves.
If we apply the definition of homogeneous camera coordinates $\mathbf{m}_i = K_i^{-1}\,\mathbf{p}_i$ in relation (9.42), we obtain

$$\mathbf{p}_2^\top K_2^{-\top} E\, K_1^{-1}\,\mathbf{p}_1 = 0. \tag{9.44}$$
The Fundamental matrix is defined (Faugeras and Hartley, 1992) as $F = K_2^{-\top} E\, K_1^{-1}$, leading to the relation

$$\mathbf{p}_2^\top F\,\mathbf{p}_1 = 0. \tag{9.45}$$
If two points on the two images of the stereoscopic pair represent the same point in the world, equation (9.45) must be satisfied.
The Fundamental matrix allows us to narrow down the search range for correspondences between the two images: thanks to the point-line duality, relation (9.45) specifies the line in the second image along which the correspondent of a point in the first image must be sought.
Indeed, the equation of the line on which the point $\mathbf{p}_2$, corresponding to $\mathbf{p}_1$, must lie is described by

$$\mathbf{l}_2 = F\,\mathbf{p}_1. \tag{9.46}$$
The relationship between the Fundamental matrix and the Essential matrix follows directly from equation (9.44):

$$E = K_2^\top\, F\, K_1. \tag{9.47}$$
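The chain Essential matrix → Fundamental matrix → epipolar line can be exercised end to end. In this sketch the intrinsic matrices and the pose are invented illustrative values, not taken from the text:

```python
import numpy as np

# Assumed intrinsic matrices for the two cameras (illustrative values).
K1 = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
K2 = np.array([[750., 0., 310.], [0., 750., 250.], [0., 0., 1.]])

# Essential matrix from an assumed pose, E = [t]_x R.
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
t = np.array([0.5, 0.0, 0.1])
tx = np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])
E = tx @ R

# Fundamental matrix relating pixel coordinates.
F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

# Project a world point into both images (homogeneous pixel coordinates).
M2 = np.array([0.3, -0.2, 4.0])
M1 = R.T @ (M2 - t)
p1 = K1 @ (M1 / M1[2])
p2 = K2 @ (M2 / M2[2])

# Epipolar constraint in pixels, and the epipolar line l2 = F @ p1.
print(abs(p2 @ F @ p1) < 1e-9)          # True
l2 = F @ p1
print(abs(l2 @ p2) < 1e-9)              # True: p2 lies on the line

# The Essential matrix is recovered from F and the intrinsics.
print(np.allclose(K2.T @ F @ K1, E))    # True
```

Note that the constraint is evaluated directly on pixel coordinates: no knowledge of the intrinsics is needed to *use* $F$, only to convert it back to $E$.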
The Essential matrix encodes only the relative pose between the cameras, while the Fundamental matrix conceals both the intrinsic parameters and the relative pose.
The Essential matrix introduces constraints that are equivalent to those of the Fundamental matrix; although it was historically introduced before the Fundamental matrix, it can be regarded as a special case of the latter, since it expresses the relationships in camera coordinates rather than pixel coordinates.
The Fundamental matrix has the following properties: it is a $3\times 3$ matrix of rank 2, and it can be determined with just 7 points, as its degrees of freedom amount to exactly 7 (the multiplicative factor and the zero determinant reduce the dimensionality of the problem). The relationship that connects the Fundamental matrix to its 7 degrees of freedom is nonlinear and not easily expressible through an algebraic representation. With (at least) 8 points, however, it is possible to obtain a linear estimate of the matrix, as described in the following section.
The Fundamental and Essential matrices can be used to narrow down the search space for corresponding points between two images and/or filter out potential outliers (for example, in RANSAC). The Essential matrix, when decomposed, allows for the extraction of the relative pose between the two cameras and, as such, provides an approximate idea of the motion experienced by a camera moving through the world (motion stereo) or the relative pose of two cameras in a stereoscopic pair (Auto-Calibration).
The use of the Essential matrix allows for the derivation of the relative pose between two views. However, it is not possible to determine the length of the baseline connecting the two pin-holes, but only its direction. Nevertheless, with the Essential matrix at hand, it is always possible to perform a three-dimensional reconstruction of the observed scene up to a multiplicative factor: the ratios between distances are known, but not their absolute values.
This, however, enables a coherent three-dimensional reconstruction when observing the same scene from more than two different views, where the unknown multiplicative factor remains consistent across all views, thus allowing the merging of all individual reconstructions into a single reconstruction known up to the same scale factor.
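The scale ambiguity can be made concrete: scaling the whole scene and the baseline by the same factor produces exactly the same image measurements and an Essential matrix that differs only by that factor, so the epipolar constraint cannot distinguish the two reconstructions. A sketch with arbitrary assumed values:

```python
import numpy as np

# Assumed pose of camera 1 w.r.t. system 2 (illustrative values).
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
t = np.array([0.5, 0.0, 0.1])
tx = lambda v: np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

# Two scenes differing only by a global scale factor s.
s = 3.0
M2 = np.array([0.3, -0.2, 4.0])           # a point in system 2
M2_scaled = s * M2                         # the scaled scene ...
t_scaled = s * t                           # ... with a scaled baseline

# Both scenes produce exactly the same normalized image coordinates.
M1 = R.T @ (M2 - t)
m1 = M1 / M1[2]
m2 = M2 / M2[2]
M1s = R.T @ (M2_scaled - t_scaled)
print(np.allclose(M1s / M1s[2], m1))       # True: image 1 is identical
print(np.allclose(M2_scaled / M2_scaled[2], m2))   # True: image 2 too

# The two Essential matrices differ only by the factor s.
E, Es = tx(t) @ R, tx(t_scaled) @ R
print(np.allclose(Es, s * E))              # True
print(abs(m2 @ E @ m1) < 1e-12, abs(m2 @ Es @ m1) < 1e-12)
```

Since $E$ is homogeneous, $E$ and $sE$ are the same matrix for the purposes of the constraint: only the direction of $\mathbf{t}$, never its length, is observable from image correspondences alone.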