When dealing with practical problems, it becomes necessary to move from a reference system fixed to the camera, where the origin $(0, 0, 0)^\top$ coincides with the focal point (pin-hole), to a more generic reference system that better suits the user's needs. In this system, the camera is positioned at an arbitrary point in the "world" and oriented arbitrarily with respect to it. This discussion applies to any generic sensor, including non-video sensors, by defining relationships that convert points from world coordinates to sensor coordinates and vice versa.
At this point, it is necessary to clarify the terminology used for reference systems in this book: the "world" reference system is the one considered absolute and fixed at any given time, with respect to which the sensor is positioned. In Figure 8.4, for example, the origin of the "world" system is associated with a point on the vehicle (such as its front point). In this case, the "vehicle" (body) and "world" systems coincide.
However, this distinction becomes less clear when the vehicle moves with respect to a "world" that is again defined as the fixed reference system. In this case, there are three sets of coordinates: those of the sensor, the local ones of the vehicle (body), and those of the world. Typically, however, the same axis convention is kept for the sensor, vehicle, and world systems.
In camera coordinates, the special role of the $z$ coordinate is due to purely mathematical reasons, specifically the use of homogeneous coordinates, which during projection requires dividing the first two components by the third. In "sensor" coordinates, this constraint no longer applies.
Although not binding in any way, this book adopts the "sensor," "body," and "world" systems presented in Figure 8.4 (ISO 8855), in which the $z$ axis represents the height of the point above the ground.
Therefore, to arrive at the definitive equation of the pin-hole camera, we start from equation (8.4) and apply the following considerations:
The conversion from "world" coordinates to "camera" coordinates, being a composition of rotations, is itself a rotation, described by a single $3 \times 3$ matrix $\mathbf{R}$.
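Let $\mathbf{x} = (x, y, z)^\top$ be a point in "world" coordinates and $\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y}, \tilde{z})^\top$ the same point in "camera" coordinates. The relationship that connects these two points can be expressed as

$$\tilde{\mathbf{x}} = \mathbf{R}\,(\mathbf{x} - \mathbf{t}_0) = \mathbf{R}\,\mathbf{x} + \tilde{\mathbf{t}}_0$$

where $\mathbf{R}$ is the rotation matrix that converts from world to camera coordinates, $\mathbf{t}_0$ is the position of the pin-hole in world coordinates, and $\tilde{\mathbf{t}}_0 = -\mathbf{R}\,\mathbf{t}_0$ is the corresponding translation expressed in camera coordinates.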
It should be noted that rotation matrices are orthogonal matrices with determinant $+1$: they preserve distances and areas, and the inverse of a rotation matrix is its transpose.
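The change of reference system can be sketched in a few lines of plain Python. The numbers below are illustrative assumptions, not values from the book: a world frame with $x$ forward, $y$ left, and $z$ up, a camera frame with $z$ along the optical axis, $x$ to the right, and $y$ downward, and a pin-hole placed 1.5 m above the world origin. The helper names (`mat_vec`, `world_to_camera`, `camera_to_world`) are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


# Assumed rotation from world axes (x forward, y left, z up)
# to camera axes (x right, y down, z forward).
R = [[0.0, -1.0, 0.0],
     [0.0, 0.0, -1.0],
     [1.0, 0.0, 0.0]]

# Assumed pin-hole position in world coordinates (metres).
t0 = [0.0, 0.0, 1.5]


def world_to_camera(p_world):
    """Apply x_cam = R (x_world - t0)."""
    shifted = [p - t for p, t in zip(p_world, t0)]
    return mat_vec(R, shifted)


def camera_to_world(p_cam):
    """Invert the transform using R^-1 = R^T: x_world = R^T x_cam + t0."""
    rotated = mat_vec(transpose(R), p_cam)
    return [r + t for r, t in zip(rotated, t0)]


p_cam = world_to_camera([10.0, 2.0, 0.0])
p_back = camera_to_world(p_cam)
print(p_cam)   # ground point 10 m ahead and 2 m to the left, seen from the camera
print(p_back)  # the original world point is recovered
```

Because the inverse of a rotation is its transpose, the round trip needs no general matrix inversion.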
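The matrix $\mathbf{R}$ and the vector $\tilde{\mathbf{t}}_0$ can be combined into a single $3 \times 4$ matrix $[\mathbf{R} \mid \tilde{\mathbf{t}}_0]$ by using homogeneous coordinates. With this representation, the projection of a point expressed in world coordinates, homogeneous to $(x, y, z)^\top$, into a point in image coordinates, homogeneous to $(u, v)^\top$, can be written compactly as

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{K}\,[\mathbf{R} \mid \tilde{\mathbf{t}}_0] \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$

where $\mathbf{K}$ is the matrix of intrinsic parameters and $\lambda$ is a scale factor.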
From this equation, it is evident that each image point $(u, v)$ corresponds to infinitely many world points $(x, y, z)^\top$, which lie on a line as the parameter $\lambda$ varies.
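By leaving $\lambda$ implicit and collecting the matrices, we obtain the final equation of the pin-hole camera (which does not account for, nor should it account for, distortion):

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{P} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$

where the equality holds up to the implicit scale factor and $\mathbf{P} = \mathbf{K}\,[\mathbf{R} \mid \tilde{\mathbf{t}}_0]$ is the $3 \times 4$ camera matrix.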
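As a minimal sketch of the pin-hole equation, the following plain-Python code builds a $3 \times 4$ camera matrix and projects two world points. All numeric values are assumptions chosen for illustration (focal length of 800 pixels, principal point at (640, 360), camera 1.5 m above the world origin and looking along the world $x$ axis), and the helper names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Assumed intrinsic parameters (pixels).
K = [[800.0, 0.0, 640.0],
     [0.0, 800.0, 360.0],
     [0.0, 0.0, 1.0]]

# Assumed world-to-camera rotation and pin-hole position (metres).
R = [[0.0, -1.0, 0.0],
     [0.0, 0.0, -1.0],
     [1.0, 0.0, 0.0]]
t0 = [0.0, 0.0, 1.5]

# Translation in camera coordinates: t0_cam = -R t0.
t0_cam = [-c for c in mat_vec(R, t0)]

# Camera matrix P = K [R | t0_cam], a 3x4 matrix.
Rt = [row + [t] for row, t in zip(R, t0_cam)]
P = mat_mul(K, Rt)


def project(p_world):
    """Project a world point; return (u, v, lam), with lam the scale factor."""
    x, y, w = mat_vec(P, list(p_world) + [1.0])
    if w == 0.0:
        raise ValueError("point lies on the plane through the pin-hole")
    return x / w, y / w, w


u1, v1, lam1 = project([10.0, 2.0, 0.0])
# A second point on the same line through the pin-hole: t0 + 2 * (p - t0).
u2, v2, lam2 = project([20.0, 4.0, -1.5])
print((u1, v1, lam1), (u2, v2, lam2))
```

The two world points fall on the same pixel with different values of the scale factor, which is the ambiguity along the viewing line described above.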
It is important to note that by imposing an additional constraint on the points, for example $z = 0$, the matrix $\mathbf{P}$ is reduced to a $3 \times 3$ matrix $\mathbf{H}_{z=0}$, obtained by dropping the column that multiplies $z$. This matrix is invertible and is exactly the homographic matrix (see section 8.3.1) of the perspective transformation of the ground points. Its inverse $\mathbf{H}_{z=0}^{-1}$, which maps image points back onto the ground plane, is an example of an IPM (Inverse Perspective Mapping) transformation used to obtain a bird's eye view of the captured scene (MBLB91).
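The ground-plane reduction can be sketched as follows, again in plain Python. The camera matrix `P` is an assumed example (800 pixel focal length, principal point at (640, 360), camera 1.5 m above the origin and looking along the world $x$ axis), and `inv3`, `H`, and `image_to_ground` are hypothetical names. The sketch assumes the constraint $z = 0$, so the third column of `P` is dropped.

```python
def inv3(m):
    """Invert a 3x3 matrix through its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Assumed 3x4 camera matrix (illustrative values only).
P = [[640.0, -800.0, 0.0, 0.0],
     [360.0, 0.0, -800.0, 1200.0],
     [1.0, 0.0, 0.0, 0.0]]

# With z = 0 the third column never contributes: keep columns 1, 2 and 4.
H = [[row[0], row[1], row[3]] for row in P]
H_inv = inv3(H)


def ground_to_image(x, y):
    """Map a ground point (x, y, 0) to pixel coordinates."""
    a, b, w = mat_vec(H, [x, y, 1.0])
    return a / w, b / w


def image_to_ground(u, v):
    """Map a pixel back onto the ground plane (inverse perspective mapping)."""
    a, b, w = mat_vec(H_inv, [u, v, 1.0])
    if abs(w) < 1e-12:
        raise ValueError("pixel lies on the horizon line")
    return a / w, b / w


u, v = ground_to_image(10.0, 2.0)
gx, gy = image_to_ground(u, v)
print((u, v), (gx, gy))
```

Applying `image_to_ground` to every pixel of an image, with a suitable resampling step that is not shown here, is one way to build the bird's eye view.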
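The inverse relationship of equation (8.14), which transforms image points into world coordinates, can be expressed as:

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \lambda\, \mathbf{R}^{-1} \mathbf{K}^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} + \mathbf{t}_0$$

where $\mathbf{t}_0$ is the position of the pin-hole in world coordinates and $\lambda$ parametrizes the points along the viewing line.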
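By directly using the Camera Matrix $\mathbf{P}$, partitioned as $\mathbf{P} = [\mathbf{P}_{3 \times 3} \mid \mathbf{p}_4]$, it is possible to achieve a result equivalent to equation (8.16) in the form of

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \lambda\, \mathbf{P}_{3 \times 3}^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} - \mathbf{P}_{3 \times 3}^{-1}\, \mathbf{p}_4$$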
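The back-projection of a pixel into a viewing line can be sketched directly from a camera matrix. In this plain-Python sketch, `P` is an assumed example matrix (800 pixel focal length, principal point at (640, 360), pin-hole at (0, 0, 1.5) in a world frame with $z$ up), split into its left $3 \times 3$ block and its last column. The names `inv3`, `ray_origin`, `ray_direction`, and `intersect_ground` are hypothetical.

```python
def inv3(m):
    """Invert a 3x3 matrix through its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Assumed 3x4 camera matrix (illustrative values only).
P = [[640.0, -800.0, 0.0, 0.0],
     [360.0, 0.0, -800.0, 1200.0],
     [1.0, 0.0, 0.0, 0.0]]

P33_inv = inv3([row[:3] for row in P])  # inverse of the left 3x3 block
p4 = [row[3] for row in P]              # last column of P

# Point of the line at lam = 0: the pin-hole position, -P33^-1 p4.
ray_origin = [-c for c in mat_vec(P33_inv, p4)]


def ray_direction(u, v):
    """Direction of the viewing line in world coordinates: P33^-1 (u, v, 1)."""
    return mat_vec(P33_inv, [u, v, 1.0])


def intersect_ground(u, v):
    """Pick the point of the viewing line that satisfies z = 0."""
    d = ray_direction(u, v)
    if abs(d[2]) < 1e-12:
        raise ValueError("viewing line is parallel to the ground plane")
    lam = -ray_origin[2] / d[2]
    return [o + lam * c for o, c in zip(ray_origin, d)], lam


point, lam = intersect_ground(480.0, 480.0)
print(ray_origin, point, lam)
```

A single pixel only fixes the line; an extra constraint such as $z = 0$ is what selects one point on it. A negative `lam` would indicate an intersection behind the camera.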