When analyzing real systems, one frequently encounters the problem of deriving the "solution" of an overdetermined linear system.
The importance of this topic is evident: observations made on a real system are inevitably affected by observation noise. This noise compromises the result of any single observation, but fortunately it is usually possible to acquire many more observations than unknowns, thus obtaining an overdetermined system.
Under these conditions, obtaining a solution that minimizes the error requires a numerical regression technique, for example least squares. This first section presents mathematical techniques used widely throughout the book: for further details regarding these techniques, one can refer to Chapter 3, which is entirely devoted to this subject.
We therefore have an overdetermined linear system

$$A\mathbf{x} = \mathbf{b} \tag{1.1}$$

with $A \in \mathbb{R}^{m \times n}$, $\mathbf{x} \in \mathbb{R}^{n}$, $\mathbf{b} \in \mathbb{R}^{m}$ and $m > n$. We define the error metric as the norm of the residual

$$E(\mathbf{x}) = \| A\mathbf{x} - \mathbf{b} \| \tag{1.2}$$
If equation (1.1) is multiplied on the left by $A^T$, a "traditional" square linear system is obtained, the so-called normal equations

$$A^T A \mathbf{x} = A^T \mathbf{b}$$

which has the solution

$$\mathbf{x} = (A^T A)^{-1} A^T \mathbf{b} \tag{1.3}$$
However, the solution proposed in equation (1.3) is numerically unstable, since the condition number of $A^T A$ is the square of that of $A$: $\kappa(A^T A) = \kappa(A)^2$. Further details on the conditioning of matrices and the propagation of perturbations in the solution of well-posed or overdetermined linear systems will be presented in section 2.7.
If the system is well conditioned, the most stable technique for solving the normal equations is the Cholesky factorization of $A^T A$.
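As an illustration, here is a minimal NumPy sketch of the normal-equations route (the random test problem is purely hypothetical): form $A^T A$, take its Cholesky factor, and solve the two resulting triangular systems.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 4))      # overdetermined: 20 observations, 4 unknowns
b = rng.standard_normal(20)

L = np.linalg.cholesky(A.T @ A)       # A^T A = L L^T, with L lower triangular
# Solve L y = A^T b, then L^T x = y (np.linalg.solve is used for brevity;
# scipy.linalg.solve_triangular would exploit the triangular structure).
y = np.linalg.solve(L, A.T @ b)
x = np.linalg.solve(L.T, y)
```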
It can be shown that a better-conditioned solution $\mathbf{x}^*$ that minimizes the function (1.2) exists and is given by:

$$\mathbf{x}^* = A^+ \mathbf{b} \tag{1.4}$$

By construction, $\mathbf{x}^*$ is a solution to the system (1.1) and is also the vector that minimizes the function (1.2). The pseudoinverse matrix of $A$ is denoted as $A^+$ and is given by

$$A^+ = (A^T A)^{-1} A^T \tag{1.5}$$

The matrix $A^+$ is called the Moore-Penrose pseudoinverse of $A$.
The pseudoinverse has the following properties:

$$A A^+ A = A, \qquad A^+ A A^+ = A^+, \qquad (A A^+)^T = A A^+, \qquad (A^+ A)^T = A^+ A$$
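These four conditions can be verified numerically; the following sketch (the test matrix is hypothetical) uses NumPy's built-in `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))                   # overdetermined, full column rank
A_pinv = np.linalg.pinv(A)                        # Moore-Penrose pseudoinverse

assert np.allclose(A @ A_pinv @ A, A)             # A A+ A = A
assert np.allclose(A_pinv @ A @ A_pinv, A_pinv)   # A+ A A+ = A+
assert np.allclose((A @ A_pinv).T, A @ A_pinv)    # A A+ is symmetric
assert np.allclose((A_pinv @ A).T, A_pinv @ A)    # A+ A is symmetric
```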
It is important to clarify from the outset that in minimizing the quantity (1.2), no assumption has been made on the distribution of the noise within the various components of which the matrix $A$ is composed: without such information, there is no guarantee that the solution will be optimal from a statistical point of view. Without assumptions about the noise distribution, the solution obtained through this minimization is a purely algebraic one, which minimizes an algebraic error.
A slightly better solution from a statistical perspective can be obtained when the noise is zero-mean white Gaussian and its variance is known for each observation. In this case, a different weight can be assigned to each equation of the system by multiplying each row by an appropriate weight, thus weighting each acquired data point differently.
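For instance, assuming each observation $i$ has a known standard deviation $\sigma_i$, a possible sketch (the helper name `weighted_lstsq` is ours, not the book's) scales each row by $1/\sigma_i$ before solving:

```python
import numpy as np

def weighted_lstsq(A, b, sigma):
    """Weighted least squares: observations with smaller noise std weigh more."""
    w = 1.0 / np.asarray(sigma)                          # weight_i = 1 / sigma_i
    x, *_ = np.linalg.lstsq(A * w[:, None], b * w, rcond=None)
    return x
```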
A more in-depth discussion of this topic can be found in section 3.2; chapter 2 addresses the general case, in which it is known how the error in the observed data affects the estimate of the parameters.
Instead, more stable techniques exist, based on factorizations that allow the solution to be derived directly from the matrix $A$, without ever forming $A^T A$.
Using, for example, the QR factorization $A = QR$, a method well known for its numerical stability, the original problem (1.1) transforms into the problem

$$QR\mathbf{x} = \mathbf{b}$$

and the solution can be derived from

$$R\mathbf{x} = Q^T\mathbf{b}$$

leveraging the orthogonality of the matrix $Q$. In the QR factorization, the relationship

$$R^T R = A^T A$$

holds, meaning that $R$ is the Cholesky factor of $A^T A$: through this relationship, the pseudoinverse can ultimately be obtained explicitly as $A^+ = R^{-1} Q^T$.
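A minimal NumPy sketch of this route, on the same hypothetical test data as above:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 4))
b = rng.standard_normal(20)

Q, R = np.linalg.qr(A)            # economy QR: Q is 20x4 with orthonormal columns
x = np.linalg.solve(R, Q.T @ b)   # triangular system R x = Q^T b
```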
On the other hand, using the Singular Value Decomposition (SVD), the overdetermined matrix is decomposed into three matrices with interesting properties. Let

$$A = U \Sigma V^T$$

be the singular value decomposition (SVD) of $A$. $U$ is a unitary matrix of dimensions $m \times m$ (depending on the formalism used, full SVD or economy SVD, the dimensions of the matrices may vary, and $U$ may become $m \times n$), $\Sigma$ is a diagonal matrix that contains the singular values (the square roots of the eigenvalues of the matrix $A^T A$), with dimensions, depending on the formalism, $m \times n$ or $n \times n$, and $V$ is an orthogonal matrix of dimensions $n \times n$, which appears transposed (conjugate transposed in the complex case) in the decomposition.
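The difference between the two formalisms can be seen directly in NumPy (a small illustrative sketch; note that NumPy returns the singular values as a vector rather than a diagonal matrix):

```python
import numpy as np

A = np.ones((5, 3))                                  # m = 5, n = 3
U, s, Vt = np.linalg.svd(A)                          # full SVD: U is 5x5, Vt is 3x3
Ue, se, Vte = np.linalg.svd(A, full_matrices=False)  # economy SVD: Ue is 5x3
print(U.shape, Ue.shape)                             # (5, 5) (5, 3)
```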
Through a purely mathematical procedure, it is obtained that the pseudoinverse of $A$ is equivalent to

$$A^+ = V \Sigma^{-1} U^T \tag{1.6}$$
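In NumPy, equation (1.6) translates into a few lines; the tolerance-based guard against (near-)zero singular values is our addition, anticipating the problem discussed below:

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """Sketch of the pseudoinverse via SVD, equation (1.6)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    nonzero = s > tol * s.max()          # invert only the nonzero singular values
    s_inv[nonzero] = 1.0 / s[nonzero]
    return Vt.T @ (s_inv[:, None] * U.T)
```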
In summary, the ways to solve an overdetermined linear system are

$$\mathbf{x} = (A^T A)^{-1} A^T \mathbf{b} = R^{-1} Q^T \mathbf{b} = V \Sigma^{-1} U^T \mathbf{b} \tag{1.7}$$
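A quick numerical cross-check of the three routes on hypothetical data (`np.linalg.lstsq` is itself SVD-based and serves as a reference):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 4))
b = rng.standard_normal(20)

x_ne = np.linalg.solve(A.T @ A, A.T @ b)        # normal equations (1.3)
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)              # QR route
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)                  # SVD route
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]    # library reference

assert np.allclose(x_ne, x_qr) and np.allclose(x_qr, x_svd) and np.allclose(x_svd, x_ref)
```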
Further details on the Moore-Penrose pseudoinverse can be found in many books, for example in (CM09) or in the foundational text on numerical analysis (GVL96).
Let us now examine the case in which the linear system to be solved is homogeneous.
A homogeneous linear system has the form

$$A\mathbf{x} = \mathbf{0} \tag{1.8}$$
In this case, the SVD proves to be an extremely effective and computationally stable technique: the bases of the kernel of $A$ are indeed exactly the columns of $V$ associated with the zero singular values of the diagonal matrix $\Sigma$. Generally, due to the presence of noise, there will not be an exactly zero singular value; the column associated with the minimum singular value must then be chosen, which minimizes $\|A\mathbf{x}\|$ under the constraint $\|\mathbf{x}\| = 1$ and thus excludes the trivial solution $\mathbf{x} = \mathbf{0}$.
The columns of $V$ (the right singular vectors) associated with null singular values of the matrix $A$ thus represent the kernel of the matrix itself, and the number of null singular values gives the dimension of the kernel. It should be noted that in equation (1.6) the presence of zeros on the diagonal of $\Sigma$ was problematic: it is now understood that such a zero indicates that one of the components of the problem is completely uncorrelated with the solution and, as such, can be neglected. This result will indeed be used in section 2.10.1 in the discussion of the PCA algorithm.
The solution in this subspace is therefore

$$\mathbf{x} = \mathbf{v}_{\min} \tag{1.9}$$

the column of $V$ associated with the smallest singular value of $A$.
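A small NumPy sketch of this recipe (the synthetic construction of $A$ is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x_true = np.array([1.0, -2.0, 0.5])
x_true /= np.linalg.norm(x_true)

# Rows of A are noisy observations of the constraint a_i . x_true = 0
A = rng.standard_normal((50, 3))
A -= np.outer(A @ x_true, x_true)            # project rows orthogonally to x_true
A += 1e-3 * rng.standard_normal(A.shape)     # observation noise

_, _, Vt = np.linalg.svd(A)
x = Vt[-1]                                   # right singular vector of the smallest singular value
# x recovers x_true up to sign: it minimizes ||A x|| subject to ||x|| = 1, cf. (1.9)
```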
The SVD is one of the most stable and versatile techniques available for solving linear systems, and it will be used extensively throughout this book.