Setting aside the presence of outliers in the input data on which the regression is performed, two important open questions remain: how to assess the quality of the obtained model and, at the same time, how to provide an index of how far this estimate may lie from the true model because of errors in the input data.
This section addresses the non-linear case in detail; the linear case, already partially discussed in section 2.7, follows in the same way by using the parameter matrix in place of the Jacobian.
Let $\mathbf{y} = (y_1, \ldots, y_m)^\top$ be a vector of realizations of statistically independent random variables and let $\boldsymbol{\beta} \in \mathbb{R}^n$ be the vector of model parameters. An intuitive estimator of the goodness of fit of the model is the root-mean-square residual error (RMSE), also referred to as the standard error of the regression:
$$ s = \sqrt{\frac{1}{m-n} \sum_{i=1}^{m} \left( y_i - f(\mathbf{x}_i; \hat{\boldsymbol{\beta}}) \right)^{2} } \qquad (3.64) $$
However, this is not a direct indicator of the quality of the identified solution, but rather of how well the fitted model matches the input data: consider, for example, the limiting case of an underdetermined system, where the residual is always zero regardless of the amount of noise affecting the individual observations.
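As a minimal sketch of (3.64) (the `model` callback and the use of NumPy are illustrative assumptions, not code from the text):

```python
import numpy as np

def standard_error(y, x, beta_hat, model):
    """Standard error of the regression, eq. (3.64): root of the residual
    sum of squares divided by the degrees of freedom m - n."""
    r = y - model(x, beta_hat)        # residuals of the fitted model
    m, n = r.size, beta_hat.size      # number of observations and of parameters
    return np.sqrt(np.sum(r**2) / (m - n))
```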
The most suitable index for assessing the quality of the estimated model is the variance-covariance matrix of the parameters.
The forward propagation of covariance has already been discussed in section 2.6; briefly, there are three ways to perform this operation: the first is based on a linear approximation of the model and uses the Jacobian; the second relies on the more general technique of Monte Carlo simulation; finally, a modern alternative that sits between the two is the Unscented Transformation (section 2.12.5), which empirically provides estimates accurate up to the third order in the case of Gaussian noise.
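As an illustrative sketch of the first two approaches (function names and the use of NumPy are assumptions, not the book's code): the linearized propagation sandwiches the parameter covariance between the Jacobian and its transpose, while the Monte Carlo route samples parameters and measures the spread of the model outputs.

```python
import numpy as np

def propagate_covariance_linear(J, cov_beta):
    """Linearized forward propagation: cov_y ≈ J · cov_beta · J^T."""
    return J @ cov_beta @ J.T

def propagate_covariance_mc(f, beta_mean, cov_beta, n_samples=10000, rng=None):
    """Monte Carlo forward propagation: sample the parameters, push them
    through the (possibly non-linear) model f and take the sample covariance."""
    rng = np.random.default_rng() if rng is None else rng
    samples = rng.multivariate_normal(beta_mean, cov_beta, size=n_samples)
    outputs = np.array([f(b) for b in samples])
    return np.cov(outputs, rowvar=False)
```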
Assessing the quality of the identified parameters $\hat{\boldsymbol{\beta}}$ given the estimated noise covariance (covariance matrix estimation) is precisely the opposite problem, since it requires the backward propagation of the variance. Once this covariance matrix is obtained, it is possible to define a confidence interval around $\hat{\boldsymbol{\beta}}$.
In the nonlinear case, the goodness of the parameter estimates $\hat{\boldsymbol{\beta}}$ can be assessed to a first approximation by inverting the linearized version of the model (although techniques such as Monte Carlo or the Unscented Transform can also be employed here for more rigorous estimates).
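A hedged sketch of the Monte Carlo route for backward propagation (the `fit` callback and the Gaussian noise model are illustrative assumptions): the observations are perturbed according to the estimated noise, the regression is re-run, and the sample covariance of the resulting parameter vectors approximates the parameter covariance.

```python
import numpy as np

def mc_parameter_covariance(y, fit, sigma_y, n_trials=500, rng=None):
    """Monte Carlo backward propagation: re-fit the model on observations
    perturbed with the estimated noise and take the sample covariance of
    the estimated parameter vectors."""
    rng = np.random.default_rng() if rng is None else rng
    betas = []
    for _ in range(n_trials):
        y_noisy = y + rng.normal(0.0, sigma_y, size=y.shape)  # perturbed observations
        betas.append(fit(y_noisy))                            # re-estimate the parameters
    return np.cov(np.asarray(betas), rowvar=False)
```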
It is possible to identify the covariance matrix associated with the proposed solution $\hat{\boldsymbol{\beta}}$ whenever the function $f$ is one-to-one and differentiable in a neighbourhood of that solution. Let $f$ be a multivariate, vector-valued function. It is possible to estimate the mean value and the cross-covariance matrix $\Sigma_{\mathbf{y}}$ of the residuals; the inverse transformation $f^{-1}$ then has mean value $\hat{\boldsymbol{\beta}}$ and covariance matrix
$$ \Sigma_{\boldsymbol{\beta}} = \left( J^\top \Sigma_{\mathbf{y}}^{-1} J \right)^{-1} \qquad (3.66) $$
where $J$ is the Jacobian of $f$ with respect to the parameters, evaluated at $\hat{\boldsymbol{\beta}}$.
Note that this quantity, the inverse of the information matrix, is the Cramér-Rao lower bound on the covariance that a consistent estimator of the parameter $\boldsymbol{\beta}$ can achieve.
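A minimal sketch of equation (3.66), assuming the Jacobian has already been evaluated at the solution (for ill-conditioned problems the explicit inverse should be replaced by a solver or by the pseudo-inverse discussed below):

```python
import numpy as np

def parameter_covariance(J, cov_y):
    """Backward propagation, eq. (3.66): cov_beta = (J^T Σ_y^{-1} J)^{-1}."""
    info = J.T @ np.linalg.solve(cov_y, J)   # information matrix J^T Σ_y^{-1} J
    return np.linalg.inv(info)               # its inverse is the parameter covariance
```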
In cases where the transformation is underdetermined, the rank $r$ of the Jacobian $J$, with $r < n$, is referred to as the number of essential parameters. When the transformation is underdetermined the expression in formula (3.65) cannot be inverted; however, it can be shown that the best approximation of the covariance matrix is obtained by replacing the inverse with the Moore-Penrose pseudo-inverse, $\Sigma_{\boldsymbol{\beta}} = \left( J^\top \Sigma_{\mathbf{y}}^{-1} J \right)^{+}$.
Alternatively, it is possible to perform a QR decomposition with column pivoting of the Jacobian, identify the linearly dependent columns (by inspecting the diagonal of the matrix $R$), and exclude them when the matrix is inverted.
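Both remedies for the rank-deficient case can be sketched as follows (illustrative names; `numpy.linalg.pinv` provides the Moore-Penrose pseudo-inverse and SciPy's `qr` supports column pivoting):

```python
import numpy as np
from scipy.linalg import qr

def parameter_covariance_rank_deficient(J, cov_y):
    """Underdetermined case: pseudo-inverse of the information matrix."""
    info = J.T @ np.linalg.solve(cov_y, J)
    return np.linalg.pinv(info)

def essential_parameter_indices(J, tol=1e-10):
    """QR with column pivoting: columns of J whose pivot on the diagonal of R
    falls below the tolerance are linearly dependent and can be excluded."""
    _, R, piv = qr(J, pivoting=True)
    diag = np.abs(np.diag(R))
    rank = int(np.sum(diag > tol * diag[0]))  # number of essential parameters
    return piv[:rank]                         # indices of the independent columns
```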
In the very common case where $f$ is a scalar function and the observation noise is independent with constant variance $\sigma^2$, the asymptotic estimate of the covariance matrix can be expressed more simply as
$$ \Sigma_{\boldsymbol{\beta}} = \hat{\sigma}^{2} \left( J^\top J \right)^{-1} \qquad (3.67) $$
The observation noise can be estimated empirically, invoking the law of large numbers, through
$$ \hat{\sigma}^{2} = \frac{1}{m-n} \sum_{i=1}^{m} \left( y_i - f(\mathbf{x}_i; \hat{\boldsymbol{\beta}}) \right)^{2} \qquad (3.68) $$
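Combining (3.67) and (3.68) into a short sketch (NumPy assumed; the residuals come from the already-fitted model):

```python
import numpy as np

def asymptotic_covariance(J, residuals):
    """Asymptotic parameter covariance: sigma^2 * (J^T J)^{-1}, with sigma^2
    estimated empirically from the residuals as in eq. (3.68)."""
    m, n = J.shape
    sigma2 = np.sum(residuals**2) / (m - n)   # empirical noise variance, eq. (3.68)
    return sigma2 * np.linalg.inv(J.T @ J)    # eq. (3.67)
```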
The Eicker-White (heteroscedasticity-consistent) covariance estimator is slightly different; its study is left to the reader.
The variance-covariance matrix of the parameters describes the error ellipsoid around the estimate. A useful metric for evaluating a measurement configuration is the D-optimality criterion (D-optimal design):
$$ \max \; \det\!\left( J^\top \Sigma_{\mathbf{y}}^{-1} J \right) \qquad (3.69) $$
$$ \min \; \det\!\left( \Sigma_{\boldsymbol{\beta}} \right) \qquad (3.70) $$
Other metrics include the E-optimal design, which consists of maximizing the smallest eigenvalue of the Fisher matrix, or equivalently, minimizing the largest eigenvalue of the variance-covariance matrix. Geometrically, this minimizes the maximum diameter of the ellipsoid.
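As a closing sketch (again under the notation and NumPy assumptions above), both design criteria can be evaluated from the information matrix of a candidate configuration and compared across configurations:

```python
import numpy as np

def design_criteria(J, cov_y):
    """D- and E-optimality scores of a configuration, computed from the
    information matrix I = J^T Σ_y^{-1} J."""
    info = J.T @ np.linalg.solve(cov_y, J)
    eigvals = np.linalg.eigvalsh(info)   # eigenvalues of the symmetric information matrix
    d_score = np.linalg.det(info)        # maximize for D-optimality
    e_score = eigvals.min()              # maximize for E-optimality
    return d_score, e_score
```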