Model Parameter Evaluation

Setting aside the presence of outliers in the input data on which the regression is performed, two important open questions remain: how to assess the quality of the obtained model, and how to provide an index of how far this estimate may be from the true model because of errors in the input data.

This section addresses the non-linear case in detail; the linear case, already partially discussed in section 2.7, is obtained simply by using the parameter matrix $\mathbf{X}$ in place of the Jacobian $\mathbf{J}$.

Let $\mathbf{y}=\left(y_1, \ldots, y_n \right)^{\top}$ be a vector of realizations of statistically independent random variables $y \in \mathbb{R}$, and let $\boldsymbol\beta \in \mathbb{R}^m$ be the model parameters. An intuitive indicator of the goodness of fit of the model is the root-mean-squared residual error (RMSE), also referred to as the standard error of the regression:

\begin{displaymath}
s = \sqrt{ \frac{ \sum^{n}_{i=1} \left( y_{i} - \hat{y}_{i} \right)^{2} } {n} }
\end{displaymath} (3.64)

where $\hat{y}_i = f(\mathbf{x}_i, \hat{\boldsymbol\beta} )$ is the value predicted by the model $f$ using the estimated parameters $\hat{\boldsymbol\beta}$, and $S = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2}$ is the corresponding residual sum of squares. This quantity has already been encountered expressed in terms of the residuals $r_i = y_i - \hat{y}_i$. If the estimator is unbiased (as happens, for example, in least squares regression), $\E [ r_i ] = 0$. Therefore, when the noise on the observations is Gaussian with zero mean, $s \geq \sigma$, and the two values coincide when the model is optimal.
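As a minimal numerical sketch of equation (3.64) (the toy linear model, data and noise level below are purely illustrative and not part of the text):

\begin{verbatim}
import numpy as np

def regression_standard_error(y, y_hat):
    """Root-mean-squared residual error s of eq. (3.64)."""
    r = y - y_hat                 # residuals r_i = y_i - y_hat_i
    S = np.sum(r ** 2)            # residual sum of squares S
    return np.sqrt(S / len(y))    # s = sqrt(S / n)

# illustrative usage with a toy linear model f(x, beta) = beta_0 + beta_1 * x
x = np.linspace(0.0, 1.0, 20)
beta_hat = np.array([0.5, 2.0])
y_hat = beta_hat[0] + beta_hat[1] * x
y = y_hat + 0.05 * np.random.default_rng(0).standard_normal(x.size)
print(regression_standard_error(y, y_hat))
\end{verbatim}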

However, this is not a direct indicator of the quality of the identified solution, but rather of how well the fitted model matches the input data: consider, for example, the limiting case of underdetermined systems, where the residual is always zero regardless of the amount of noise affecting the individual observations.

The most suitable index for evaluating the estimated model is the variance-covariance matrix of the parameters (Parameter Variances and Covariances matrix).

The forward propagation of covariance has already been discussed in section 2.6; as a quick reminder, there are three ways to perform this operation: the first is based on the linear approximation of the model and uses the Jacobian; the second relies on the more general technique of Monte Carlo simulation; and a modern alternative that sits between the two is the Unscented Transformation (section 2.12.5), which empirically provides estimates accurate up to the third order in the case of Gaussian noise.

Assessing the quality of the identified parameters $\hat{\boldsymbol\beta}$ given the estimated noise covariance (Covariance Matrix Estimation) is precisely the opposite problem, since it requires the backward propagation of the variance (backward propagation). Once this covariance matrix is obtained, it is possible to define a confidence interval around $\hat{\boldsymbol\beta}$.

The goodness of fit of the parameter estimates $\hat{\boldsymbol\beta}$, in the nonlinear case, can be assessed, to a first approximation, by inverting the linearized version of the model (although techniques such as Monte Carlo or the Unscented Transform can also be employed here for more rigorous estimates).

The covariance matrix associated with the proposed solution $\hat{\boldsymbol\beta}$ can be identified when the function $f$ is one-to-one and differentiable in the vicinity of that solution. Let $f : \mathbb{R}^m \to \mathbb{R}^n$ be a multivariate vector-valued function. The residuals can be estimated to have mean value $\bar{\mathbf{r}} = \E \left[\mathbf{y} - f(\hat{\boldsymbol\beta}) \right] \approx \mathbf{0}$ and covariance matrix $\boldsymbol\Sigma_r$; the inverse transformation $f^{-1}$ therefore has mean value $\hat{\boldsymbol\beta}$ and covariance matrix

\begin{displaymath}
\Sigma_{\boldsymbol\beta} = (\mathbf{J}^{\top} \Sigma_r^{-1} \mathbf{J})^{-1}
\end{displaymath} (3.65)

with $\mathbf{J}$ being the Jacobian of the model $f$ evaluated at the point $\hat{\boldsymbol\beta}$:
\begin{displaymath}
J_{i,j} = \frac{\partial r_i}{\partial \beta_j } (\hat{\boldsymbol\beta}) = -\frac{\partial f_i}{\partial \beta_j } (\hat{\boldsymbol\beta})
\end{displaymath} (3.66)

Equation (3.65) is obtained by manipulating equation (2.30), which describes the forward propagation of uncertainty.
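In brief, a sketch of that passage (up to the sign convention on the residuals): linearizing around $\hat{\boldsymbol\beta}$, the weighted least-squares estimator maps the residuals onto the parameters through a linear operator $\mathbf{A}$,

\begin{displaymath}
\hat{\boldsymbol\beta} - \boldsymbol\beta \approx \mathbf{A} \mathbf{r} , \qquad \mathbf{A} = (\mathbf{J}^{\top} \Sigma_r^{-1} \mathbf{J})^{-1} \mathbf{J}^{\top} \Sigma_r^{-1}
\end{displaymath}

and applying the forward propagation rule of (2.30) to this linear map yields

\begin{displaymath}
\Sigma_{\boldsymbol\beta} = \mathbf{A} \Sigma_r \mathbf{A}^{\top} = (\mathbf{J}^{\top} \Sigma_r^{-1} \mathbf{J})^{-1}
\end{displaymath}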

Note that this quantity (the inverse of the information matrix) is the Cramér-Rao lower bound on the covariance that an unbiased estimator of the parameter $\boldsymbol\beta$ can achieve.

When the transformation $f$ is underdetermined, the rank $d$ of the Jacobian, with $d<m$, is referred to as the number of essential parameters. In this case the matrix in formula (3.65) is not invertible; however, it can be shown that the best approximation of the covariance matrix is obtained using the pseudo-inverse:


\begin{displaymath}
\Sigma_{\boldsymbol\beta} = (\mathbf{J}^{\top} \Sigma_r^{-1} \mathbf{J})^{+}
\end{displaymath}

Alternatively, it is possible to perform a QR decomposition with column pivoting of the Jacobian, identify the linearly dependent columns (by inspecting the diagonal of the matrix $\mathbf{R}$), and exclude them when inverting the matrix.
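A possible numerical sketch, in Python with NumPy/SciPy, of both strategies (the pseudo-inverse and the pivoted QR decomposition); the function name, the tolerance and the rank test below are illustrative choices, not prescribed by the text:

\begin{verbatim}
import numpy as np
from scipy.linalg import qr

def parameter_covariance_rank_deficient(J, Sigma_r, tol=1e-10):
    """Parameter covariance when J^T Sigma_r^{-1} J is singular."""
    # pseudo-inverse form: (J^T Sigma_r^{-1} J)^+
    W = np.linalg.inv(Sigma_r)                  # weight matrix Sigma_r^{-1}
    Sigma_beta = np.linalg.pinv(J.T @ W @ J)

    # pivoted QR: small diagonal entries of R flag linearly dependent columns
    _, R, piv = qr(J, mode='economic', pivoting=True)
    d = int(np.sum(np.abs(np.diag(R)) > tol * np.abs(R[0, 0])))
    essential = piv[:d]                         # indices of the d essential parameters
    return Sigma_beta, essential
\end{verbatim}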

In the very common case where $f$ is a scalar function and the observation noise is independent with constant variance, the asymptotic estimate of the covariance matrix (Asymptotic Covariance Matrix) takes the simpler form

\begin{displaymath}
\Sigma_{\boldsymbol\beta} = ( \mathbf{J}^{\top}\mathbf{J})^{-1} \sigma^{2}
\end{displaymath} (3.67)

with $\sigma^2$ the variance of the observation noise, having applied the assumption $\boldsymbol \Sigma_r = \sigma^2 \mathbf{I}$, valid in the case of independent realizations with equal variance. Since $\mathbf{J}$ depends solely on the geometry of the problem, the matrix $( \mathbf{J}^{\top}\mathbf{J})^{-1}$ also depends only on the problem and not on the observations. Asymptotically, the parameters are distributed as $\boldsymbol\beta \sim \mathcal{N} \left( \hat{\boldsymbol\beta}, \Sigma_{\boldsymbol\beta} \right)$. The Jacobian matrix, since it indicates how sensitive the outputs are to the parameters, is also referred to as the sensitivity matrix.

The observation noise can be estimated empirically, invoking the law of large numbers ($\sigma \approx s$), through

\begin{displaymath}
\sigma^{2} \approx \frac{\sum_{i=1}^{n} r_i^2}{n-m}
\end{displaymath} (3.68)

using the a posteriori statistics of the errors $r_i$ on the data. The denominator $n-m$ represents the statistical degrees of freedom of the problem: in this way the estimated variance diverges when the number of unknowns in the model equals the number of collected data points.
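Putting equations (3.67) and (3.68) together, a minimal sketch using scipy.optimize.least_squares; the exponential model, the starting point and the noise level are purely illustrative assumptions:

\begin{verbatim}
import numpy as np
from scipy.optimize import least_squares

# hypothetical scalar model f(x, beta) = beta_0 * exp(beta_1 * x)
def residuals(beta, x, y):
    return y - beta[0] * np.exp(beta[1] * x)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 50)
beta_true = np.array([1.5, -0.8])
y = beta_true[0] * np.exp(beta_true[1] * x) + 0.02 * rng.standard_normal(x.size)

sol = least_squares(residuals, x0=[1.0, -1.0], args=(x, y))
n, m = x.size, sol.x.size

sigma2 = np.sum(sol.fun ** 2) / (n - m)        # eq. (3.68): a posteriori noise variance
J = sol.jac                                    # Jacobian evaluated at the solution
Sigma_beta = sigma2 * np.linalg.inv(J.T @ J)   # eq. (3.67): asymptotic covariance
print(np.sqrt(np.diag(Sigma_beta)))            # standard errors of the parameters
\end{verbatim}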

The Eicker-White covariance estimator is slightly different and its study is left to the reader.

The variance-covariance matrix of the parameters represents the error ellipsoid.
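Under the Gaussian approximation discussed above, this can be read concretely: a confidence region of level $1-\alpha$ for the parameters is the ellipsoid

\begin{displaymath}
\left( \boldsymbol\beta - \hat{\boldsymbol\beta} \right)^{\top} \Sigma_{\boldsymbol\beta}^{-1} \left( \boldsymbol\beta - \hat{\boldsymbol\beta} \right) \leq \chi^{2}_{m, 1-\alpha}
\end{displaymath}

whose axes are the eigenvectors of $\Sigma_{\boldsymbol\beta}$ and whose semi-axis lengths are proportional to the square roots of the corresponding eigenvalues, with $\chi^{2}_{m, 1-\alpha}$ the quantile of the chi-square distribution with $m$ degrees of freedom.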

A useful metric for evaluating the problem is the D-optimality criterion (D-optimal design):

\begin{displaymath}
\det \left( \mathbf{J}^{\top}\mathbf{J} \right)^{-1}
\end{displaymath} (3.69)

the D-optimal configuration is the one that minimizes the determinant of the variance-covariance matrix or, equivalently, maximizes the determinant of the Fisher information matrix:
\begin{displaymath}
\det \mathbf{F}\left( \boldsymbol\beta \right)
\end{displaymath} (3.70)

Geometrically, this approach minimizes the volume of the error ellipsoid.

Other metrics include the E-optimal design, which consists of maximizing the smallest eigenvalue of the Fisher matrix, or equivalently, minimizing the largest eigenvalue of the variance-covariance matrix. Geometrically, this minimizes the maximum diameter of the ellipsoid.
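As a small illustrative sketch of these two criteria (assuming, for simplicity, unit observation noise so that the Fisher matrix reduces to $\mathbf{J}^{\top}\mathbf{J}$; the function name is arbitrary):

\begin{verbatim}
import numpy as np

def design_metrics(J):
    """D- and E-optimality metrics of a configuration with Jacobian J."""
    F = J.T @ J                             # Fisher information (up to sigma^2)
    Sigma = np.linalg.inv(F)                # parameter covariance (up to sigma^2)
    d_metric = np.linalg.det(Sigma)         # D-optimality: to be minimized
    e_metric = np.linalg.eigvalsh(F).min()  # E-optimality: to be maximized
    return d_metric, e_metric
\end{verbatim}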
