Gauss-Newton

The methods discussed so far allow considerable freedom in the choice of the loss function. In the common practical case where the cost function $\ell$ is quadratic, the Newton method can be simplified further, avoiding the cumbersome computation of the full Hessian. In this case the cost function takes the form already seen previously,
\begin{displaymath}
S(\boldsymbol\beta) = \frac{1}{2} \mathbf{r}^{\top} \mathbf{r} = \frac{1}{2} \sum_{i=1}^{n} r_i^2 (\boldsymbol\beta)
\end{displaymath} (3.40)

The factor $1/2$ in the cost function cancels the factor $2$ produced by differentiating the squares, yielding more compact expressions for the gradient and the Hessian.

With this cost function, the gradient and Hessian are expressed as

\begin{displaymath}
\begin{array}{l}
\nabla S(\boldsymbol\beta) = \sum_{i=1}^{n} r_i \nabla r_i = \mathbf{J}_{r}^{\top} \mathbf{r} \\
\mathbf{H}_S(\boldsymbol\beta) = \mathbf{J}_{r}^{\top}\mathbf{J}_{r} + \sum_{i=1}^{n} r_i \mathbf{H}_{r_i}
\end{array}\end{displaymath} (3.41)

where $\mathbf{J}_{r}$ is the Jacobian of the residual vector $\mathbf{r}$ and $\mathbf{H}_{r_i}$ is the Hessian of the $i$-th residual.

When the parameters are close to the exact solution the residuals are small, and the Hessian can be approximated by the first term of the expression alone, namely

\begin{displaymath}
\mathbf{H}_S(\boldsymbol\beta) \approx \mathbf{J}_{r}^{\top}\mathbf{J}_{r}
\end{displaymath} (3.42)
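A quick numerical sanity check of this approximation can be instructive. The following sketch assumes a hypothetical two-parameter model $f(x;\boldsymbol\beta) = \beta_0 e^{-\beta_1 x}$ and simple forward differences (the model, sample points, and step sizes are all assumptions of the example); near the optimum, where the residuals are small, the finite-difference Hessian of $S$ and $\mathbf{J}_{r}^{\top}\mathbf{J}_{r}$ should nearly coincide.

\begin{verbatim}
import numpy as np

# Assumed toy model f(x; beta) = beta0 * exp(-beta1 * x) and sample points
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = 2.0 * np.exp(-1.0 * x)            # data generated with beta = (2, 1)

def residuals(beta):
    return y - beta[0] * np.exp(-beta[1] * x)

def jacobian(beta, eps=1e-6):
    # Forward-difference Jacobian J_r of the residual vector
    r0 = residuals(beta)
    J = np.empty((r0.size, beta.size))
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        J[:, j] = (residuals(beta + step) - r0) / eps
    return J

def hessian_S(beta, eps=1e-5):
    # Finite-difference Hessian of S = 0.5 * r.T r, obtained by
    # differentiating the exact gradient J_r.T r of equation (3.41)
    def grad(b):
        return jacobian(b).T @ residuals(b)
    g0 = grad(beta)
    H = np.empty((beta.size, beta.size))
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        H[:, j] = (grad(beta + step) - g0) / eps
    return H

beta = np.array([1.9, 1.1])           # near the optimum: small residuals
print(hessian_S(beta))                # full Hessian, estimated numerically
J = jacobian(beta)
print(J.T @ J)                        # Gauss-Newton approximation (3.42)
\end{verbatim}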

Under these conditions, the gradient and the Hessian of the cost function $S$ can be expressed solely in terms of the Jacobian of the residuals $r_i(\boldsymbol\beta)$. The approximate expression for the Hessian can be substituted into equation (3.34):
\begin{displaymath}
- \mathbf{J}_{r}^{\top} \mathbf{r} = \mathbf{H}_{S} \boldsymbol\delta_\beta \approx \mathbf{J}_{r}^{\top}\mathbf{J}_{r} \boldsymbol\delta_\beta
\end{displaymath} (3.43)

As in the Newton case, this is a linear problem that can be solved through the normal equations:
\begin{displaymath}
\boldsymbol\delta_\beta = - \left( \mathbf{J}_{r}^{\top}\mathbf{J}_{r} \right)^{-1} \mathbf{J}_{r}^{\top} \mathbf{r}
\end{displaymath} (3.44)

The meaning of the normal equations is geometric: the minimum is reached when $\mathbf{J}\boldsymbol\delta_\beta - \mathbf{r}$ is orthogonal to the column space of $\mathbf{J}$.
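A minimal implementation of the resulting iteration is straightforward. In the sketch below, the callables \verb|residuals| and \verb|jacobian| are assumptions of the example, returning $\mathbf{r}$ and $\mathbf{J}_{r}$ respectively; each step applies the update of equation (3.44).

\begin{verbatim}
import numpy as np

def gauss_newton(residuals, jacobian, beta0, iters=50, tol=1e-10):
    # Iterates the Gauss-Newton step of equation (3.44):
    #   delta = -(J_r.T J_r)^{-1} J_r.T r
    # residuals(beta) -> r and jacobian(beta) -> J_r are user-supplied.
    beta = np.asarray(beta0, dtype=float)
    for _ in range(iters):
        r = residuals(beta)
        J = jacobian(beta)
        # Solve the normal equations rather than forming the inverse
        delta = np.linalg.solve(J.T @ J, -J.T @ r)
        beta = beta + delta
        if np.linalg.norm(delta) < tol:  # stop once the step is negligible
            break
    return beta
\end{verbatim}

Note that the linear system is solved directly instead of computing the explicit inverse, which is both cheaper and numerically safer; for ill-conditioned problems a QR or SVD factorization of $\mathbf{J}_{r}$ is preferable to the normal equations, since forming $\mathbf{J}_{r}^{\top}\mathbf{J}_{r}$ squares the condition number.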

In the particular case of the residual function written as

\begin{displaymath}
r_i = y_i - f_i(\mathbf{x}_i ; \boldsymbol\beta)
\end{displaymath} (3.45)

similar to those in equation (3.6), it is possible to use $\mathbf{J}_f$, the Jacobian of $f$, instead of $\mathbf{J}_r$:
\begin{displaymath}
\boldsymbol\delta_\beta = \left( \mathbf{J}_{f}^{\top}\mathbf{J}_{f} \right)^{-1} \mathbf{J}_{f}^{\top} \mathbf{r}
\end{displaymath} (3.46)

This follows from observing that the derivatives of $r_i$ and $f_i(\mathbf{x}_i)$ are equal up to a sign3.2: since $\mathbf{J}_{r} = -\mathbf{J}_{f}$, the minus sign of equation (3.44) is cancelled, which explains the positive sign in equation (3.46).
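As an illustration of this variant, the following sketch applies the update of equation (3.46) to noiseless synthetic data, again assuming the toy model $f(x;\boldsymbol\beta) = \beta_0 e^{-\beta_1 x}$, this time with its analytic Jacobian.

\begin{verbatim}
import numpy as np

# Assumed toy model f(x; beta) = beta0 * exp(-beta1 * x)
x = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(-1.0 * x)           # noiseless data from beta = (2, 1)

beta = np.array([1.0, 0.5])          # deliberately rough initial guess
for _ in range(10):
    f = beta[0] * np.exp(-beta[1] * x)
    r = y - f                        # residuals of equation (3.45)
    # Analytic Jacobian of f: columns are df/dbeta0 and df/dbeta1
    Jf = np.column_stack((np.exp(-beta[1] * x),
                          -beta[0] * x * np.exp(-beta[1] * x)))
    # Update of equation (3.46): positive sign, since r = y - f
    beta += np.linalg.solve(Jf.T @ Jf, Jf.T @ r)

print(beta)                          # should approach (2, 1)
\end{verbatim}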



Footnotes

... sign3.2
Clearly, the derivatives coincide exactly when a residual of the form $r_i = \hat{y}_i - y_i$ is chosen.