Gauss-Newton

The methods discussed so far allow considerable freedom in the choice of the loss function. In the common practical case where the cost function $\ell$ is quadratic, the Newton method can be simplified further, avoiding the cumbersome computation of the full Hessian. In this case the cost function takes the form already seen previously,
\begin{displaymath}
S(\boldsymbol\beta) = \frac{1}{2} \mathbf{r}^{\top} \mathbf{r} = \frac{1}{2} \sum_{i=1}^{n} r_i^2 (\boldsymbol\beta)
\end{displaymath} (3.40)

The factor $1/2$ in the cost function cancels the factor $2$ produced by differentiating the squares, yielding more compact expressions for the gradient and the Hessian.

With this cost function, the gradient and Hessian are expressed as

\begin{displaymath}
\begin{array}{l}
\nabla S(\boldsymbol\beta) = \sum_{i=1}^{n} r_i \nabla r_i = \mathbf{J}_{r}^{\top} \mathbf{r} \\
\mathbf{H}_S(\boldsymbol\beta) = \mathbf{J}_{r}^{\top}\mathbf{J}_{r} + \sum_{i=1}^{n} r_i \mathbf{H}_{r_i}
\end{array}\end{displaymath} (3.41)

where $\mathbf{J}_{r}$ is the Jacobian of the residual vector $\mathbf{r}$ and $\mathbf{H}_{r_i}$ is the Hessian of the $i$-th residual.

When the parameters are close to the exact solution the residuals are small, and the Hessian can be approximated by the first term of the expression alone, namely

\begin{displaymath}
\mathbf{H}_S(\boldsymbol\beta) \approx \mathbf{J}_{r}^{\top}\mathbf{J}_{r}
\end{displaymath} (3.42)
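A quick numerical sanity check of this approximation can be instructive. The following sketch assumes a hypothetical two-parameter model $f(x;\boldsymbol\beta) = \beta_0 e^{-\beta_1 x}$ and simple forward differences (the model, sample points, and step sizes are all assumptions of the example); near the optimum, where the residuals are small, the finite-difference Hessian of $S$ and $\mathbf{J}_{r}^{\top}\mathbf{J}_{r}$ should nearly coincide.

\begin{verbatim}
import numpy as np

# Assumed toy model f(x; beta) = beta0 * exp(-beta1 * x) and sample points
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = 2.0 * np.exp(-1.0 * x)            # data generated with beta = (2, 1)

def residuals(beta):
    return y - beta[0] * np.exp(-beta[1] * x)

def jacobian(beta, eps=1e-6):
    # Forward-difference Jacobian J_r of the residual vector
    r0 = residuals(beta)
    J = np.empty((r0.size, beta.size))
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        J[:, j] = (residuals(beta + step) - r0) / eps
    return J

def hessian_S(beta, eps=1e-5):
    # Finite-difference Hessian of S = 0.5 * r.T r, obtained by
    # differentiating the exact gradient J_r.T r of equation (3.41)
    def grad(b):
        return jacobian(b).T @ residuals(b)
    g0 = grad(beta)
    H = np.empty((beta.size, beta.size))
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        H[:, j] = (grad(beta + step) - g0) / eps
    return H

beta = np.array([1.9, 1.1])           # near the optimum: small residuals
print(hessian_S(beta))                # full Hessian, estimated numerically
J = jacobian(beta)
print(J.T @ J)                        # Gauss-Newton approximation (3.42)
\end{verbatim}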

Under these conditions, the gradient and the Hessian of the cost function $S$ can be expressed solely in terms of the Jacobian of the residuals $r_i(\boldsymbol\beta)$. The approximate expression for the Hessian can be substituted into equation (3.34):
\begin{displaymath}
- \mathbf{J}_{r}^{\top} \mathbf{r} = \mathbf{H}_{S} \boldsymbol\delta_\beta \approx \mathbf{J}_{r}^{\top}\mathbf{J}_{r} \boldsymbol\delta_\beta
\end{displaymath} (3.43)

As in the Newton case, this is a linear problem that can be solved through the normal equations:
\begin{displaymath}
\boldsymbol\delta_\beta = - \left( \mathbf{J}_{r}^{\top}\mathbf{J}_{r} \right)^{-1} \mathbf{J}_{r}^{\top} \mathbf{r}
\end{displaymath} (3.44)

The meaning of the normal equations is geometric: the minimum is reached when $\mathbf{J}\boldsymbol\delta_\beta - \mathbf{r}$ is orthogonal to the column space of $\mathbf{J}$.
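A minimal implementation of the resulting iteration is straightforward. In the sketch below, the callables \verb|residuals| and \verb|jacobian| are assumptions of the example, returning $\mathbf{r}$ and $\mathbf{J}_{r}$ respectively; each step applies the update of equation (3.44).

\begin{verbatim}
import numpy as np

def gauss_newton(residuals, jacobian, beta0, iters=50, tol=1e-10):
    # Iterates the Gauss-Newton step of equation (3.44):
    #   delta = -(J_r.T J_r)^{-1} J_r.T r
    # residuals(beta) -> r and jacobian(beta) -> J_r are user-supplied.
    beta = np.asarray(beta0, dtype=float)
    for _ in range(iters):
        r = residuals(beta)
        J = jacobian(beta)
        # Solve the normal equations rather than forming the inverse
        delta = np.linalg.solve(J.T @ J, -J.T @ r)
        beta = beta + delta
        if np.linalg.norm(delta) < tol:  # stop once the step is negligible
            break
    return beta
\end{verbatim}

Note that the linear system is solved directly instead of computing the explicit inverse, which is both cheaper and numerically safer; for ill-conditioned problems a QR or SVD factorization of $\mathbf{J}_{r}$ is preferable to the normal equations, since forming $\mathbf{J}_{r}^{\top}\mathbf{J}_{r}$ squares the condition number.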

In the particular case of the residual function written as

\begin{displaymath}
r_i = y_i - f_i(\mathbf{x}_i ; \boldsymbol\beta)
\end{displaymath} (3.45)

similar to those in equation (3.6), it is possible to use $\mathbf{J}_f$, the Jacobian of $f$, instead of $\mathbf{J}_r$:
\begin{displaymath}
\boldsymbol\delta_\beta = \left( \mathbf{J}_{f}^{\top}\mathbf{J}_{f} \right)^{-1} \mathbf{J}_{f}^{\top} \mathbf{r}
\end{displaymath} (3.46)

This follows from observing that the derivatives of $r_i$ and $f_i(\mathbf{x}_i)$ are equal up to a sign3.2: since $\mathbf{J}_{r} = -\mathbf{J}_{f}$, the minus sign of equation (3.44) is cancelled, which explains the positive sign in equation (3.46).
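As an illustration of this variant, the following sketch applies the update of equation (3.46) to noiseless synthetic data, again assuming the toy model $f(x;\boldsymbol\beta) = \beta_0 e^{-\beta_1 x}$, this time with its analytic Jacobian.

\begin{verbatim}
import numpy as np

# Assumed toy model f(x; beta) = beta0 * exp(-beta1 * x)
x = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(-1.0 * x)           # noiseless data from beta = (2, 1)

beta = np.array([1.0, 0.5])          # deliberately rough initial guess
for _ in range(10):
    f = beta[0] * np.exp(-beta[1] * x)
    r = y - f                        # residuals of equation (3.45)
    # Analytic Jacobian of f: columns are df/dbeta0 and df/dbeta1
    Jf = np.column_stack((np.exp(-beta[1] * x),
                          -beta[0] * x * np.exp(-beta[1] * x)))
    # Update of equation (3.46): positive sign, since r = y - f
    beta += np.linalg.solve(Jf.T @ Jf, Jf.T @ r)

print(beta)                          # should approach (2, 1)
\end{verbatim}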



Footnotes

... sign3.2
Clearly, the derivatives coincide exactly when a residual of the form $r_i = \hat{y}_i - y_i$ is chosen.