Newton-Raphson Method

The problem of finding the minima of a function can be reduced to the problem of finding the zeros of another function, namely the first derivative of the cost function $S$.

Let $\mathbf{g}: \mathbb{R}^{m} \mapsto \mathbb{R}^{n}$ be a differentiable multivariable function for which we want to solve

\begin{displaymath}
\mathbf{g}(\mathbf{x})=\mathbf{0}
\end{displaymath} (3.28)

By expanding the function $\mathbf{g}$ in a Taylor series locally around an appropriate point $\mathbf{x}$, we obtain
\begin{displaymath}
\mathbf{g}(\mathbf{x} + \boldsymbol\delta) = \mathbf{g}(\mathbf{x}) + \mathbf{J}_{g} \boldsymbol\delta + O(\boldsymbol\delta^2)
\end{displaymath} (3.29)

where $\mathbf{J}_{g}$ is the $n \times m$ Jacobian matrix of the function $\mathbf{g}$ evaluated at $\mathbf{x}$.

The objective is to modify the value of $\mathbf{x}$ by an amount $\boldsymbol\delta$ such that the function evaluated at $\mathbf{x} + \boldsymbol\delta$ is exactly zero. Ignoring contributions of order $\boldsymbol\delta^{2}$ and higher, the estimate of $\boldsymbol\delta$ that, to first approximation, brings the function $\mathbf{g}$ close to zero is the solution of the linear system obtained by combining (3.29) with the condition (3.28), namely

\begin{displaymath}
\mathbf{J}_{g} \boldsymbol\delta = - \mathbf{g}(\mathbf{x})
\end{displaymath} (3.30)

If $\mathbf{J}_{g}$ has no rank deficiency, this is a simple linear system, possibly overdetermined, and it can be solved using one of the techniques shown in section [*]. The idea behind iterative methods is to update the point $\mathbf{x}_t$ by the quantity $\boldsymbol\delta_t$,
\begin{displaymath}
\mathbf{x}_{t+1} = \mathbf{x}_t + \boldsymbol\delta_t
\end{displaymath} (3.31)

for iterations $t=1,2,\ldots$, with $\boldsymbol\delta_t$ computed so that the function progressively approaches zero.
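
As an illustration, the following is a minimal Python sketch of the iteration (3.30)-(3.31), assuming NumPy is available; the names newton_roots, g and jacobian, as well as the circle-line intersection example, are illustrative and not part of the original text. The system (3.30) is solved in a least-squares sense so that the overdetermined case $n > m$ is also covered.

\begin{verbatim}
import numpy as np

def newton_roots(g, jacobian, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration for g(x) = 0.

    g        : function R^m -> R^n
    jacobian : function returning the n x m Jacobian of g at x
    x0       : initial point
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Solve J delta = -g(x); lstsq also handles the overdetermined case.
        delta, *_ = np.linalg.lstsq(jacobian(x), -g(x), rcond=None)
        x = x + delta
        if np.linalg.norm(delta) < tol:
            break
    return x

# Example: intersection of a circle and a line,
# g(x, y) = (x^2 + y^2 - 4, y - x)
g = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[1] - v[0]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [-1.0, 1.0]])
print(newton_roots(g, J, x0=[1.0, 0.5]))  # approx (sqrt(2), sqrt(2))
\end{verbatim}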

In the single-variable case $n=m=1$, Newton's method reduces to

\begin{displaymath}
x_{t+1} = x_t - \dfrac{g(x_t)}{g'(x_t)}
\end{displaymath} (3.32)

In numerical analysis, this is known as the Newton method (or Newton-Raphson method) for finding the zeros of a function.
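
For concreteness, a minimal Python sketch of the scalar iteration (3.32) follows; the name newton_1d and the example of computing $\sqrt{2}$ as the zero of $g(x) = x^2 - 2$ are illustrative choices, not taken from the original text.

\begin{verbatim}
def newton_1d(g, dg, x0, tol=1e-12, max_iter=50):
    """Scalar Newton-Raphson iteration x_{t+1} = x_t - g(x_t) / g'(x_t)."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: the zero of g(x) = x^2 - 2 is sqrt(2).
print(newton_1d(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0))
\end{verbatim}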

The maxima and minima of a function are points at which the gradient vanishes. Therefore, this technique can be applied to find the maxima and minima of a function $f(\mathbf{x}): \mathbb{R}^{m} \mapsto \mathbb{R}$ by defining

\begin{displaymath}
\begin{array}{l}
\mathbf{g}(\mathbf{x}) = \nabla f(\mathbf{x}) \\
\mathbf{J}_g(\mathbf{x}) = \mathbf{H}_f(\mathbf{x}) \\
\end{array}\end{displaymath} (3.33)

where $\nabla f(\mathbf{x})$ is the gradient, a function $\mathbb{R}^{m} \mapsto \mathbb{R}^{m}$, and $\mathbf{H}_f(\mathbf{x})$ is the $m \times m$ Hessian matrix, both evaluated at $\mathbf{x}$. The Newton update of the point $\mathbf{x}$ thus becomes
\begin{displaymath}
\mathbf{H}_f(\mathbf{x}) \boldsymbol\delta_t = - \nabla f(\mathbf{x})
\end{displaymath} (3.34)

When used for optimization, the Newton method effectively approximates the function $f(\mathbf{x})$ in the vicinity of $\mathbf{x}$ with a quadratic. If $f(\mathbf{x})$ is itself a quadratic function, convergence is guaranteed in a single iteration.
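
As a quick check of this claim (a standard computation, not part of the original text), consider the quadratic $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{\top}\mathbf{A}\mathbf{x} - \mathbf{b}^{\top}\mathbf{x}$ with $\mathbf{A}$ symmetric and invertible: then $\nabla f(\mathbf{x}) = \mathbf{A}\mathbf{x} - \mathbf{b}$ and $\mathbf{H}_f(\mathbf{x}) = \mathbf{A}$, so the step (3.34) gives

\begin{displaymath}
\mathbf{x} + \boldsymbol\delta = \mathbf{x} - \mathbf{A}^{-1}(\mathbf{A}\mathbf{x} - \mathbf{b}) = \mathbf{A}^{-1}\mathbf{b}
\end{displaymath}

which is the stationary point of $f$, reached in a single iteration from any starting point.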

Now, in the specific case of the optimization methods considered here, the function $f(\mathbf{x})$ is the cost function $S(\boldsymbol\beta)$. Therefore, when the Hessian matrix of $S(\boldsymbol\beta)$ is non-singular, the parameter update is

\begin{displaymath}
\boldsymbol\delta_t = -\mathbf{H}_{S}^{-1}(\boldsymbol\beta_t) \nabla S(\boldsymbol\beta_t)
\end{displaymath} (3.35)

which is the update rule of the Newton optimization method.
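
A minimal Python sketch of this update follows, again assuming NumPy; the names newton_minimize, grad_S and hess_S, and the choice of the Rosenbrock function as cost, are illustrative and not taken from the original text. The linear system (3.34) is solved directly rather than forming the inverse Hessian of (3.35) explicitly.

\begin{verbatim}
import numpy as np

def newton_minimize(grad_S, hess_S, beta0, tol=1e-10, max_iter=50):
    """Newton optimization: solve H_S(beta) delta = -grad S(beta), then update beta.

    grad_S : function returning the gradient of the cost S at beta
    hess_S : function returning the (non-singular) Hessian of S at beta
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        delta = np.linalg.solve(hess_S(beta), -grad_S(beta))
        beta = beta + delta
        if np.linalg.norm(delta) < tol:
            break
    return beta

# Example cost: Rosenbrock, S(b) = (1 - b0)^2 + 100 (b1 - b0^2)^2, minimum at (1, 1).
grad = lambda b: np.array([-2 * (1 - b[0]) - 400 * b[0] * (b[1] - b[0]**2),
                           200 * (b[1] - b[0]**2)])
hess = lambda b: np.array([[2 - 400 * (b[1] - 3 * b[0]**2), -400 * b[0]],
                           [-400 * b[0], 200.0]])
print(newton_minimize(grad, hess, beta0=[-1.2, 1.0]))  # approx (1, 1)
\end{verbatim}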
