Newton-Raphson Method

The problem of finding the minima of a function can be reduced to the problem of finding the zeros of another function, namely the first derivative of the cost function $S$.

Let $\mathbf{g}: \mathbb{R}^{m} \mapsto \mathbb{R}^{n}$ be a differentiable multivariable function for which we want to solve

\begin{displaymath}
\mathbf{g}(\mathbf{x})=\mathbf{0}
\end{displaymath} (3.28)

By expanding the function $\mathbf{g}$ in a Taylor series locally around an appropriate point $\mathbf{x}$, we obtain
\begin{displaymath}
\mathbf{g}(\mathbf{x} + \boldsymbol\delta) = \mathbf{g}(\mathbf{x}) + \mathbf{J}_{g} \boldsymbol\delta + O(\boldsymbol\delta^2)
\end{displaymath} (3.29)

where $\mathbf{J}_{g}$ is the $n \times m$ Jacobian matrix of the function $\mathbf{g}$ evaluated at $\mathbf{x}$.

The objective is to modify the value of $\mathbf{x}$ by an amount $\boldsymbol\delta$ such that the function evaluated at $\mathbf{x} + \boldsymbol\delta$ is exactly zero. Ignoring contributions of order $\boldsymbol\delta^{2}$ and higher, the estimate of $\boldsymbol\delta$ that, to first approximation, brings the function $\mathbf{g}$ close to zero is the solution of the linear system obtained by combining (3.29) with the condition (3.28), namely

\begin{displaymath}
\mathbf{J}_{g} \boldsymbol\delta = - \mathbf{g}(\mathbf{x})
\end{displaymath} (3.30)

If $\mathbf{J}_{g}$ has no rank deficiency, this is a simple linear system, possibly overdetermined, and it can be solved using one of the techniques shown in section [*]. The idea behind iterative methods is to update the point $\mathbf{x}_t$ by the quantity $\boldsymbol\delta_t$,
\begin{displaymath}
\mathbf{x}_{t+1} = \mathbf{x}_t + \boldsymbol\delta_t
\end{displaymath} (3.31)

for iterations $t=1,2,\ldots$, with $\boldsymbol\delta_t$ computed so that the function progressively approaches zero.
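
As an illustration, the following is a minimal Python sketch of the iteration (3.30)-(3.31), assuming NumPy is available; the names newton_roots, g and jacobian, as well as the circle-line intersection example, are illustrative and not part of the original text. The system (3.30) is solved in a least-squares sense so that the overdetermined case $n > m$ is also covered.

\begin{verbatim}
import numpy as np

def newton_roots(g, jacobian, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration for g(x) = 0.

    g        : function R^m -> R^n
    jacobian : function returning the n x m Jacobian of g at x
    x0       : initial point
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Solve J delta = -g(x); lstsq also handles the overdetermined case.
        delta, *_ = np.linalg.lstsq(jacobian(x), -g(x), rcond=None)
        x = x + delta
        if np.linalg.norm(delta) < tol:
            break
    return x

# Example: intersection of a circle and a line,
# g(x, y) = (x^2 + y^2 - 4, y - x)
g = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[1] - v[0]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [-1.0, 1.0]])
print(newton_roots(g, J, x0=[1.0, 0.5]))  # approx (sqrt(2), sqrt(2))
\end{verbatim}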

In the single-variable case $n=m=1$, Newton's method reduces to

\begin{displaymath}
x_{t+1} = x_t - \dfrac{g(x_t)}{g'(x_t)}
\end{displaymath} (3.32)

In numerical analysis, this is known as the Newton method (or Newton-Raphson method) for finding the zeros of a function.
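
For concreteness, a minimal Python sketch of the scalar iteration (3.32) follows; the name newton_1d and the example of computing $\sqrt{2}$ as the zero of $g(x) = x^2 - 2$ are illustrative choices, not taken from the original text.

\begin{verbatim}
def newton_1d(g, dg, x0, tol=1e-12, max_iter=50):
    """Scalar Newton-Raphson iteration x_{t+1} = x_t - g(x_t) / g'(x_t)."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: the zero of g(x) = x^2 - 2 is sqrt(2).
print(newton_1d(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0))
\end{verbatim}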

The maxima and minima of a function are points at which the gradient vanishes. Therefore, this technique can be applied to find the maxima and minima of a function $f(\mathbf{x}): \mathbb{R}^{m} \mapsto \mathbb{R}$ by defining

\begin{displaymath}
\begin{array}{l}
\mathbf{g}(\mathbf{x}) = \nabla f(\mathbf{x}) \\
\mathbf{J}_g(\mathbf{x}) = \mathbf{H}_f(\mathbf{x}) \\
\end{array}\end{displaymath} (3.33)

where $\nabla f(\mathbf{x})$ is the gradient, a function $\mathbb{R}^{m} \mapsto \mathbb{R}^{m}$, and $\mathbf{H}_f(\mathbf{x})$ is the $m \times m$ Hessian matrix, both evaluated at $\mathbf{x}$. The Newton update of the point $\mathbf{x}$ thus becomes
\begin{displaymath}
\mathbf{H}_f(\mathbf{x}) \boldsymbol\delta_t = - \nabla f(\mathbf{x})
\end{displaymath} (3.34)

When used for optimization, the Newton method effectively approximates the function $f(\mathbf{x})$ in the vicinity of $\mathbf{x}$ with a quadratic. If $f(\mathbf{x})$ is itself a quadratic function, convergence is guaranteed in a single iteration.
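
As a quick check of this claim (a standard computation, not part of the original text), consider the quadratic $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{\top}\mathbf{A}\mathbf{x} - \mathbf{b}^{\top}\mathbf{x}$ with $\mathbf{A}$ symmetric and invertible: then $\nabla f(\mathbf{x}) = \mathbf{A}\mathbf{x} - \mathbf{b}$ and $\mathbf{H}_f(\mathbf{x}) = \mathbf{A}$, so the step (3.34) gives

\begin{displaymath}
\mathbf{x} + \boldsymbol\delta = \mathbf{x} - \mathbf{A}^{-1}(\mathbf{A}\mathbf{x} - \mathbf{b}) = \mathbf{A}^{-1}\mathbf{b}
\end{displaymath}

which is the stationary point of $f$, reached in a single iteration from any starting point.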

Now, in the specific case of the optimization methods considered here, the function $f(\mathbf{x})$ is the cost function $S(\boldsymbol\beta)$. Therefore, when the Hessian matrix of $S(\boldsymbol\beta)$ is non-singular, the parameter update is

\begin{displaymath}
\boldsymbol\delta_t = -\mathbf{H}_{S}^{-1}(\boldsymbol\beta_t) \nabla S(\boldsymbol\beta_t)
\end{displaymath} (3.35)

which is the update rule of the Newton optimization method.
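
A minimal Python sketch of this update follows, again assuming NumPy; the names newton_minimize, grad_S and hess_S, and the choice of the Rosenbrock function as cost, are illustrative and not taken from the original text. The linear system (3.34) is solved directly rather than forming the inverse Hessian of (3.35) explicitly.

\begin{verbatim}
import numpy as np

def newton_minimize(grad_S, hess_S, beta0, tol=1e-10, max_iter=50):
    """Newton optimization: solve H_S(beta) delta = -grad S(beta), then update beta.

    grad_S : function returning the gradient of the cost S at beta
    hess_S : function returning the (non-singular) Hessian of S at beta
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        delta = np.linalg.solve(hess_S(beta), -grad_S(beta))
        beta = beta + delta
        if np.linalg.norm(delta) < tol:
            break
    return beta

# Example cost: Rosenbrock, S(b) = (1 - b0)^2 + 100 (b1 - b0^2)^2, minimum at (1, 1).
grad = lambda b: np.array([-2 * (1 - b[0]) - 400 * b[0] * (b[1] - b[0]**2),
                           200 * (b[1] - b[0]**2)])
hess = lambda b: np.array([[2 - 400 * (b[1] - 3 * b[0]**2), -400 * b[0]],
                           [-400 * b[0], 200.0]])
print(newton_minimize(grad, hess, beta0=[-1.2, 1.0]))  # approx (1, 1)
\end{verbatim}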
