Levenberg-Marquardt

In the previous sections, the algorithms for solving nonlinear systems have been divided into gradient descent algorithms and Gauss-Newton algorithms. For a more in-depth reading, I recommend (MBT04).

In Gauss-Newton methods, when $\mathbf{J}^{\top}\mathbf{J}$ is positive definite the step always provides a direction along which the cost decreases; when $\mathbf{J}^{\top}\mathbf{J}$ becomes singular, however, the Gauss-Newton method becomes numerically unstable.

The technique proposed by Levenberg and Marquardt combines the Gauss-Newton and gradient-descent methods so as to benefit from the strengths of both.

The Levenberg-Marquardt (LM) algorithm is an iterative technique that is now considered the standard for solving multivariable nonlinear least-squares problems. An excellent description of the algorithm can be found in (Lou05). The algorithm can be viewed as consisting of a slow but convergent gradient-descent phase, followed by a faster Gauss-Newton type refinement.

The Levenberg-Marquardt algorithm solves a slightly different version of the equation (3.43), a special case of the equation (3.34), known as the augmented normal equations:

\begin{displaymath}
\mathbf{N} \delta_\beta = -\mathbf{J}_{r}^{\top} \mathbf{r}
\end{displaymath} (3.47)

where $\mathbf{N} = \mathbf{H}_{S} + \mu \mathbf{I}$, with $\mu>0$ a damping factor: when $\mu$ is large, $\mathbf{N}$ is dominated by the diagonal term $\mu \mathbf{I}$ and the algorithm behaves like a steepest-descent (gradient) method with a short step, while when $\mu$ is close to zero the algorithm approaches the Gauss-Newton method. Rather than performing a line search, Levenberg-Marquardt therefore implements the concept of a trust region.
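As a minimal sketch of a single damped step, assuming the Gauss-Newton approximation $\mathbf{H}_S \approx \mathbf{J}^{\top}\mathbf{J}$ discussed below (the function name \texttt{lm\_step} and its arguments are purely illustrative and not taken from any library):

\begin{verbatim}
import numpy as np

def lm_step(J, r, mu):
    # Solve the augmented normal equations (3.47):
    #   (J^T J + mu * I) delta = -J^T r
    # J : Jacobian of the residuals at the current beta
    # r : residual vector,  mu : damping factor (mu > 0)
    H = J.T @ J                          # Gauss-Newton approximation of H_S
    N = H + mu * np.eye(H.shape[0])      # augmented normal matrix
    return np.linalg.solve(N, -J.T @ r)  # damped step delta_beta
\end{verbatim}

For large $\mu$ the step tends to $-\mathbf{J}^{\top}\mathbf{r}/\mu$, a short gradient-descent step, while for $\mu \to 0$ it tends to the Gauss-Newton step.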

In the Levenberg-Marquardt method, the Hessian approximation $\mathbf{H}_S(\boldsymbol\beta) \approx \mathbf{J}_{r}^{\top}\mathbf{J}_{r}$ is also typically used, as in the Gauss-Newton method; this approximation is exact only when the residuals are linear in $\boldsymbol\beta$, i.e. when the loss function is quadratic.
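To make this explicit, writing the cost as $S(\boldsymbol\beta) = \frac{1}{2}\mathbf{r}^{\top}\mathbf{r}$ (assuming the conventional factor $\frac{1}{2}$), the exact Hessian is

\begin{displaymath}
\mathbf{H}_S = \mathbf{J}_{r}^{\top}\mathbf{J}_{r} + \sum_i r_i \nabla^2 r_i
\end{displaymath}

and the Gauss-Newton approximation simply drops the second term, which vanishes exactly when the residuals are linear in $\boldsymbol\beta$ and is negligible when the residuals are small.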

How to choose $\mu_0$ and how to update $\mu$ between iterations is, however, left to the solver, and various strategies have been proposed in the literature.

One of the most widely used implementations (Nie99) chooses the initial damping factor $\mu_0$ as

\begin{displaymath}
\mu_0 = \tau \max_i \left[ \mathbf{H} \right]_{ii}
\end{displaymath} (3.48)

with $\tau$ chosen freely by the user according to the confidence placed in the initial estimate $\boldsymbol\beta$. The update of $\mu$ across iterations is driven by the gain ratio $\rho$:
\begin{displaymath}
\rho = \frac{ S(\boldsymbol\beta) - S(\boldsymbol\beta + \boldsymbol\delta_\beta) }{ \frac{1}{2}\boldsymbol\delta_\beta^{\top} \left( \mu \boldsymbol\delta_\beta - \mathbf{J}^{\top} \mathbf{r} \right) }
\end{displaymath} (3.49)

A high value of $\rho$ indicates that the linearized model of $f$ is a good approximation of the true cost and $\mu$ can be decreased; conversely, a low value of $\rho$ means that $\mu$ should be increased. When $\rho \approx 1$ the predicted reduction of the cost matches the observed one almost exactly. In the limiting case where $\rho$ is negative, the step has worsened the solution: it must be discarded and $\mu$ increased, moving the algorithm closer to a gradient-descent method. Through $\rho$ it is possible to update $\mu$ according to the rule
\begin{algorithmic}
\If {$\rho > 0$}
\State $\mu \gets \mu \max \left(\frac{1}{3}, 1 - (2\rho - 1)^3\right)$
\State $\nu \gets 2$
\Else
\State $\mu \gets \mu \nu$
\State $\nu \gets 2 \nu$
\EndIf
\end{algorithmic}
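
A complete iteration, combining the initialization (3.48), the gain ratio (3.49) and the update rule above, can be sketched as follows. This is a minimal NumPy illustration rather than a reference implementation: the names \texttt{levenberg\_marquardt}, \texttt{residual} and \texttt{jacobian}, as well as the stopping test on the gradient, are assumptions made for the example.

\begin{verbatim}
import numpy as np

def levenberg_marquardt(residual, jacobian, beta0, tau=1e-3,
                        max_iter=100, eps=1e-10):
    # Minimal LM sketch for the cost S(beta) = 1/2 * r^T r.
    beta = np.asarray(beta0, dtype=float)
    r, J = residual(beta), jacobian(beta)
    H, g = J.T @ J, J.T @ r              # Gauss-Newton Hessian and gradient
    mu = tau * np.max(np.diag(H))        # mu_0 = tau * max_i H_ii, eq. (3.48)
    nu = 2.0
    for _ in range(max_iter):
        if np.linalg.norm(g, np.inf) < eps:
            break                        # gradient small enough: stop
        # damped (augmented) normal equations, eq. (3.47)
        delta = np.linalg.solve(H + mu * np.eye(H.shape[0]), -g)
        beta_new = beta + delta
        r_new = residual(beta_new)
        # gain ratio: actual vs. predicted reduction of S, eq. (3.49)
        actual = 0.5 * (r @ r - r_new @ r_new)
        predicted = 0.5 * delta @ (mu * delta - g)
        rho = actual / predicted
        if rho > 0:                      # step accepted: toward Gauss-Newton
            beta, r = beta_new, r_new
            J = jacobian(beta)
            H, g = J.T @ J, J.T @ r
            mu *= max(1.0 / 3.0, 1.0 - (2.0 * rho - 1.0) ** 3)
            nu = 2.0
        else:                            # step rejected: toward gradient descent
            mu *= nu
            nu *= 2.0
    return beta

# Hypothetical usage: fit y = a * exp(b * x) to synthetic data.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(-1.3 * x)
res = lambda b: b[0] * np.exp(b[1] * x) - y
jac = lambda b: np.column_stack([np.exp(b[1] * x),
                                 b[0] * x * np.exp(b[1] * x)])
print(levenberg_marquardt(res, jac, beta0=[1.0, 0.0]))
\end{verbatim}

Accepting the step only when $\rho > 0$ makes each iteration non-increasing in $S$, which is what gives the method its robustness compared with the plain Gauss-Newton iteration.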
