The gradient descent algorithm (GD, also called steepest descent) updates the weights at each iteration by moving in the direction of the negative gradient of the cost function $f$:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \gamma \, \nabla f(\mathbf{x}_k) \qquad (3.36)$$

where $\gamma > 0$ is a user-defined step size (learning rate).
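To make the update rule (3.36) concrete, here is a minimal sketch in Python/NumPy; the quadratic test function, the step size $\gamma = 0.1$, and the iteration count are illustrative assumptions, not values from the text.

```python
import numpy as np

def gradient_descent(grad_f, x0, gamma=0.1, iters=100):
    """Iterate x_{k+1} = x_k - gamma * grad_f(x_k), as in (3.36)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - gamma * grad_f(x)
    return x

# Illustrative use: f(x) = 1/2 x^T A x - b^T x has gradient A x - b,
# so the minimum is the solution of A x = b.
A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, -1.0])
x_min = gradient_descent(lambda x: A @ x - b, x0=np.zeros(2))
print(x_min)  # close to np.linalg.solve(A, b)
```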
Since expression (3.36) introduces a user-defined parameter, this approach is empirical and problem-dependent, if not reliant on the experience of the human user. Following this observation and comparing the gradient descent update (3.36) with equation (3.35), it becomes evident that Newton's method is, in fact, a special case of gradient descent. Better optimization can therefore be achieved by replacing the scalar parameter $\gamma$ with the positive definite matrix obtained from the inverse of the Hessian evaluated at the current point:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \left(\nabla^2 f(\mathbf{x}_k)\right)^{-1} \nabla f(\mathbf{x}_k) \qquad (3.38)$$
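As a sketch of update (3.38), again in Python/NumPy and under the assumption that the Hessian is available and positive definite at each iterate; the names `grad_f` and `hess_f` are placeholders for the problem at hand.

```python
import numpy as np

def newton_step(grad_f, hess_f, x0, iters=10):
    """Iterate x_{k+1} = x_k - H(x_k)^{-1} grad_f(x_k), as in (3.38)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # Solve H(x) d = grad_f(x) instead of forming the inverse explicitly.
        d = np.linalg.solve(hess_f(x), grad_f(x))
        x = x - d
    return x

# On the quadratic example above the Hessian is the constant matrix A,
# so a single Newton step lands exactly on the minimum.
A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, -1.0])
x_min = newton_step(lambda x: A @ x - b, lambda x: A, np.zeros(2), iters=1)
print(x_min)  # equals np.linalg.solve(A, b) up to round-off
```

Solving the linear system $H\,\mathbf{d} = \nabla f$ rather than explicitly inverting the Hessian is the numerically preferable design choice; on a quadratic function the Hessian is constant and a single step reaches the minimum.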