L1 and L2 Regularization

L1 and L2 regularization add an extra term to the cost function that penalizes certain parameter configurations. Regularizing, for example, the cost function

\begin{displaymath}
S(\boldsymbol\beta, \mathbf{X}) = - \sum_i \log P (Y = y_i \vert \mathbf{x}_i ; \boldsymbol\beta)
\end{displaymath} (4.96)

means adding a term, which is a function solely of $\boldsymbol\beta$, in order to obtain the new cost function of the form
\begin{displaymath}
E(\boldsymbol\beta, \mathbf{X}) = S(\boldsymbol\beta, \mathbf{X}) + \lambda R(\boldsymbol\beta)
\end{displaymath} (4.97)

with $R(\boldsymbol\beta)$ being a regularizing function.

A widely used regularization function is

\begin{displaymath}
R(\boldsymbol\beta) = \left( \sum_j \vert \beta_j \vert ^ p \right)^{1/p}
\end{displaymath} (4.98)

Common values for $p$ are $1$ or $2$ (hence the names L1 and L2 regularization). The case $p=2$ is also referred to in the literature as weight decay. This type of regularization function penalizes parameters with excessively large values.
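As an illustrative sketch (not from the text), the regularized cost of Eq. (4.97) with the penalty of Eq. (4.98) can be computed in a few lines of NumPy; the function name `regularized_cost` and the toy values below are hypothetical:

```python
import numpy as np

def regularized_cost(nll, beta, lam=0.1, p=2):
    """E(beta) = S(beta) + lambda * R(beta), as in Eq. (4.97),
    with R(beta) = (sum_j |beta_j|^p)^(1/p), as in Eq. (4.98).

    nll  -- the unregularized cost S(beta, X), e.g. a negative log-likelihood
    beta -- parameter vector
    lam  -- regularization strength lambda
    p    -- norm order (1 for L1, 2 for L2)
    """
    r = np.sum(np.abs(beta) ** p) ** (1.0 / p)
    return nll + lam * r

beta = np.array([3.0, -4.0])
# p=2: R(beta) = sqrt(9 + 16) = 5, so E = 1.0 + 0.1 * 5 = 1.5
print(regularized_cost(1.0, beta, lam=0.1, p=2))
# p=1: R(beta) = 3 + 4 = 7, so E = 1.0 + 0.1 * 7 = 1.7
print(regularized_cost(1.0, beta, lam=0.1, p=1))
```

Note how, for the same $\boldsymbol\beta$, the L1 penalty is larger here: the two norms weight large components differently, which is why the choice of $p$ changes which parameter configurations are discouraged.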



Paolo Medici
2025-10-22