L1 and L2 Regularization

L1 and L2 regularization add an extra term to the cost function that penalizes certain parameter configurations. Regularizing, for example, the cost function

\begin{displaymath}
S(\boldsymbol\beta, \mathbf{X}) = - \sum_i \log P (Y = y_i \vert \mathbf{x}_i ; \boldsymbol\beta)
\end{displaymath} (4.96)

means adding a term, which is a function solely of $\boldsymbol\beta$, in order to obtain the new cost function of the form
\begin{displaymath}
E(\boldsymbol\beta, \mathbf{X}) = S(\boldsymbol\beta, \mathbf{X}) + \lambda R(\boldsymbol\beta)
\end{displaymath} (4.97)

with $R(\boldsymbol\beta)$ being a regularizing function.

A widely used regularization function is

\begin{displaymath}
R(\boldsymbol\beta) = \left( \sum_j \vert \beta_j \vert ^ p \right)^{1/p}
\end{displaymath} (4.98)

Common values for $p$ are $1$ or $2$ (hence the names L1 and L2 regularization). The case $p=2$ is also referred to in the literature as weight decay. This type of regularization function penalizes parameters with excessively large values.
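As an illustrative sketch (not from the text), the regularized cost of Eq. (4.97) with the penalty of Eq. (4.98) can be computed in a few lines of NumPy; the function name `regularized_cost` and the toy values below are hypothetical:

```python
import numpy as np

def regularized_cost(nll, beta, lam=0.1, p=2):
    """E(beta) = S(beta) + lambda * R(beta), as in Eq. (4.97),
    with R(beta) = (sum_j |beta_j|^p)^(1/p), as in Eq. (4.98).

    nll  -- the unregularized cost S(beta, X), e.g. a negative log-likelihood
    beta -- parameter vector
    lam  -- regularization strength lambda
    p    -- norm order (1 for L1, 2 for L2)
    """
    r = np.sum(np.abs(beta) ** p) ** (1.0 / p)
    return nll + lam * r

beta = np.array([3.0, -4.0])
# p=2: R(beta) = sqrt(9 + 16) = 5, so E = 1.0 + 0.1 * 5 = 1.5
print(regularized_cost(1.0, beta, lam=0.1, p=2))
# p=1: R(beta) = 3 + 4 = 7, so E = 1.0 + 0.1 * 7 = 1.7
print(regularized_cost(1.0, beta, lam=0.1, p=1))
```

Note how, for the same $\boldsymbol\beta$, the L1 penalty is larger here: the two norms weight large components differently, which is why the choice of $p$ changes which parameter configurations are discouraged.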



Paolo Medici
2025-10-22