Empirical Risk Minimization

The Soft Margin constraint (4.26) can be rewritten as
\begin{displaymath}
y_i f(\mathbf{x}_i) \ge 1 - \xi_i
\end{displaymath} (4.37)

where $f(\mathbf{x}_i)$ may also be a decision function expressed through a generic kernel. This inequality is equivalent to
\begin{displaymath}
\xi_i \geq \max \left(0, 1 - y_i f(\mathbf{x}_i) \right)
\end{displaymath} (4.38)

since $\xi_i \geq 0$: at the optimum each slack variable takes the smallest feasible value, so the inequality holds with equality. The function appearing on the right-hand side of (4.38) is referred to as the hinge loss
\begin{displaymath}
\ell(y, \hat{y}) = \max(0, 1 - y \hat{y} )
\end{displaymath} (4.39)

and has the advantage of being convex, while being non-differentiable only at $y \hat{y} = 1$. The hinge loss is an upper bound on the 0/1 loss: a misclassified sample incurs a hinge loss of at least 1, and correctly classified samples are still penalized if they fall inside the margin.
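
As a quick numerical check, the following short Python/NumPy sketch (with made-up decision values $\hat{y}_i = f(\mathbf{x}_i)$) evaluates the hinge loss (4.39) alongside the 0/1 loss and shows that the former dominates the latter on every sample.

\begin{verbatim}
import numpy as np

def hinge_loss(y, y_hat):
    # Hinge loss (4.39): zero once the margin y*y_hat reaches 1,
    # linear penalty otherwise.
    return np.maximum(0.0, 1.0 - y * y_hat)

def zero_one_loss(y, y_hat):
    # 0/1 loss: 1 on a sign disagreement, 0 otherwise.
    return (y * y_hat <= 0).astype(float)

y     = np.array([+1, +1, -1, -1])
y_hat = np.array([2.0, 0.3, -0.5, 0.8])   # example decision values f(x_i)
print(hinge_loss(y, y_hat))                # [0.  0.7 0.5 1.8]
print(zero_one_loss(y, y_hat))             # [0. 0. 0. 1.]
\end{verbatim}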

The SVM training problem in the case of non-linearly separable data is therefore equivalent to an unconstrained optimization problem over $\mathbf{w}$ of the form

\begin{displaymath}
\min_{\mathbf{w} \in \mathbb{R}^{d}} \Vert \mathbf{w} \Vert^2 + C \sum_{i=1}^{N} \ell \left( y_i, f(\mathbf{x}_i) \right)
\end{displaymath} (4.40)

The objective function is again composed of two clearly distinct parts: the first is a Tikhonov (L2) regularization term, and the second is the empirical risk measured by the hinge loss. The SVM can therefore be viewed as a linear classifier that minimizes the hinge loss with L2 regularization.
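
Since (4.40) is unconstrained, it can be attacked directly with subgradient methods. The sketch below (Python/NumPy, with a hypothetical helper name train_linear_svm and toy data) minimizes the objective for the linear case $f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x}$, omitting the bias term for brevity; it is a didactic illustration, not a replacement for a dedicated solver.

\begin{verbatim}
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=1e-3, epochs=200):
    # Subgradient descent on (4.40) with f(x) = w.x and no bias:
    #     min_w ||w||^2 + C * sum_i max(0, 1 - y_i * w.x_i)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1            # samples with non-zero hinge loss
        # d/dw ||w||^2 = 2w; d/dw hinge = -y_i x_i on the active samples.
        grad = 2.0 * w - C * (y[active, None] * X[active]).sum(axis=0)
        w -= lr * grad
    return w

# Toy usage on a linearly separable set with labels in {-1, +1}.
X = np.array([[ 2.0,  2.0], [ 1.5,  1.8],
              [-1.0, -1.2], [-2.0, -1.5]])
y = np.array([+1, +1, -1, -1])
w = train_linear_svm(X, y, C=10.0)
print(np.sign(X @ w))                   # expected: [ 1.  1. -1. -1.]
\end{verbatim}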

The input data $\mathbf{x}_i$ can fall into three different categories:
