The Soft Margin constraint (4.26) can be rewritten as

$$\xi_i \ge 1 - y_i\, f(\mathbf{x}_i) \qquad (4.37)$$

where $f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b$ is the decision function, which can also be a generic kernel function. Since $\xi_i \ge 0$, this inequality is equivalent to

$$\xi_i \ge \max\left(0,\, 1 - y_i\, f(\mathbf{x}_i)\right). \qquad (4.38)$$
The loss function appearing in (4.38) is referred to as the hinge loss function (Hinge Loss)

$$\ell_{\mathrm{hinge}}\left(y, f(\mathbf{x})\right) = \max\left(0,\, 1 - y\, f(\mathbf{x})\right) \qquad (4.39)$$

and has the advantage of being convex, being non-differentiable only at $y\, f(\mathbf{x}) = 1$. The hinge loss is always greater than or equal to the 0/1 loss function, i.e. it is a convex upper bound on it.
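As a small illustrative sketch (not taken from the text: it uses NumPy and hypothetical sample margins), the following Python code compares the hinge loss (4.39) with the 0/1 loss and shows that the former upper-bounds the latter:

```python
import numpy as np

def hinge_loss(y, f):
    """Hinge loss max(0, 1 - y*f(x)), as in (4.39)."""
    return np.maximum(0.0, 1.0 - y * f)

def zero_one_loss(y, f):
    """0/1 loss: 1 when the sign of f(x) disagrees with the label y."""
    return (y * f <= 0).astype(float)

# Hypothetical margins y*f(x), from badly misclassified to well classified.
y = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
f = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

print(hinge_loss(y, f))     # [3.  1.5 1.  0.5 0. ]
print(zero_one_loss(y, f))  # [1. 1. 1. 0. 0.]
# The hinge loss is everywhere >= the 0/1 loss (convex upper bound).
```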
The training problem of SVM in the case of non-linearly separable data is equivalent to an unconstrained optimization problem over $(\mathbf{w}, b)$ of the form

$$\min_{\mathbf{w},\, b} \;\; \frac{1}{2}\,\|\mathbf{w}\|^{2} + C \sum_{i} \max\left(0,\, 1 - y_i\, f(\mathbf{x}_i)\right) \qquad (4.40)$$
The objective function is still made up of two clearly distinct parts: the first is a Tikhonov regularization term, while the second is the empirical risk measured with the hinge loss. SVM can therefore be viewed as a linear classifier that minimizes the hinge loss with L2 regularization.
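As a sketch of how (4.40) can be minimized directly in the primal (an assumption on my part, not the procedure of the text: plain subgradient descent with NumPy on hypothetical synthetic data and a hypothetical regularization constant `C`), consider:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-class data, not linearly separable.
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(+1.0, 1.0, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

C, lr, epochs = 1.0, 0.01, 200
w, b = np.zeros(2), 0.0

for _ in range(epochs):
    margins = y * (X @ w + b)
    viol = margins < 1                     # points with non-zero hinge loss
    # Subgradient of 0.5*||w||^2 + C * sum(max(0, 1 - y*(w.x + b)))
    grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean(np.sign(X @ w + b) == y)
print(w, b, accuracy)
```

Only the points currently violating the margin enter the (sub)gradient, which reflects the structure of the hinge loss.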
The input data $\mathbf{x}_i$ can fall into 3 different categories, made explicit in the sketch after this list:
- $y_i\, f(\mathbf{x}_i) > 1$: points outside the margin, which do not contribute to the cost function;
- $y_i\, f(\mathbf{x}_i) = 1$: points on the margin, which do not contribute to the cost, as in the "hard margin" case;
- $y_i\, f(\mathbf{x}_i) < 1$: points that violate the constraint and contribute to the cost.
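The following short Python fragment (a sketch with hypothetical names, assuming a trained pair $(\mathbf{w}, b)$ and a linear decision function for simplicity) makes the three categories explicit by inspecting the margins $y_i\, f(\mathbf{x}_i)$:

```python
import numpy as np

def categorize(X, y, w, b, tol=1e-6):
    """Split points by their margin y*f(x) into the three categories above."""
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)        # slack / hinge loss per point
    outside   = margins > 1 + tol              # xi = 0, no contribution to the cost
    on_margin = np.abs(margins - 1) <= tol     # xi = 0, as in the hard-margin case
    violating = margins < 1 - tol              # xi > 0, contribute to the cost
    return outside, on_margin, violating, xi
```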