Soft Margin SVM

In real-world applications, a margin does not always exist, meaning that the classes are not always linearly separable in the feature space by a hyperplane. The Soft Margin concept overcomes this limitation by introducing an additional variable $\xi_i$ for each sample, thereby relaxing the margin constraint:
\begin{displaymath}
\begin{array}{l}
y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i \\
\xi_i \ge 0, \forall i
\end{array}\end{displaymath} (4.26)

The variable $\xi_i$ represents the slack associated with the $i$-th sample. When $\xi_i = 0$, the sample satisfies the original Hard Margin constraint; when $0<\xi_i\le 1$, the sample is correctly classified but lies within the margin area; when $\xi_i>1$, the sample enters the decision region of the opposing class and is therefore classified incorrectly.
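As a concrete illustration, the sketch below (in Python, using a made-up hyperplane $\mathbf{w}$, $b$ and toy samples, all hypothetical) computes the slack each sample would need, $\xi_i = \max(0,\, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$, and reports which of the regimes above it falls into.

\begin{verbatim}
import numpy as np

# Hypothetical separating hyperplane and toy data (not from the text).
w = np.array([1.0, -1.0])
b = 0.5
X = np.array([[2.0, -1.0],   # far on the correct side
              [0.4, 0.2],    # inside the margin
              [-1.0, 1.0]])  # on the wrong side
y = np.array([1, 1, 1])

# Slack needed to satisfy y_i (w . x_i + b) >= 1 - xi_i with xi_i >= 0.
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))

for i, s in enumerate(xi):
    if s == 0:
        state = "outside the margin (hard constraint satisfied)"
    elif s <= 1:
        state = "correctly classified but inside the margin"
    else:
        state = "misclassified"
    print(f"sample {i}: xi = {s:.2f} -> {state}")
\end{verbatim}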

To find a better separating hyperplane, the cost function to be minimized must also take into account the distance of each sample from the margin:

\begin{displaymath}
\min \frac{1}{2} \Vert \mathbf{w} \Vert^2 + C \sum_i \xi_i
\end{displaymath} (4.27)

subject to the constraints (4.26). The parameter $C$ is a degree of freedom of the problem and indicates how much a sample must "pay" for violating the margin constraint. When $C$ is small the margin is wide, whereas as $C$ approaches infinity the problem reverts to the Hard Margin formulation of the SVM discussed earlier.
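The effect of $C$ can be observed directly with an off-the-shelf soft-margin solver. The sketch below (hypothetical toy data; scikit-learn's \texttt{SVC}, not the text's own implementation) shows the margin width $2/\Vert\mathbf{w}\Vert$ shrinking as $C$ grows.

\begin{verbatim}
import numpy as np
from sklearn.svm import SVC

# Toy, slightly overlapping two-class data (hypothetical).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),
               rng.normal(+1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# Small C tolerates violations (wide margin); large C approximates Hard Margin.
for C in (0.01, 1.0, 1e4):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_)
    print(f"C = {C:g}: margin width = {margin:.3f}, "
          f"support vectors = {clf.support_.size}")
\end{verbatim}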

Each sample $\mathbf{x}_i$ can therefore fall into one of three possible states:

- $\xi_i = 0$: the sample satisfies the original margin constraint and lies outside (or on) the margin;
- $0 < \xi_i \le 1$: the sample is correctly classified but lies inside the margin area;
- $\xi_i > 1$: the sample is misclassified.

The Lagrangian of the system (4.27), with the constraints introduced by the variables $\xi$, is

\begin{displaymath}
\mathcal{L}(\mathbf{w},b,\xi,\alpha,\gamma) = \frac{1}{2} \Vert\mathbf{w}\Vert^2 + C \sum_i \xi_i - \sum_i \alpha_i (y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i) - \sum_i \gamma_i \xi_i
\end{displaymath} (4.28)

Since the number of constraints has increased, the dual variables are now both $\bm{\alpha}$ and $\bm{\gamma}$.
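For reference, setting the partial derivatives of (4.28) to zero yields the standard stationarity conditions (a routine derivation, stated here for completeness):

\begin{displaymath}
\begin{array}{l}
\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 0 \Rightarrow \mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i \\
\frac{\partial \mathcal{L}}{\partial b} = 0 \Rightarrow \sum_i \alpha_i y_i = 0 \\
\frac{\partial \mathcal{L}}{\partial \xi_i} = 0 \Rightarrow C - \alpha_i - \gamma_i = 0
\end{array}\end{displaymath}

In particular, the last condition, together with $\gamma_i \ge 0$, is what bounds each $\alpha_i$ from above by $C$.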

The remarkable result is that, once these derivatives are substituted back, the dual formulation of (4.28) becomes exactly the same as the dual of the Hard Margin case: the variables $\xi_i$ do not appear in the dual formulation, and the only difference between the Hard Margin and the Soft Margin case lies in the constraint on the parameters $\alpha_i$, which are now limited to

\begin{displaymath}
0 \le \alpha_i \le C
\end{displaymath} (4.29)

instead of the simple inequality $\alpha_i \ge 0$. The great advantage of this formulation lies precisely in the simplicity of the constraints and in the fact that it reduces the Hard Margin case to a particular case ($C=\infty$) of the Soft Margin. The constant $C$ acts as an upper bound on the values that the $\alpha_i$ can assume.
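To make the box constraint (4.29) concrete, here is a minimal sketch (hypothetical toy data; scipy's general-purpose SLSQP solver rather than a dedicated SVM optimizer) that maximizes the dual objective $\sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j$ subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$, then recovers $\mathbf{w}$ and $b$ from the stationarity conditions.

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
# Hypothetical toy data: two overlapping classes.
X = np.vstack([rng.normal(-1.0, 1.0, (20, 2)),
               rng.normal(+1.0, 1.0, (20, 2))])
y = np.array([-1.0] * 20 + [+1.0] * 20)
C = 1.0

# Gram matrix of the (negated) dual: minimize 0.5 a'Qa - sum(a).
Q = (y[:, None] * y[None, :]) * (X @ X.T)

def objective(a):
    return 0.5 * a @ Q @ a - a.sum()

def gradient(a):
    return Q @ a - np.ones_like(a)

res = minimize(objective, np.zeros(len(y)), jac=gradient,
               method="SLSQP",
               bounds=[(0.0, C)] * len(y),  # the box 0 <= alpha_i <= C
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])

alpha = res.x
w = (alpha * y) @ X                        # w = sum_i alpha_i y_i x_i
sv = (alpha > 1e-6) & (alpha < C - 1e-6)   # samples exactly on the margin
b = np.mean(y[sv] - X[sv] @ w)             # b recovered from those samples
print("alphas clipped at the bound C:", int(np.sum(alpha > C - 1e-6)))
print("w =", w, " b =", float(b))
\end{verbatim}

Samples whose $\alpha_i$ reach the bound $C$ are exactly the margin violators ($\xi_i > 0$) discussed above.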
