Soft Margin SVM

In real-world applications, a margin does not always exist, meaning that the classes are not always linearly separable in the feature space by a hyperplane. The Soft Margin concept overcomes this limitation by introducing an additional variable $\xi_i$ for each sample, thereby relaxing the margin constraint:
\begin{displaymath}
\begin{array}{l}
y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i \\
\xi_i \ge 0, \forall i
\end{array}\end{displaymath} (4.26)

The variable $\xi_i$ represents the slack associated with the $i$-th sample. When $\xi_i = 0$, the sample satisfies the original Hard Margin constraint; when $0<\xi_i\le 1$, the sample is correctly classified but lies within the margin area; when $\xi_i>1$, the sample enters the decision region of the opposing class and is therefore classified incorrectly.
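As a concrete illustration, the sketch below (in Python, using a made-up hyperplane $\mathbf{w}$, $b$ and toy samples, all hypothetical) computes the slack each sample would need, $\xi_i = \max(0,\, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$, and reports which of the regimes above it falls into.

\begin{verbatim}
import numpy as np

# Hypothetical separating hyperplane and toy data (not from the text).
w = np.array([1.0, -1.0])
b = 0.5
X = np.array([[2.0, -1.0],   # far on the correct side
              [0.4, 0.2],    # inside the margin
              [-1.0, 1.0]])  # on the wrong side
y = np.array([1, 1, 1])

# Slack needed to satisfy y_i (w . x_i + b) >= 1 - xi_i with xi_i >= 0.
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))

for i, s in enumerate(xi):
    if s == 0:
        state = "outside the margin (hard constraint satisfied)"
    elif s <= 1:
        state = "correctly classified but inside the margin"
    else:
        state = "misclassified"
    print(f"sample {i}: xi = {s:.2f} -> {state}")
\end{verbatim}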

To find a better separating hyperplane, the cost function to be minimized must also take into account the distance of each sample from the margin:

\begin{displaymath}
\min \frac{1}{2} \Vert \mathbf{w} \Vert^2 + C \sum_i \xi_i
\end{displaymath} (4.27)

subject to the constraints (4.26). The parameter $C$ is a degree of freedom of the problem and indicates how much a sample must "pay" for violating the margin constraint. When $C$ is small the margin is wide, whereas as $C$ approaches infinity the problem reverts to the Hard Margin formulation of the SVM discussed earlier.
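The effect of $C$ can be observed directly with an off-the-shelf soft-margin solver. The sketch below (hypothetical toy data; scikit-learn's \texttt{SVC}, not the text's own implementation) shows the margin width $2/\Vert\mathbf{w}\Vert$ shrinking as $C$ grows.

\begin{verbatim}
import numpy as np
from sklearn.svm import SVC

# Toy, slightly overlapping two-class data (hypothetical).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),
               rng.normal(+1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# Small C tolerates violations (wide margin); large C approximates Hard Margin.
for C in (0.01, 1.0, 1e4):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_)
    print(f"C = {C:g}: margin width = {margin:.3f}, "
          f"support vectors = {clf.support_.size}")
\end{verbatim}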

Each sample $\mathbf{x}_i$ can therefore fall into one of three possible states:

- $\xi_i = 0$: the sample satisfies the original margin constraint and lies outside (or on) the margin;
- $0 < \xi_i \le 1$: the sample is correctly classified but lies inside the margin area;
- $\xi_i > 1$: the sample is misclassified.

The Lagrangian of the system (4.27), with the constraints introduced by the variables $\xi$, is

\begin{displaymath}
\mathcal{L}(\mathbf{w},b,\xi,\alpha,\gamma) = \frac{1}{2} \Vert\mathbf{w}\Vert^2 + C \sum_i \xi_i - \sum_i \alpha_i (y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i) - \sum_i \gamma_i \xi_i
\end{displaymath} (4.28)

Since the number of constraints has increased, the dual variables are now both $\bm{\alpha}$ and $\bm{\gamma}$.
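For reference, setting the partial derivatives of (4.28) to zero yields the standard stationarity conditions (a routine derivation, stated here for completeness):

\begin{displaymath}
\begin{array}{l}
\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 0 \Rightarrow \mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i \\
\frac{\partial \mathcal{L}}{\partial b} = 0 \Rightarrow \sum_i \alpha_i y_i = 0 \\
\frac{\partial \mathcal{L}}{\partial \xi_i} = 0 \Rightarrow C - \alpha_i - \gamma_i = 0
\end{array}\end{displaymath}

In particular, the last condition, together with $\gamma_i \ge 0$, is what bounds each $\alpha_i$ from above by $C$.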

The remarkable result is that, once these derivatives are substituted back, the dual formulation of (4.28) becomes exactly the same as the dual of the Hard Margin case: the variables $\xi_i$ do not appear in the dual formulation, and the only difference between the Hard Margin and the Soft Margin case lies in the constraint on the parameters $\alpha_i$, which are now limited to

\begin{displaymath}
0 \le \alpha_i \le C
\end{displaymath} (4.29)

instead of the simple inequality $\alpha_i \ge 0$. The great advantage of this formulation lies precisely in the simplicity of the constraints and in the fact that it reduces the Hard Margin case to a particular case ($C=\infty$) of the Soft Margin. The constant $C$ acts as an upper bound on the values that the $\alpha_i$ can assume.
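To make the box constraint (4.29) concrete, here is a minimal sketch (hypothetical toy data; scipy's general-purpose SLSQP solver rather than a dedicated SVM optimizer) that maximizes the dual objective $\sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j$ subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$, then recovers $\mathbf{w}$ and $b$ from the stationarity conditions.

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
# Hypothetical toy data: two overlapping classes.
X = np.vstack([rng.normal(-1.0, 1.0, (20, 2)),
               rng.normal(+1.0, 1.0, (20, 2))])
y = np.array([-1.0] * 20 + [+1.0] * 20)
C = 1.0

# Gram matrix of the (negated) dual: minimize 0.5 a'Qa - sum(a).
Q = (y[:, None] * y[None, :]) * (X @ X.T)

def objective(a):
    return 0.5 * a @ Q @ a - a.sum()

def gradient(a):
    return Q @ a - np.ones_like(a)

res = minimize(objective, np.zeros(len(y)), jac=gradient,
               method="SLSQP",
               bounds=[(0.0, C)] * len(y),  # the box 0 <= alpha_i <= C
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])

alpha = res.x
w = (alpha * y) @ X                        # w = sum_i alpha_i y_i x_i
sv = (alpha > 1e-6) & (alpha < C - 1e-6)   # samples exactly on the margin
b = np.mean(y[sv] - X[sv] @ w)             # b recovered from those samples
print("alphas clipped at the bound C:", int(np.sum(alpha > C - 1e-6)))
print("w =", w, " b =", float(b))
\end{verbatim}

Samples whose $\alpha_i$ reach the bound $C$ are exactly the margin violators ($\xi_i > 0$) discussed above.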
