Binary Classifiers

A particularly common case of classifier is the binary classifier. In this scenario, the problem consists of finding, from the training set $S=\{ (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_l, y_l) \} \subseteq \mathbb{X} \times \mathbb{Y}$, a relationship that links each input to its class, where $\mathbb{X} \subseteq \mathbb{R}^{n}$ is the space of the feature vectors that gather the information to be used for training and $\mathbb{Y}=\{+1,-1\}$ is the space of the associated classes.

Examples of intrinsically binary classifiers include:

LDA
Linear Discriminant Analysis (section 4.3) is a technique that finds the separating hyperplane between classes that maximizes the separation between the class distributions;
Decision Stump
One-level decision trees have only two possible outputs (a minimal sketch follows this list);
SVM
Support Vector Machines (section 4.4) partition the feature space with hyperplanes, or simple surfaces, chosen to maximize the margin between the classes.
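
As an illustration of how simple a decision stump is, here is a minimal sketch in Python (not code from the text: the names fit_stump and predict_stump, the exhaustive midpoint threshold search, and the assumption of labels in $\{+1,-1\}$ are all illustrative choices):

\begin{verbatim}
import numpy as np

def fit_stump(X, y):
    """Fit a one-level decision tree (decision stump) on labels in {+1, -1}.

    Exhaustively tries each feature and each midpoint threshold, keeping
    the (feature, threshold, polarity) triple with the fewest errors.
    """
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # candidate thresholds: midpoints between consecutive sorted values
        for t in (values[:-1] + values[1:]) / 2.0:
            for polarity in (+1, -1):
                pred = polarity * np.where(X[:, j] > t, 1, -1)
                err = np.mean(pred != y)
                if err < best_err:
                    best, best_err = (j, t, polarity), err
    return best

def predict_stump(stump, X):
    """One of only two possible outputs, +1 or -1, per sample."""
    j, t, polarity = stump
    return polarity * np.where(X[:, j] > t, 1, -1)

# Illustrative usage on a toy one-dimensional problem
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, +1, +1])
stump = fit_stump(X, y)          # (0, 1.5, 1): feature 0, threshold 1.5
print(predict_stump(stump, X))   # [-1 -1  1  1]
\end{verbatim}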

Particular interest is given to linear classifiers (LDA and linear SVM), which solve the binary classification problem by identifying a separating hyperplane $(\mathbf{w},b)$ between the two classes.

The equation of a hyperplane, slightly modifying formula (1.49), is

\begin{displaymath}
\mathbf{w} \cdot \mathbf{x} + b = 0
\end{displaymath} (4.4)

where the normal vector $\mathbf{w}$ is not necessarily of unit norm. A hyperplane divides the space into two half-spaces, in which the left-hand side of equation (4.4) takes opposite signs. The separating surface is a hyperplane that divides the space into two subregions, representing the two categories of the binary classification.
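
Since $\mathbf{w}$ is not required to have unit norm, it is worth making explicit that the value of the left-hand side of (4.4) at a point $\mathbf{x}$ is only proportional to its distance from the hyperplane; the signed distance is obtained by normalizing:

\begin{displaymath}
d(\mathbf{x}) = \frac{\mathbf{w} \cdot \mathbf{x} + b}{\Vert \mathbf{w} \Vert}
\end{displaymath}

where the sign of $d(\mathbf{x})$ identifies the half-space in which $\mathbf{x}$ lies.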

A linear classifier is based on a discriminant function

\begin{displaymath}
f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b
\end{displaymath} (4.5)

where the vector $\mathbf{w}$ is referred to as the weight vector and the term $b$ is called the bias. Linear classifiers hold significant importance as they transform the problem from multidimensional to scalar by projecting the data along the direction $\mathbf{w}$.

The sign of the function $f(\mathbf{x})$ represents the outcome of the classification. Finding a separating hyperplane corresponds to identifying a linear combination of the elements of $\mathbf{x} \in \mathbb{X}$ such as to obtain

\begin{displaymath}
\hat{y} = \sgn ( \mathbf{w} \cdot \mathbf{x} + b )
\end{displaymath} (4.6)

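As a minimal sketch of how the decision rule (4.6) looks in code (Python here; the function name linear_classify, the weights, and the bias are illustrative values, not taken from the text, and $\sgn(0)$ is mapped to $+1$ by convention):

\begin{verbatim}
import numpy as np

def linear_classify(X, w, b):
    """Apply the decision rule y_hat = sgn(w . x + b) to each row of X.

    X : (n_samples, n_features) array of feature vectors
    w : (n_features,) weight vector (normal of the separating hyperplane)
    b : scalar bias
    Returns an array of labels in {+1, -1}.
    """
    f = X @ w + b                    # scalar discriminant f(x) = w . x + b
    return np.where(f >= 0, 1, -1)   # f == 0 assigned to +1 by convention

# Illustrative example: points on either side of the line x1 + x2 - 1 = 0
w = np.array([1.0, 1.0])
b = -1.0
X = np.array([[2.0, 2.0],    # f =  3 -> +1
              [0.0, 0.0]])   # f = -1 -> -1
print(linear_classify(X, w, b))   # [ 1 -1]
\end{verbatim}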
