Ensemble Learning

Ensemble Learning refers to the use of several distinct classifiers, combined in a suitable way so as to maximize performance by exploiting the strengths of each classifier while mitigating the weaknesses of the individual ones.

At the core of Ensemble Learning is the notion of a weak classifier: a classifier able to correctly classify at least $50\%+1$ of the samples in a binary problem, i.e. to perform slightly better than chance. Suitably combined, weak classifiers allow the construction of a strong classifier, while at the same time addressing typical issues of traditional classifiers, overfitting above all.
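As an illustrative argument, under the idealized assumption that the errors of the individual classifiers are independent, the majority vote of $2M+1$ weak classifiers, each with error probability $\varepsilon < \frac{1}{2}$, is wrong only when more than half of them are wrong, i.e. with probability

\begin{displaymath}
P_{\mathrm{err}} = \sum_{k=M+1}^{2M+1} \binom{2M+1}{k} \, \varepsilon^{k} (1-\varepsilon)^{2M+1-k}
\end{displaymath}

a quantity that decreases towards zero as $M$ grows; this is the intuition behind combining many weak classifiers into a strong one.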

The origins of Ensemble Learning, the concept of a weak classifier, and in particular the notion of Probably Approximately Correct (PAC) learning were first introduced by Valiant (Val84).

Ensemble Learning techniques do not, in fact, provide general-purpose classifiers by themselves; rather, they prescribe how to best combine several classifiers.

Examples of Ensemble Learning techniques include

Decision Tree
Decision Trees, being constructed from multiple Decision Stumps in a cascade, serve as a primary example of Ensemble Learning;
Bagging
BootStrap AGGregatING aims to mitigate overfitting issues by training several classifiers on subsets of the training set and ultimately performing a majority vote (see the sketch after this list);
Boosting
instead of using purely random subsets of the training set, the subsets are built in part from the samples that are still incorrectly classified;
AdaBoost
ADAptive BOOSTing (section 4.6.2) is the most well-known Ensemble Learning algorithm and the progenitor of the highly successful family of AnyBoost classifiers;
Random Forest™
is a BootStrap Aggregating (bagging) method applied to Decision Trees: an Ensemble Classifier composed of multiple decision trees, each built from a random subset of the training data and of the features to be analyzed, whose outputs are combined by majority vote;
and many others.
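As a minimal sketch of the bagging idea mentioned above (assuming NumPy and scikit-learn are available; the choice of depth-1 decision trees as weak classifiers, labels in $\{-1,+1\}$, and the function names bagging_fit and bagging_predict are assumptions made for this example):

\begin{verbatim}
# Illustrative bagging sketch: train weak classifiers on bootstrap
# subsets of the training set and combine them by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=25, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    n = len(X)
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)         # bootstrap sample, with replacement
        stump = DecisionTreeClassifier(max_depth=1)  # weak classifier (assumption)
        stump.fit(X[idx], y[idx])
        models.append(stump)
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])  # (n_estimators, n_samples)
    return np.sign(votes.sum(axis=0))                  # majority vote on {-1,+1} labels
\end{verbatim}

Bootstrap sampling with replacement makes each weak classifier see a slightly different training set, and the majority vote averages out their individual errors; Random Forest follows the same scheme while additionally restricting each tree to a random subset of the features.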

Examples of weak classifiers widely used in the literature include Decision Stumps (AL92) associated with Haar features (section 6.1). The Decision Stump is a binary classifier of the form

\begin{displaymath}
h(\mathbf{x}) = \left\{ \begin{array}{ll}
+1 & \quad \text{if } p\, f(\mathbf{x}) < p\, \theta \\
-1 & \quad \text{otherwise} \\
\end{array}\right.
\end{displaymath} (4.45)

where $f(\mathbf{x})$ is a function that extracts a scalar from the sample to be classified, $p \in \{ +1, -1 \}$ is a parity that indicates the direction of the inequality, and $\theta$ is the decision threshold (figure 4.4).
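As a minimal sketch of how such a stump could be implemented and trained (assuming the feature values $f(\mathbf{x})$ have already been extracted into an array, labels in $\{-1,+1\}$, and a brute-force search over candidate thresholds, e.g. as the weak-learner step of AdaBoost; the function names stump_predict and stump_fit are illustrative):

\begin{verbatim}
import numpy as np

def stump_predict(fx, p, theta):
    # Eq. (4.45): +1 where p*f(x) < p*theta, -1 otherwise.
    # fx    : array of scalar feature values f(x)
    # p     : parity in {+1, -1}, direction of the inequality
    # theta : decision threshold
    return np.where(p * fx < p * theta, 1, -1)

def stump_fit(fx, y, w):
    # Brute-force search of the (p, theta) pair minimizing the
    # weighted classification error (w are the sample weights,
    # e.g. those maintained by a boosting algorithm).
    best_p, best_theta, best_err = 1, 0.0, np.inf
    for theta in np.unique(fx):
        for p in (+1, -1):
            err = np.sum(w * (stump_predict(fx, p, theta) != y))
            if err < best_err:
                best_p, best_theta, best_err = p, theta, err
    return best_p, best_theta, best_err
\end{verbatim}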


