An example of dimensionality reduction for classification purposes is Linear Discriminant Analysis (LDA) (Fisher, 1936).
As noted in the analysis of PCA (section 2.10.1), that technique is limited to maximizing the retained information without distinguishing between the classes that make up the problem: PCA does not take into account that the data may represent different categories. PCA is not a true classifier; rather, it is a technique for simplifying the problem by reducing its dimensionality. LDA, in contrast, aims to maximize both the discriminatory information between classes and the information represented by the variance.
In the case of a two-class problem, the optimal Bayesian classifier is the one whose decision boundary is the hypersurface along which the conditional (posterior) probabilities of the two classes are equal.
If we impose the hypothesis that the two classes of the binary problem have a multivariate Gaussian distribution with the same covariance matrix, it is straightforward to show that the Bayesian decision boundary of equation (4.11) becomes linear.
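As a one-step sketch of why this holds (the notation here is generic and not necessarily that of equation (4.11)): taking the log-ratio of the two class-conditional densities $\mathcal{N}(\boldsymbol{\mu}_1,\boldsymbol{\Sigma})$ and $\mathcal{N}(\boldsymbol{\mu}_2,\boldsymbol{\Sigma})$,

$$ \log\frac{p(\mathbf{x}\mid C_1)}{p(\mathbf{x}\mid C_2)} = -\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^\top\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_1) + \tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)^\top\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_2) = (\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)^\top\boldsymbol{\Sigma}^{-1}\mathbf{x} - \tfrac{1}{2}\left(\boldsymbol{\mu}_1^\top\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2^\top\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_2\right) $$

The quadratic term $\mathbf{x}^\top\boldsymbol{\Sigma}^{-1}\mathbf{x}$ cancels precisely because the covariance is shared, so the surface where the log-ratio matches the log-ratio of the priors is linear in $\mathbf{x}$.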
LDA makes the assumption of homoscedasticity, i.e. that the classes share the same covariance matrix. Under this assumption, the goal is to obtain a vector that projects the n-dimensional event space onto a scalar, maximizing the separation between the classes so that they can be linearly separated by a decision margin of the form
$$ \mathbf{w}^\top \mathbf{x} > c \qquad (4.13) $$
To determine this separating surface, various metrics can be employed. The term LDA currently encompasses several techniques, among which Fisher's Linear Discriminant Analysis is the most widely referenced in the literature.
It can be shown that the projection which maximizes the separation between the two classes in the statistical sense, and with it the decision hyperplane, is obtained as
$$ \mathbf{w} = \boldsymbol{\Sigma}^{-1}\left(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2\right) \qquad (4.14) $$

$$ c = \mathbf{w}^\top \frac{\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2}{2} \qquad (4.15) $$
This decision margin is the maximum-likelihood solution in the case of two classes with normal distribution and the same covariance matrix.
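To make the result concrete, the following is a minimal numerical sketch (assuming NumPy and equal class priors; the function and variable names are illustrative, not taken from the text). It estimates the class means and the pooled covariance from samples, then applies equations (4.14) and (4.15) to classify a new point.

```python
import numpy as np

def fisher_lda(X1, X2):
    """Two-class Fisher LDA under homoscedasticity (shared covariance).

    X1, X2: arrays of shape (n_samples, n_features), one per class.
    Returns the projection vector w and the threshold c, so that a sample x
    is assigned to class 1 when w @ x > c (equal priors assumed).
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled (within-class) covariance estimate, shared by both classes.
    n1, n2 = len(X1), len(X2)
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    Sigma = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    # Projection direction (4.14) and midpoint threshold (4.15).
    w = np.linalg.solve(Sigma, mu1 - mu2)
    c = w @ (mu1 + mu2) / 2.0
    return w, c

# Example usage on synthetic Gaussian data with a shared covariance.
rng = np.random.default_rng(0)
Sigma_true = np.array([[1.0, 0.3], [0.3, 0.5]])
X1 = rng.multivariate_normal([1.0, 2.0], Sigma_true, size=200)
X2 = rng.multivariate_normal([-1.0, 0.0], Sigma_true, size=200)
w, c = fisher_lda(X1, X2)
x_new = np.array([0.5, 1.5])
print("class 1" if w @ x_new > c else "class 2")
```

Using `np.linalg.solve` rather than explicitly inverting the pooled covariance is numerically preferable and equivalent to computing $\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)$.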