The Mahalanobis Distance

A common problem is to determine to what extent an element $\mathbf {x}$ belongs to a probability distribution, so as to estimate approximately whether the element is an inlier, meaning it belongs to the distribution, or an outlier, meaning it is external to it.

The Mahalanobis distance (Mah36) measures an observation normalized with respect to its variance, which is why it is also called the "generalized distance."

Definition 6   The Mahalanobis distance of a vector $\mathbf {x}$ with respect to a distribution characterized by a mean $\mathbf{\mu}$ and a covariance matrix $\mathbf{\Sigma}$ is defined as
\begin{displaymath}
d(\mathbf{x}) = \sqrt { (\mathbf{x} - \mathbf{\mu})^{\top} \mathbf{\Sigma} ^{-1} (\mathbf{x} - \mathbf{\mu}) }
\end{displaymath} (2.20)

This quantity is the generalized distance of the point from the mean.
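As a minimal sketch of Eq. (2.20), the following computes the distance of a point from a distribution with an assumed (illustrative) mean and covariance; the function name and the sample values are not from the original text:

```python
import numpy as np

def mahalanobis(x, mu, sigma):
    """d(x) = sqrt((x - mu)^T Sigma^{-1} (x - mu)), per Eq. (2.20)."""
    delta = x - mu
    return float(np.sqrt(delta @ np.linalg.inv(sigma) @ delta))

# Illustrative distribution: variance 4 along the first axis, 1 along the second.
mu = np.array([0.0, 0.0])
sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])

x = np.array([2.0, 0.0])
# The point is 2 units from the mean, but only 1 standard deviation
# along the first axis, so the Mahalanobis distance is 1.
print(mahalanobis(x, mu, sigma))  # → 1.0
```

A point with a small Mahalanobis distance would be treated as an inlier, a large one as an outlier, with the threshold chosen by the application.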

This distance can be extended (the generalized squared interpoint distance) to the case of two vectors $\mathbf {x}$ and $\mathbf{y}$, which are realizations of the same random variable with covariance matrix $\mathbf{\Sigma}$:

\begin{displaymath}
d(\mathbf{x}, \mathbf{y}) = \sqrt { (\mathbf{x} - \mathbf{y})^{\top} \mathbf{\Sigma} ^{-1} (\mathbf{x} - \mathbf{y}) }
\end{displaymath} (2.21)

In the particular case of a diagonal covariance matrix (uncorrelated components), one retrieves the normalized Euclidean distance, where each component is scaled by its standard deviation. When the covariance matrix is the identity (uncorrelated components with unit variance), the formulation above reduces further to the classical Euclidean distance.
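The two special cases can be checked numerically; this is a sketch with assumed sample values, applying Eq. (2.21) with an identity and then a diagonal covariance matrix:

```python
import numpy as np

def mahalanobis(x, y, sigma):
    """d(x, y) = sqrt((x - y)^T Sigma^{-1} (x - y)), per Eq. (2.21)."""
    delta = x - y
    return float(np.sqrt(delta @ np.linalg.inv(sigma) @ delta))

x = np.array([3.0, 4.0])
y = np.array([0.0, 0.0])

# Identity covariance: reduces to the classical Euclidean distance.
d_identity = mahalanobis(x, y, np.eye(2))
d_euclid = float(np.linalg.norm(x - y))
print(d_identity, d_euclid)  # → 5.0 5.0

# Diagonal covariance: reduces to the normalized Euclidean distance,
# each component divided by its standard deviation.
sigma = np.diag([9.0, 16.0])  # standard deviations 3 and 4
d_diag = mahalanobis(x, y, sigma)
print(d_diag)  # → sqrt((3/3)^2 + (4/4)^2) = sqrt(2)
```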

The Mahalanobis distance allows for the measurement of distances in samples where the units of measurement are unknown, effectively assigning an automatic scaling factor to the data.



Paolo Medici
2025-10-22