From a statistical perspective, the data vector $\mathbf{x} = (x_1, \dots, x_N)$ consists of realizations of a random variable drawn from an unknown population. The task of data analysis is to identify the population that most likely generated those samples. In statistics, each population is characterized by a corresponding probability distribution, and associated with each probability distribution is a unique parameterization $\boldsymbol{\theta}$: by varying these parameters, different probability distributions are generated.
Let $p(\mathbf{x} \mid \boldsymbol{\theta})$ be the probability density function (PDF) that gives the probability of observing the data $\mathbf{x}$ given a parameterization $\boldsymbol{\theta}$. If the individual observations $x_i$ are statistically independent of each other, the PDF of $\mathbf{x}$ can be expressed as the product of the individual PDFs:
$$p(\mathbf{x} \mid \boldsymbol{\theta}) = \prod_{i=1}^{N} p(x_i \mid \boldsymbol{\theta}). \tag{2.48}$$
Given a parameterization $\boldsymbol{\theta}$, this expression defines a specific PDF that describes how probable some data are relative to others. In practice we face exactly the reciprocal problem: the data have already been observed, and we need to identify which $\boldsymbol{\theta}$ generated them. Reading Eq. (2.48) as a function of the parameters for fixed data defines the likelihood function:
$$\mathcal{L}(\boldsymbol{\theta} \mid \mathbf{x}) = p(\mathbf{x} \mid \boldsymbol{\theta}) = \prod_{i=1}^{N} p(x_i \mid \boldsymbol{\theta}). \tag{2.49}$$
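As a minimal sketch of Eqs. (2.48)–(2.49), the snippet below evaluates the log of the likelihood of a fixed data set as a function of a single parameter, assuming independent Gaussian samples with known standard deviation; the data, the parameter grid, and names such as `log_likelihood` are illustrative choices, not taken from the text.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical example: N independent samples assumed to come from a
# Gaussian with unknown mean `mu` and known standard deviation `sigma`.
rng = np.random.default_rng(0)
sigma = 1.0
x = rng.normal(loc=2.5, scale=sigma, size=200)   # observed data vector

def log_likelihood(mu, x, sigma):
    """Log of Eq. (2.49): sum of the individual log-PDFs (independence)."""
    return np.sum(norm.logpdf(x, loc=mu, scale=sigma))

# For fixed data, the likelihood is a function of the parameter only:
# scanning candidate values of `mu` shows which parameterization makes
# the observed data most probable.
mu_grid = np.linspace(0.0, 5.0, 501)
ll = np.array([log_likelihood(mu, x, sigma) for mu in mu_grid])
print("mu maximizing the log-likelihood:", mu_grid[np.argmax(ll)])
print("sample mean (analytical MLE):   ", x.mean())
```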
The principle of the maximum likelihood estimator (MLE), originally developed by R. A. Fisher in the 1920s, is to select the parameterization under which the observed data are most probable, i.e. the one that best fits the observed data.
In the case of a Gaussian probability distribution, an additional definition is useful: the PDF of a single observation with mean $\mu$ and standard deviation $\sigma$ is
$$p(x_i \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right). \tag{2.50}$$
The best estimate of the model parameters is the one that maximizes the likelihood or, equivalently, its logarithm, the log-likelihood:
$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta}), \qquad \ell(\boldsymbol{\theta}) = \ln \mathcal{L}(\boldsymbol{\theta} \mid \mathbf{x}) = \sum_{i=1}^{N} \ln p(x_i \mid \boldsymbol{\theta}). \tag{2.51}$$
In the literature the optimal estimator is often formulated not as the maximization of the likelihood function but as the minimization of its opposite, the negative log-likelihood:
$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \left[-\ell(\boldsymbol{\theta})\right]. \tag{2.52}$$
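As a sketch of how Eq. (2.52) is used in practice, the following snippet estimates the mean and standard deviation of assumed Gaussian data by numerically minimizing the negative log-likelihood with a general-purpose optimizer; the distribution, the reparameterization through `log_sigma`, and the optimizer choice are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical data: Gaussian samples with unknown mean and standard deviation.
rng = np.random.default_rng(1)
x = rng.normal(loc=1.7, scale=0.8, size=500)

def neg_log_likelihood(params, x):
    """Negative log-likelihood of Eq. (2.52); log-sigma keeps sigma > 0."""
    mu, log_sigma = params
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

# Minimizing the negative log-likelihood is equivalent to maximizing
# the likelihood; a general-purpose optimizer suffices for this sketch.
res = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,), method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
print(f"sample mean = {x.mean():.3f}, sample std = {x.std():.3f}")
```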
This formulation is particularly useful when the noise distribution is Gaussian. Let $y_i$ be the realizations of the random variable, modeled by a generic function $f$ as $y_i = f(t_i; \boldsymbol{\theta}) + \varepsilon_i$, where the noise $\varepsilon_i$ is normally distributed with zero mean and variance $\sigma^2$ constant in time. The likelihood is then given by
$$\mathcal{L}(\boldsymbol{\theta}) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{\left(y_i - f(t_i; \boldsymbol{\theta})\right)^2}{2\sigma^2}\right), \tag{2.53}$$
and the corresponding log-likelihood is
$$\ell(\boldsymbol{\theta}) = -\frac{N}{2}\ln\!\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - f(t_i; \boldsymbol{\theta})\right)^2. \tag{2.54}$$
Now, the partial derivatives of the log-likelihood with respect to the parameters form the gradient vector $\nabla_{\boldsymbol{\theta}}\,\ell(\boldsymbol{\theta})$, whose components are
$$\frac{\partial \ell(\boldsymbol{\theta})}{\partial \theta_k} = \frac{1}{\sigma^2}\sum_{i=1}^{N}\left(y_i - f(t_i; \boldsymbol{\theta})\right)\frac{\partial f(t_i; \boldsymbol{\theta})}{\partial \theta_k}. \tag{2.55}$$
Setting these derivatives to zero gives the conditions satisfied at the maximum,
$$\sum_{i=1}^{N}\left(y_i - f(t_i; \boldsymbol{\theta})\right)\frac{\partial f(t_i; \boldsymbol{\theta})}{\partial \theta_k} = 0, \qquad k = 1, \dots, \dim(\boldsymbol{\theta}), \tag{2.56}$$
which do not depend on $\sigma$: maximizing the likelihood is therefore equivalent to minimizing the sum of squared residuals, i.e. to solving the least-squares problem
$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^{N}\left(y_i - f(t_i; \boldsymbol{\theta})\right)^2. \tag{2.57}$$
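The snippet below illustrates this equivalence under the assumption of a simple linear model $f(t; \boldsymbol{\theta}) = \theta_1 t + \theta_2$ with zero-mean, constant-variance Gaussian noise; the model, the parameter values, and the function names are hypothetical. The parameters obtained by minimizing the negative log-likelihood coincide, up to numerical tolerance, with the ordinary least-squares solution.

```python
import numpy as np
from scipy.optimize import least_squares, minimize

# Hypothetical model f(t; theta) = theta[0] * t + theta[1], observed with
# zero-mean Gaussian noise of constant variance, as assumed in Eq. (2.53).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 10.0, 100)
sigma = 0.5
theta_true = np.array([1.3, -0.7])
y = theta_true[0] * t + theta_true[1] + rng.normal(0.0, sigma, size=t.size)

def f(t, theta):
    return theta[0] * t + theta[1]

def residuals(theta):
    return y - f(t, theta)

def neg_log_likelihood(theta):
    """-l(theta) up to constants (Eq. 2.54); constants do not move the argmin."""
    return 0.5 * np.sum(residuals(theta) ** 2) / sigma**2

# Least-squares solution (Eq. 2.57) ...
theta_ls = least_squares(residuals, x0=[0.0, 0.0]).x
# ... and direct minimization of the negative log-likelihood (Eq. 2.52).
theta_ml = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="Nelder-Mead").x

print("least squares:  ", theta_ls)
print("max likelihood: ", theta_ml)   # the two estimates coincide (up to tolerance)
```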