Bayesian Filters

In this section we discuss the problem of statistical filtering: the class of problems in which data from one or more noise-affected sensors is available, representing observations of the dynamic state of a system that is not directly observable but for which an estimate is required. The process of finding the best estimate of the internal state of a system is called "filtering", since it is a method for filtering out the various noise components. The evolution of the system (the evolution of its internal state) follows known physical laws, perturbed by a noise component (process noise). It is precisely the knowledge of the equations governing the evolution of the state that makes it possible to provide a better estimate of the internal state.

A physical process can be viewed, in its state space representation (State Space Model), through a function that describes how the state $\mathbf{x}_t$ evolves over time:

\begin{displaymath}
\dot{\mathbf{x}}_t = f(t, \mathbf{x}_t, \mathbf{u}_t, \mathbf{w}_t)
\end{displaymath} (2.83)

with $\mathbf{u}_t$ the known inputs to the system and $\mathbf{w}_t$ the process noise, which accounts for the randomness governing the evolution. Similarly, the observation of the state is a process affected by noise, in this case called observation noise. Here too it is possible to define a function that models the observation $\mathbf{z}_t$ as


\begin{displaymath}
\mathbf{z}_t = h (t, \mathbf{x}_t, \mathbf{v}_t )
\end{displaymath} (2.84)

with $\mathbf{v}_t$ representing the observation noise; note that the observation is a function solely of the current state.

This formalism is expressed in the continuous-time domain. In practical applications signals are sampled at discrete time instants $k$, so a discrete-time version is typically used in the form

\begin{displaymath}
\begin{array}{l}
x_{k+1} = f_k(x_k, u_k, w_k) \\
z_{k} = h_k(x_k, v_k) \\
\end{array}\end{displaymath} (2.85)

where $w_k$ and $v_k$ can be viewed as sequences of white noise with known statistics.
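As a minimal sketch of equations (2.85), the following simulates a made-up scalar system (the dynamics $0.9 x_k + u_k + w_k$, the identity observation, and the noise variances are all illustrative assumptions, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar instance of (2.85), for illustration only:
#   x_{k+1} = f_k(x_k, u_k, w_k) = 0.9*x_k + u_k + w_k
#   z_k     = h_k(x_k, v_k)      = x_k + v_k
def f(x, u, w):
    return 0.9 * x + u + w

def h(x, v):
    return x + v

x = 0.0
xs, zs = [], []
for k in range(50):
    v = rng.normal(0.0, 0.5)   # observation noise v_k (white, known statistics)
    zs.append(h(x, v))         # observe the current state
    xs.append(x)
    w = rng.normal(0.0, 0.1)   # process noise w_k
    x = f(x, 1.0, w)           # evolve to x_{k+1} with constant input u_k = 1
```

The sequences `xs` (hidden) and `zs` (observed) play the roles of the state and observation trajectories in figure 2.4.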

Figure 2.4: Example of evolution and observation of a Markovian system.

In systems that satisfy the equations (2.85), the evolution of the state is solely a function of the previous state, while the observation is only a function of the current state (see figure 2.4). If a system meets these assumptions, it is said to be a Markov process: the evolution of the system and the observation must depend only on the current state and not on past states. Access to information about the state always occurs indirectly through observation (known as a Hidden Markov Model).

Many approaches to estimating the unknown state of a system from a set of measurements do not account for the noisy nature of the observations. It is indeed possible to construct an algorithm that performs a nonlinear regression on the observations to obtain estimates of all the states of the problem, solving an optimization problem with a large number of unknowns.

Filters, unlike regression, aim to provide the best estimate of the variables (the state) as the observation data arrive. From a theoretical standpoint regression represents the optimal case, while filtering converges to the correct result only after a sufficiently large number of samples.

Bayesian filters aim to estimate, at the discrete time instant $k$, the state of the random variable $\mathbf{x}_k \in \mathbb{R}^n$ given an indirect observation of the system, $\mathbf{z}_k \in \mathbb{R}^{m}$.

Filtering techniques provide both the best estimate of the unknown state $\mathbf{x}_k$ and the multivariate probability distribution $p(\mathbf{x}_k)$ that represents the knowledge of the state itself.

Given an observation of the system, it is possible to define the probability density of $\mathbf{x}_k$ a posteriori, that is, after the event $\mathbf{z}_{k}$ has been observed, reflecting the additional knowledge gained from that observation:

\begin{displaymath}
p^{+}(\mathbf{x}_k) = p(\mathbf{x}_k \vert \mathbf{z}_k)
\end{displaymath} (2.86)

where the conditional probability $p(\mathbf{x}_k \vert \mathbf{z}_k)$ indicates the probability that the hidden state is $\mathbf{x}_k$ given the observation $\mathbf{z}_k$. The likelihood $p(\mathbf{z}_k \vert \mathbf{x}_k)$ represents the measurement model. In the literature, the posterior distribution $p^{+}(\mathbf{x}_k)$ is also referred to as the belief.

Applying Bayes' theorem to equation (2.86) yields

\begin{displaymath}
p(\mathbf{x}_k \vert \mathbf{z}_k) = c_k p(\mathbf{z}_k \vert \mathbf{x}_k) p(\mathbf{x}_{k})
\end{displaymath} (2.87)

with $c_k$ the normalization factor such that $\int p(\mathbf{x}_k \vert \mathbf{z}_k)\, d\mathbf{x}_k = 1$. Knowledge of $p(\mathbf{z}_k \vert \mathbf{x}_k)$ is essential: it represents the probability that the observation takes precisely the observed value $\mathbf{z}_k$ given the candidate state $\mathbf{x}_k$. The use of Bayes' theorem to estimate the state given the observation is the reason this class of filters is called Bayesian.
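A minimal numeric sketch of the Bayes update (2.87) on a discrete state space (the prior and likelihood values below are made up for illustration):

```python
import numpy as np

# Toy example: the hidden state x_k takes one of 3 discrete values.
# 'prior' is p(x_k); 'likelihood' is p(z_k | x_k) evaluated at the observed z_k.
prior = np.array([0.5, 0.3, 0.2])
likelihood = np.array([0.1, 0.7, 0.2])

unnormalized = likelihood * prior    # p(z_k | x_k) p(x_k)
c_k = 1.0 / unnormalized.sum()       # normalization factor of eq. (2.87)
posterior = c_k * unnormalized       # p(x_k | z_k)
```

The observation concentrates the belief on the second state, which the prior alone did not favor.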

In addition to the a posteriori knowledge of the probability distribution, further information can be leveraged to improve the estimate: the a priori knowledge of the state evolution, obtained from the constraint that the state does not evolve in a completely unpredictable manner, but can only evolve in certain ways with specific probabilities. These ways in which the system can evolve are a function solely of the current state.

The Markovian process hypothesis implies that the only past state influencing the evolution of the system is the state at time $k-1$, that is, $p(\mathbf{x}_k \vert \mathbf{x}_{1:k-1}) = p(\mathbf{x}_k \vert \mathbf{x}_{k-1})$.

It is therefore possible to perform the prediction a priori, thanks to the Chapman-Kolmogorov equation:

\begin{displaymath}
p^{-}(\mathbf{x}_k) = \int p(\mathbf{x}_k \vert \mathbf{x}_{k-1}, \mathbf{u}_{k}) p^{+}(\mathbf{x}_{k-1}) d\mathbf{x}_{k-1}
\end{displaymath} (2.88)

where $p(\mathbf{x}_k \vert \mathbf{x}_{k-1}, \mathbf{u}_{k})$ represents the dynamics of the system (dynamic model) and $\mathbf{u}_{k}$ are the inputs that influence the evolution of the system, which are assumed to be known exactly.
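On a finite state grid the Chapman-Kolmogorov integral (2.88) reduces to a sum, i.e. a matrix-vector product with the transition matrix $P[i,j] = p(\mathbf{x}_k = i \vert \mathbf{x}_{k-1} = j, \mathbf{u}_k)$. A sketch with a made-up 3-state transition model:

```python
import numpy as np

# Illustrative transition model: each column of P is a probability
# distribution p(x_k | x_{k-1} = j, u_k) over the 3 grid states.
P = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.8, 0.2],
              [0.0, 0.1, 0.8]])

posterior_prev = np.array([0.6, 0.3, 0.1])  # p^+(x_{k-1})

# Discrete Chapman-Kolmogorov: p^-(x_k) = sum_j P[:, j] * p^+(x_{k-1} = j)
prior = P @ posterior_prev
```

The prediction spreads the belief according to the admissible transitions, without using any observation.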

From the knowledge of the a priori state and the observation $\mathbf{z}_k$, equation (2.86) can be rewritten as the state update equation

\begin{displaymath}
p^{+}(\mathbf{x}_k) = c_k p(\mathbf{z}_k \vert \mathbf{x}_k) p^{-}(\mathbf{x}_k)
\end{displaymath} (2.89)


The state is estimated by alternating between a prediction phase (a priori estimation) and an observation phase (a posteriori estimation). This iterative process is known as Recursive Bayesian Estimation.
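The predict/update alternation can be sketched as a small grid-based recursion, combining the discrete forms of (2.88) and (2.89); the transition matrix, likelihood values, and initial belief below are illustrative assumptions:

```python
import numpy as np

def predict(P, belief):
    # a priori estimate: discrete Chapman-Kolmogorov, eq. (2.88)
    return P @ belief

def update(likelihood, prior):
    # a posteriori estimate: Bayes update, eq. (2.89)
    unnormalized = likelihood * prior
    return unnormalized / unnormalized.sum()

# Made-up 3-state transition model (columns are distributions) and
# a made-up sequence of likelihoods p(z_k | x_k), one per observation.
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
likelihoods = [np.array([0.8, 0.1, 0.1]),
               np.array([0.6, 0.3, 0.1]),
               np.array([0.7, 0.2, 0.1])]

belief = np.full(3, 1.0 / 3.0)       # uninformative initial belief
for lik in likelihoods:              # one predict/update cycle per observation
    belief = update(lik, predict(P, belief))
```

After three cycles the belief concentrates on the first state, which all three (hypothetical) observations favor.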

For reasons of performance and simplicity, the techniques described in this section use only the most recent available observation for state estimation. Formally, the discussion can be extended to the case where all observations are used to obtain a more accurate estimate of the state. In this case, the filtering and prediction equations become

\begin{displaymath}
\begin{array}{l}
p(\mathbf{x}_k \vert \mathbf{z}_{1:k} ) = c_k \, p(\mathbf{z}_k \vert \mathbf{x}_k) \, p(\mathbf{x}_k \vert \mathbf{z}_{1:k-1} ) \\
p(\mathbf{x}_{k+1} \vert \mathbf{z}_{1:k} ) = \int p(\mathbf{x}_{k+1} \vert \mathbf{x}_k) p(\mathbf{x}_k \vert \mathbf{z}_{1:k} ) d \mathbf{x}_k \\
\end{array}\end{displaymath} (2.90)

For simplicity and due to the reduced computational burden, typically only the latest observation is evaluated; however, in certain cases (for example, in particle filters), it is possible to incorporate the knowledge of the entire past history into the equations quite easily.

When estimating continuous variables it is not possible to exploit Bayesian theory "directly", since the integrals involved generally have no closed form; however, several approaches have been proposed in the literature to enable estimation that is efficient both computationally and in terms of memory usage.

Depending on whether the problem is linear or nonlinear, and whether the noise distribution is Gaussian or not, each of these filters performs closer to or farther from the optimum.

The Kalman Filter (section 2.12.2) is the optimal filter when the problem is linear and the noise distribution is Gaussian. The Extended Kalman Filter and the Unscented Kalman Filter, discussed in sections 2.12.4 and 2.12.5 respectively, are sub-optimal filters for nonlinear problems with Gaussian noise distribution (or slightly deviating from it). Finally, particle filters provide a sub-optimal solution for nonlinear problems with non-Gaussian noise distribution.

The grid-based filters (section 2.12.1) and particle filters (section 2.12.8) operate on a discrete representation of the state, while the Kalman, Extended, and Sigma-Point filters work on a continuous representation of the state.

Kalman, Extended Kalman, and Sigma Point Kalman filters estimate the uncertainty distribution (of the state, process, and observation) as a single Gaussian. There are multimodal extensions such as Multi-hypothesis tracking (MHT) that allow the application of Kalman filters to distributions modeled as mixtures of Gaussians, while particle filters and grid-based methods are inherently multimodal.

An excellent survey on Bayesian filtering is (Che03).



Paolo medici
2025-10-22