Bayes' Theorem

The definition of conditional probability allows us to immediately obtain the following fundamental result:

Theorem 1 (Bayes' theorem)   Let $\{\Omega,\mathcal{Y},p\}$ be a probability space, and let the events $y=y_i$ (abbreviated as $y_i$), $i=1..n$, form a complete system of events of $\Omega$ with $p(y_i)>0 \; \forall i=1..n$.

Then, for every event $x \in \mathcal{Y}$ with $p(x)>0$, it holds that:

\begin{displaymath}
p(y_i\vert x)=\frac{p(y_i)p(x\vert y_i)}{\sum_{j=1}^n p(y_j)p(x\vert y_j)}
\end{displaymath} (4.7)

and this holds $\forall i=1..n$.

Bayes' theorem is one of the fundamental elements of the subjectivist, or personal, approach to probability and statistical inference. The system of alternatives $y_i$, $i=1..n$, is often interpreted as a set of causes, and Bayes' theorem, given the prior probabilities of the different causes, allows probabilities to be assigned to the causes given an effect $x$. The probabilities $p(y_i)$, $i=1..n$, can be interpreted as the a priori knowledge (usually denoted by $\pi_i$), that is, the knowledge available before conducting a statistical experiment. The probabilities $p(x\vert y_i)$, $i=1..n$, are interpreted as the likelihood, i.e., the information regarding $x$ that can be obtained by performing an appropriate statistical experiment. Thus, Bayes' formula suggests a mechanism for learning from experience: by combining the a priori knowledge about the event $y_i$ provided by $p(y_i)$ with the knowledge acquired from a statistical experiment, given by $p(x\vert y_i)$, one arrives at a better understanding of the event $y_i$, represented by $p(y_i\vert x)$ and referred to as the posterior probability, that is, the probability after conducting the experiment.
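As a minimal numeric sketch of this learning mechanism, formula (4.7) can be evaluated directly in Python; the priors and likelihoods below are invented purely for illustration and do not come from the text:

\begin{verbatim}
# Minimal numeric sketch of Bayes' rule (4.7); the priors and
# likelihoods below are invented purely for illustration.
priors      = [0.7, 0.3]   # p(y_1), p(y_2): a priori knowledge pi_i
likelihoods = [0.2, 0.6]   # p(x|y_1), p(x|y_2): from the experiment

evidence   = sum(p * l for p, l in zip(priors, likelihoods))      # p(x)
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]
print(posteriors)   # p(y_1|x), p(y_2|x) -> [0.4375, 0.5625]
\end{verbatim}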

We may know, for example, the probability distribution for the color of apples, as well as that for oranges. Using the notation introduced in the theorem above, let $y_{1}$ denote the state in which the fruit is an apple, $y_{2}$ the state in which the fruit is an orange, and let $x$ be a random variable representing the color of the fruit. With this notation, $p(x\vert y_1)$ represents the density function for the color $x$ conditioned on the state being apple, and $p(x\vert y_2)$ on it being orange.

During the training phase, it is possible to construct the probability distributions $p(x\vert y_i)$ for the two states apple and orange. In addition to this knowledge, the prior probabilities $p(y_{1})$ and $p(y_{2})$ are assumed known: they simply represent the proportion of apples relative to oranges.

What we are looking for is a formula that indicates the probability of a fruit being an apple or an orange, given that a certain color $x$ has been observed.

Bayes' formula (4.7) allows precisely this:

\begin{displaymath}
p(y_i\vert x) = \frac{p(x\vert y_i)p(y_i)}{p(x)}
\end{displaymath} (4.8)

Given the prior knowledge, it enables the calculation of the posterior probability that the state of the fruit is $y_i$ given the measured feature $x$. Therefore, upon observing a certain color $x$ on the conveyor belt, and having calculated $p(y_1\vert x)$ and $p(y_2\vert x)$, one would decide that the fruit is an apple if the first value is greater than the second (and vice versa):

\begin{displaymath}
p(y_1\vert x) > p(y_2\vert x)
\end{displaymath}

that is, since the evidence $p(x)$ is the same for both classes and can be canceled:

\begin{displaymath}
p(x\vert y_1)p(y_1) > p(x\vert y_2)p(y_2)
\end{displaymath}
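A sketch of this two-class decision rule in Python, assuming, purely hypothetically, Gaussian class-conditional densities for the color feature and invented priors:

\begin{verbatim}
# Two-class Bayes decision rule: compare p(x|y_1)p(y_1) with
# p(x|y_2)p(y_2); the evidence p(x) cancels out of the comparison.
# The Gaussian densities and all numbers are assumptions, not data
# from the text.
from math import exp, pi, sqrt

def gaussian(x, mu, sigma):
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

p_apple, p_orange = 0.6, 0.4    # priors p(y_1), p(y_2)
x = 0.55                        # observed color feature

score_apple  = gaussian(x, 0.3, 0.15) * p_apple    # p(x|y_1) p(y_1)
score_orange = gaussian(x, 0.7, 0.10) * p_orange   # p(x|y_2) p(y_2)
print("apple" if score_apple > score_orange else "orange")
\end{verbatim}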

In general, for $n$ classes, the Bayesian estimator can be defined as a discriminant function:

\begin{displaymath}
f(x) = \hat{y}(x) = \arg\max_i p(y_i\vert x) = \arg\max_i p(x\vert y_i) \pi_i
\end{displaymath} (4.9)
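A direct transcription of (4.9) is sketched below; the likelihood models $p(x\vert y_i)$ are passed in as callables and, like any example values, are assumptions of the caller:

\begin{verbatim}
# Discriminant function (4.9): return the index i maximizing
# p(x|y_i) * pi_i over the n classes.
def bayes_classifier(x, likelihoods, priors):
    # likelihoods: list of callables p(x|y_i); priors: list of pi_i
    scores = [lik(x) * pi for lik, pi in zip(likelihoods, priors)]
    return max(range(len(scores)), key=lambda i: scores[i])
\end{verbatim}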

It is also possible to calculate an index, given the prior knowledge of the problem, indicating how error-prone this decision rule will be. The probability of making an error given an observed feature $x$ depends on the maximum of the $n$ posterior curves evaluated at $x$:

\begin{displaymath}
p(\mathrm{error}\vert x) = 1 - \max \left[ p(y_1\vert x), p(y_2\vert x), \dots, p(y_n\vert x) \right]
\end{displaymath} (4.10)
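Formula (4.10) reduces to a one-line computation; the posterior values in the example are invented for illustration:

\begin{verbatim}
# Conditional error probability (4.10): one minus the largest posterior.
def p_error_given_x(posteriors):
    return 1.0 - max(posteriors)

print(p_error_given_x([0.4375, 0.5625]))   # -> 0.4375
\end{verbatim}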
