Neural Networks

Figure 4.8: Example of a neural network topology.

Research in Machine Learning (and, more broadly, in Computer Vision) has consistently sought inspiration from the human brain for the development of algorithms. Artificial Neural Networks (ANNs) are based on the concept of the "artificial neuron": a structure that, similar to the neurons of living beings, applies a nonlinear transformation (known as the activation function) to the weighted sum of its inputs:


\begin{displaymath}
\text{output} = f\left(\sum_{i=1}^{n} w_i x_i + b\right)
\end{displaymath} (4.79)

where \( x_i \) are the inputs to the neuron, each associated with a weight \( w_i \), \( \text{output} \) is the response of the neuron, and the activation function \( f \), which is highly nonlinear, is typically a step function or a sigmoid (logistic) function. The bias \( b \) is sometimes implemented as an additional constant input \( x_0 \).
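As a minimal illustration of Eq. (4.79), the following Python sketch computes the response of a single artificial neuron; the sigmoid activation and the specific input, weight, and bias values are illustrative assumptions, not values taken from the text.

\begin{verbatim}
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) activation function.
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b):
    # output = f(sum_i w_i * x_i + b), as in Eq. (4.79).
    return sigmoid(np.dot(w, x) + b)

# Example with three inputs (arbitrary values).
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.2
print(neuron_output(x, w, b))
\end{verbatim}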

The simplest neural network, consisting of an input layer and an output layer, is analogous to the perceptron model introduced by Rosenblatt in 1957. Similar to the brains of living beings, an artificial neural network consists of the interconnection of various artificial neurons.

The topology commonly used in practical applications is the feedforward network known as the MultiLayer Perceptron (MLP): multiple hidden layers of neurons connect the input stage to the output stage, with the output of each layer serving as the input of the subsequent layer. A multilayer perceptron can therefore be regarded as a function $f_\mathbf{w}(\mathbf{x})$, mapping the input vector $\mathbf{x}$ to the network output and parameterized by the set of weights $\mathbf{w}$.
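This layered composition can be sketched as follows; fully connected layers with sigmoid activations and the specific layer sizes are assumptions made only for illustration.

\begin{verbatim}
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, layers):
    # layers is a list of (W, b) pairs; the output of each layer
    # serves as the input of the subsequent layer.
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

# Illustrative topology: 3 inputs -> 5 hidden neurons -> 2 outputs.
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((5, 3)), np.zeros(5)),
          (rng.standard_normal((2, 5)), np.zeros(2))]
y = mlp_forward(np.array([0.5, -1.2, 3.0]), layers)
\end{verbatim}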

The training phase involves estimating the weights $w^{k}_i$ that minimize the error between the training labels $\mathbf{y}_i$ and the values predicted by the network $f_\mathbf{w}(\mathbf{x}_i)$:

\begin{displaymath}
S(\mathbf{w}) = \sum_i \left\Vert \mathbf{y}_i - f_\mathbf{w}(\mathbf{x}_i) \right\Vert^2
\end{displaymath} (4.80)

The estimation of the weights $w^{k}_i$ can be carried out with well-known optimization techniques: typically the backpropagation method is used, which is essentially gradient descent that exploits the chain rule to compute the derivatives, given that MLPs are layered structures.
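A minimal sketch of one backpropagation step for the loss of Eq. (4.80) is given below; it assumes a single sigmoid hidden layer, sigmoid outputs, and plain gradient descent, with the learning rate chosen arbitrarily for illustration.

\begin{verbatim}
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(X, Y, W1, b1, W2, b2, lr=0.1):
    # One gradient-descent step on S(w) = sum_i ||y_i - f_w(x_i)||^2.
    # Forward pass through the two layers.
    A1 = sigmoid(X @ W1.T + b1)     # hidden-layer output
    A2 = sigmoid(A1 @ W2.T + b2)    # network prediction f_w(x)

    # Backward pass: chain rule applied layer by layer.
    dZ2 = 2.0 * (A2 - Y) * A2 * (1.0 - A2)
    dW2 = dZ2.T @ A1
    db2 = dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2) * A1 * (1.0 - A1)
    dW1 = dZ1.T @ X
    db1 = dZ1.sum(axis=0)

    # Gradient-descent update of all weights and biases.
    return (W1 - lr * dW1, b1 - lr * db1,
            W2 - lr * dW2, b2 - lr * db2)
\end{verbatim}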


