Deep Neural Networks

In traditional neural networks, weight optimization starts from randomly chosen initial values. Although simple to implement, this choice means that performance tends to degrade as network depth increases, whereas shallower architectures (with one or two hidden layers) are generally more stable and easier to train.
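To make the point concrete, here is a minimal NumPy sketch (the layer width, initialization scales, and depth are illustrative choices, not taken from the text): it pushes a signal through a randomly initialized tanh MLP and prints how the activation spread depends on the initialization scale once the network is deep. Badly scaled random weights make the signal vanish or saturate, one symptom of why depth used to hurt.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, depth, width=256, scale=1.0):
    # Forward pass through `depth` tanh layers with random weights
    # drawn as N(0, (scale / sqrt(width))^2), a naive random init.
    for _ in range(depth):
        W = rng.normal(0.0, scale / np.sqrt(width), size=(width, width))
        x = np.tanh(W @ x)
    return x

x0 = rng.normal(size=256)
for scale in (0.5, 1.0, 2.0):
    out = forward(x0, depth=50, scale=scale)
    # Too small a scale shrinks the signal toward zero; too large a
    # scale drives tanh into saturation. Only a narrow band behaves well.
    print(f"scale={scale}: std of activations after 50 layers = {np.std(out):.4f}")
```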

Historically, the training of multilayer neural networks (MLPs) through gradient descent faced two main obstacles:

1. the vanishing-gradient problem: with saturating activations such as the sigmoid, gradients shrink roughly geometrically as they are backpropagated through the layers, so the earliest layers of a deep network receive almost no learning signal (illustrated numerically in the sketch below);
2. the cost of the optimization itself: training could stall in poor regions of the loss surface, and deep models demanded amounts of labeled data and computation that were simply unavailable at the time.

As a consequence, the idea of using very deep networks to model complex problems was long considered impractical.
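The first obstacle is easy to observe numerically. The sketch below (NumPy; the width, depth, and the pure-sigmoid stack are illustrative assumptions) backpropagates a gradient through a stack of sigmoid layers and prints its norm every ten layers; the geometric shrinkage is exactly the vanishing-gradient effect.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 64, 30

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: cache weights and post-activation outputs for backprop.
x = rng.normal(size=width)
Ws, acts = [], []
for _ in range(depth):
    W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
    x = sigmoid(W @ x)
    Ws.append(W)
    acts.append(x)

# Backward pass: chain rule through each layer. sigmoid'(z) = a * (1 - a)
# is at most 0.25, so each layer multiplies the gradient by a factor
# well below 1 and the norm decays geometrically with depth.
g = np.ones(width)
for i, (W, a) in enumerate(zip(reversed(Ws), reversed(acts)), start=1):
    g = W.T @ (g * a * (1.0 - a))
    if i % 10 == 0:
        print(f"after {i} layers back: |grad| = {np.linalg.norm(g):.3e}")
```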

Starting in 2012, thanks to the availability of large amounts of data (Big Data), the increasing computational power provided by GPUs, and the development of more effective optimization techniques (such as Adam, Section 3.3.3), deep neural networks experienced a true renaissance.
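Since the text singles out Adam, a compact sketch of its update rule may help. The snippet below (NumPy) follows the standard formulation with the usual default hyperparameters; the toy quadratic loss used to exercise it is a hypothetical example, not drawn from the text.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its square.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    # Bias correction for the zero initialization of m and v.
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    # Per-parameter step, scaled by the estimated gradient variance.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = ||theta||^2 / 2, whose gradient is theta.
theta = np.array([5.0, -3.0])
m = v = np.zeros_like(theta)
for t in range(1, 501):
    theta, m, v = adam_step(theta, theta, m, v, t, lr=0.1)
print(theta)  # close to the minimizer at the origin
```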

A key turning point was the victory of AlexNet (KSH12b) at the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC): for the first time, a deep convolutional neural network clearly outperformed traditional approaches, marking the beginning of the modern era of deep learning.

Since then, deep networks have become the de facto standard in many areas of machine learning. In particular, the processing of structured data such as images has greatly benefited from convolutional neural networks (CNNs), which exploit the spatial structure of the visual signal to learn hierarchical and invariant representations more efficiently.
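To make the weight-sharing idea concrete, the sketch below (NumPy; the image, kernel, and sizes are illustrative assumptions) slides a single 3x3 kernel across every position of an image, which is the mechanism that lets a convolutional layer exploit spatial structure; in a trained CNN the kernel values would be learned rather than hand-picked.

```python
import numpy as np

def conv2d(img, kernel):
    # Valid-mode 2D convolution (no padding): the same small kernel is
    # applied at every spatial position, so the layer has far fewer
    # parameters than a fully connected one and reacts to a pattern
    # regardless of where it appears in the image.
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Illustrative hand-picked kernel (a Sobel-style vertical-edge detector).
sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=float)

img = np.zeros((8, 8))
img[:, 4:] = 1.0                 # image containing one vertical edge
print(conv2d(img, sobel_x))      # nonzero response only where the edge falls
```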
