Performance Evaluation

Given a classifier trained on a specific set of samples (the training set), it must be evaluated on a separate set (the validation or test set). From this comparison it is possible to extract metrics that assess the classifier and allow different classifiers to be compared with one another. It is essential that the performance metrics are computed on samples not used during the training phase (the validation set), in order to detect issues such as overfitting of the training data or lack of generalization.
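A minimal Python sketch of this evaluation protocol, assuming a hypothetical classifier object exposing fit() and predict() methods (the names are chosen here only for illustration):

def accuracy(y_true, y_pred):
    # Fraction of samples whose predicted label matches the true one.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def evaluate(classifier, x_train, y_train, x_val, y_val):
    # Train only on the training set, never on the validation samples.
    classifier.fit(x_train, y_train)            # hypothetical interface
    acc_train = accuracy(y_train, classifier.predict(x_train))
    acc_val = accuracy(y_val, classifier.predict(x_val))
    # A large gap between the two values is a symptom of overfitting:
    # good memorization of the training set, poor generalization.
    return acc_train, acc_val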

Once the parameters of the classifier are fixed, the contingency table (Confusion Matrix) can be built:

                        True Value
                        p          n
Classification   p'     TP         FP
                 n'     FN         TN


False Positives (FP) are also referred to as False Alarms. False Negatives (FN) are known as misses.
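As an illustration, a minimal Python sketch that accumulates the four entries of the confusion matrix from ground-truth and predicted binary labels (the boolean encoding of the labels is an assumption of this example):

def confusion_matrix(y_true, y_pred):
    # y_true, y_pred: sequences of booleans (True = positive class).
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred and truth:
            tp += 1        # true positive
        elif pred and not truth:
            fp += 1        # false positive (false alarm)
        elif truth:
            fn += 1        # false negative (miss)
        else:
            tn += 1        # true negative
    return tp, fp, fn, tn

For example, confusion_matrix([True, False, True], [True, True, False]) returns (1, 1, 1, 0).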

From the table, several performance metrics are typically extracted, such as:

- the true positive rate (recall, or detection rate), TP / (TP + FN);
- the false positive rate (false alarm rate), FP / (FP + TN);
- the precision, TP / (TP + FP);
- the overall accuracy, (TP + TN) / (TP + FP + FN + TN).
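A possible computation of these metrics from the four counts, again only a sketch (returning 0.0 for empty denominators is a choice of this example, not a standard):

def metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return {
        "recall":    tp / (tp + fn) if tp + fn else 0.0,   # true positive rate
        "fpr":       fp / (fp + tn) if fp + tn else 0.0,   # false positive rate
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy":  (tp + tn) / total if total else 0.0,
    }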

Each classifier has one or more parameters that, when modified, change the ratio between correct recognitions and the number of false positives. Consequently, it becomes challenging to objectively compare two classifiers, as one may exhibit a higher number of correct detections at the same threshold, but also a higher number of false positives. Therefore, to compare the performance of different binary classifiers obtained from various training sessions, it is common to use curves that vary with this internal threshold of the classifier.

The performance curves typically encountered include:

- the ROC (Receiver Operating Characteristic) curve, which plots the true positive rate against the false positive rate as the threshold varies;
- the Precision-Recall curve, which plots precision against recall over the same threshold sweep.
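A minimal sketch of how a ROC curve can be traced by sweeping the decision threshold over the scores produced by a binary classifier (the score/label representation is an assumption of this example; ties between scores are ignored for brevity):

def roc_curve(scores, labels):
    # scores: classifier outputs, higher means "more positive".
    # labels: ground-truth booleans; both classes are assumed to be present.
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    # Visiting the samples by decreasing score, lowering the threshold admits
    # one more sample at a time into the set classified as positive.
    tp = fp = 0
    points = [(0.0, 0.0)]            # (false positive rate, true positive rate)
    for _, label in sorted(zip(scores, labels), key=lambda sl: -sl[0]):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points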

It is important to note that these indices apply to any problem involving the notion of correct or incorrect results. They are therefore applicable not only to classifiers but also, for example, to the matching of feature points (keypoint correspondences) and similar tasks.

Recently, to enable a more streamlined comparison of classifier performance, functions have been proposed that, when applied to ROC curves, yield a single scalar representing a score of classification quality. These functions are typically averages of samples taken from the ROC curve in the regions of practical interest.
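One common instance of such a score is the area under the ROC curve (AUC); the sketch below approximates it by trapezoidal integration of the points returned by the roc_curve() sketch above. Restricting the integration to a sub-interval of the false positive rate gives a partial AUC focused on the region of practical interest.

def auc(points):
    # points: list of (fpr, tpr) pairs sorted by increasing fpr,
    # e.g. the output of roc_curve() above.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0    # trapezoid between consecutive points
    return area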
