Regression and Optimization Methods for Model Analysis

One of the most common problems in computer vision (and, more broadly, in information theory) is fitting a set of noisy measurements (for example, the pixels of an image) to a predefined model.

In addition to noise, which may be white Gaussian but can in principle follow any statistical distribution, one must also deal with possible outliers, the statistical term for data points that lie too far from the model to be considered part of it.

This chapter presents various regression techniques for estimating the parameters $\boldsymbol\beta$ of a stationary model from a dataset affected by noise, along with methods for identifying and removing outliers from the input data.

The next chapter presents "regression" techniques more closely related to the theme of classification.

The literature offers several techniques for estimating the parameters of a model:

Least Squares Fitting
If the data consists entirely of inliers (there are no outliers) and the only disturbance is additive white Gaussian noise, least squares regression is the optimal technique (section 3.2);
M-Estimator
Since the errors are squared, even a few outliers can significantly shift the model (Hub96): weighting points far from the estimated model in a non-quadratic manner improves the estimate itself (section 3.8);
IRLS
Iteratively reweighted least squares is used when the outliers are few in number but lie very far from the model: under these conditions an iterative regression can be performed (section 3.9), where at each cycle points with excessively large errors are either removed (ILS) or reweighted (IRLS);
Hough
If the input data is affected by both noise and numerous outliers, and the distribution is potentially multimodal, but the model is formed by only a few parameters, the Hough transform (Hou59) allows the most statistically prevalent model to be extracted (section 3.11);
RANSAC
If the number of outliers is comparable to that of the inliers and the noise is very low (relative to the position of the outliers), RANdom SAmple Consensus (FB87) makes it possible to identify the best model present in the scene (section 3.12);
LMedS
The Least Median of Squares algorithm, similar to RANSAC, ranks the points by their distance from a randomly generated model and selects the model with the smallest median error (Rou84) (section 3.12.2);
Kalman
Finally, a Kalman filter (see section 2.12.9) can be used to derive the parameters of a model when this information is required at runtime.
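As a concrete reference for the first technique in the list, the following is a minimal sketch of a least squares line fit using NumPy. The model, data, and variable names are illustrative and not taken from the text; only additive white Gaussian noise is present, the condition under which least squares is optimal:

```python
import numpy as np

# Synthetic data: a line y = 2 + 0.5 x corrupted by white Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=x.size)

# Design matrix: a column of ones (intercept) and a column of x values.
A = np.column_stack([np.ones_like(x), x])

# Least squares solution of A @ beta = y.
beta, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
# beta[0] and beta[1] should be close to 2.0 and 0.5 respectively.
```

A single gross outlier added to `y` would visibly shift `beta`, which is what motivates the robust alternatives in the list above.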

Only RANSAC and the Hough transform can effectively handle the case where two or more distributions compatible with the model are simultaneously present in the measurements.
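The hypothesize-and-verify loop at the heart of RANSAC can be sketched as follows for a 2D line. This is an illustrative implementation under simplifying assumptions (vertical residuals, explicit slope-intercept form); the function name and parameters are hypothetical, not from the text:

```python
import numpy as np

def ransac_line(points, n_iters=200, threshold=0.1, seed=None):
    """Minimal RANSAC sketch: fit y = a*x + b to an (N, 2) array of points."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Minimal sample: two distinct points define a candidate line.
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # skip degenerate (vertical) candidates
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Consensus set: points whose vertical residual is below the threshold.
        inliers = np.abs(points[:, 1] - (a * points[:, 0] + b)) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refine the winning hypothesis by least squares on its consensus set.
    x, y = points[best_inliers, 0], points[best_inliers, 1]
    A = np.column_stack([x, np.ones_like(x)])
    a, b = np.linalg.lstsq(A, y, rcond=None)[0]
    return a, b, best_inliers
```

Because each hypothesis comes from a minimal sample rather than from all the data, a dominant line is recovered even when a large fraction of the points belongs to other distributions.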

Nothing ultimately prevents the use of mixed techniques: for example, a relatively coarse (and thus fast and memory-efficient) Hough transform can be employed to remove the outliers, followed by a least squares regression to obtain the model parameters more precisely.
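In the same spirit of combining estimators, the IRLS idea from the list above can be sketched in a few lines: repeat a weighted least squares fit, recomputing the weights from the residuals so that distant points count less. The weight function shown (a Cauchy-style weight with tuning constant `c`) is one possible choice among many and is an assumption of this sketch, not a prescription from the text:

```python
import numpy as np

def irls_line(x, y, n_iters=10, c=1.0):
    """IRLS sketch for y = beta0 + beta1*x with a Cauchy-style weight."""
    A = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)  # first pass is ordinary least squares
    for _ in range(n_iters):
        # Weighted least squares: scale rows by the square root of the weights.
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        # Recompute weights: large residuals are strongly down-weighted.
        r = y - A @ beta
        w = 1.0 / (1.0 + (r / c) ** 2)
    return beta
```

After a few iterations the gross outliers receive nearly zero weight, so the final `beta` is close to what a least squares fit on the inliers alone would give.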



Paolo Medici
2025-10-22