The RANdom SAmple Consensus (RANSAC) algorithm is an iterative method for estimating the parameters of a model when the dataset is heavily contaminated by outliers. This algorithm (FB81) is a non-deterministic approach based on the random selection of the elements that generate the model.
RANSAC and all its variants can be viewed as algorithms that iteratively alternate between two phases: the hypothesis generation phase and the hypothesis evaluation phase.
The algorithm, in brief, consists of randomly selecting $s$ samples from the set $S$ of input samples, with $s$ large enough to derive a model (the hypothesis). Once a hypothesis $\theta$ is obtained, the number of elements of $S$ that are close enough to it to belong to it is counted. A sample $x_i$ belongs or does not belong to the hypothetical model (i.e., it is a hypothetical inlier or outlier) according to whether its distance $d(x_i, \theta)$ from the model is less than or greater than a given threshold $\epsilon$, which is typically problem-dependent. Choosing the threshold $\epsilon$ raises practical issues when the additive error is Gaussian, since the error distribution has infinite support. In this case, it is still necessary to fix a probability $\alpha$ of detecting the inliers in order to establish the threshold $\epsilon$.
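As a concrete illustration of this last point (not part of the original derivation), assume the residual of an inlier is Gaussian with known standard deviation $\sigma$; its squared distance from the model then follows $\sigma^2$ times a chi-square distribution, and $\epsilon$ can be taken as the $\alpha$-quantile of that distribution. The sketch below follows this assumption; the names `sigma`, `alpha`, and `dof` are illustrative.

```python
# Sketch: deriving the inlier threshold epsilon from a detection probability alpha,
# assuming Gaussian residuals with standard deviation sigma. The squared distance of
# an inlier is then sigma^2 * chi-square(dof), with dof the dimension of the residual.
from scipy.stats import chi2

def inlier_threshold(sigma, alpha=0.95, dof=1):
    """Return epsilon such that an inlier satisfies d < epsilon with probability alpha."""
    return sigma * chi2.ppf(alpha, df=dof) ** 0.5

# Example: 1-D residuals with sigma = 2.0 and a 95% inlier-detection probability.
print(inlier_threshold(2.0, alpha=0.95, dof=1))  # ~3.92, i.e. about 1.96 * sigma
```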
All the samples that satisfy the hypothesis are called consensus samples.
The set of consensus samples associated with the hypothesis $\theta$ is the consensus set of $\theta$:

$$ C(\theta) = \{\, x_i \in S \;:\; d(x_i, \theta) < \epsilon \,\} \qquad (3.110) $$
Among all the randomly generated models, the one that best satisfies a chosen metric is selected; in the original RANSAC, for instance, it is the model whose consensus set has maximum cardinality.
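To make the two alternating phases concrete, the following minimal sketch (an illustration under simplifying assumptions, not a reference implementation) applies RANSAC to fitting a 2-D line: each iteration draws a minimal sample of $s = 2$ points, generates a line hypothesis, evaluates its consensus set against a threshold `epsilon`, and keeps the hypothesis with maximum consensus cardinality. All names (`fit_line`, `ransac_line`, the data) are illustrative.

```python
import numpy as np

def fit_line(p1, p2):
    """Hypothesis generation: line through two points as (a, b, c), with a*x + b*y + c = 0."""
    (x1, y1), (x2, y2) = p1, p2
    a, b = y2 - y1, x1 - x2
    c = -(a * x1 + b * y1)
    norm = np.hypot(a, b)
    return np.array([a, b, c]) / norm  # normalized so |a*x + b*y + c| is the point-line distance

def ransac_line(points, epsilon, n_iters, seed=0):
    """Basic RANSAC: return the line hypothesis with the largest consensus set."""
    rng = np.random.default_rng(seed)
    best_model, best_consensus = None, np.array([], dtype=int)
    for _ in range(n_iters):
        # Hypothesis generation: minimal sample of s = 2 distinct points.
        i, j = rng.choice(len(points), size=2, replace=False)
        model = fit_line(points[i], points[j])
        # Hypothesis evaluation: the consensus set collects points with distance below epsilon.
        distances = np.abs(points @ model[:2] + model[2])
        consensus = np.flatnonzero(distances < epsilon)
        if len(consensus) > len(best_consensus):
            best_model, best_consensus = model, consensus
    return best_model, best_consensus

# Example: 80 noisy inliers on y = 0.5*x + 1 plus 20 uniform outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 80)
inliers = np.column_stack([x, 0.5 * x + 1 + rng.normal(0, 0.1, 80)])
outliers = rng.uniform(0, 11, (20, 2))
points = np.vstack([inliers, outliers])
line, consensus = ransac_line(points, epsilon=0.3, n_iters=100)
print(line, len(consensus))
```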
One of the challenges is determining how many hypotheses to generate in order to have a good probability of obtaining the correct model.
There exists a statistical relationship between the number of iterations and the probability $p$ of finding at least one solution composed solely of inliers. If $w$ denotes the fraction of inliers and $s$ the number of samples drawn per hypothesis, the probability that at least one of $N$ attempts uses only inliers is $1 - (1 - w^s)^N$. The number of attempts $N$ must therefore satisfy $1 - (1 - w^s)^N \ge p$, which can be expressed as

$$ N \ge \frac{\log(1 - p)}{\log(1 - w^s)} \qquad (3.111) $$
Typically, as an approximation of $\log(1 - w^s)$, one can use $-w^s$ (note 3.3) and therefore

$$ N \approx -\frac{\log(1 - p)}{w^s} \qquad (3.112) $$
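As a worked example of Eqs. (3.111) and (3.112), assuming an inlier fraction $w = 0.5$, a minimal sample size $s = 2$, and a desired success probability $p = 0.99$, the required number of attempts can be computed as follows (the symbol names mirror those above; the approximation is the logarithm expansion used in Eq. (3.112)).

```python
import math

def ransac_iterations(p, w, s):
    """Exact bound from Eq. (3.111): smallest N with 1 - (1 - w**s)**N >= p."""
    return math.ceil(math.log(1 - p) / math.log(1 - w ** s))

def ransac_iterations_approx(p, w, s):
    """Approximate bound from Eq. (3.112), using log(1 - w**s) ~ -w**s."""
    return math.ceil(-math.log(1 - p) / w ** s)

print(ransac_iterations(0.99, 0.5, 2))         # 17 attempts (exact bound)
print(ransac_iterations_approx(0.99, 0.5, 2))  # 19 attempts (conservative approximation)
```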
Typically, $s$ is chosen equal to the minimum number of elements required to instantiate the model; if it exceeds this number, the hypothesis must instead be obtained by numerical regression over the selected constraints. This is useful when the noise variance is high, although it increases the risk of including outliers among the constraints.
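A minimal sketch of this non-minimal case, assuming a least-squares fit of a line $y = m x + q$ to a sample of $s > 2$ points (the function name and data are illustrative):

```python
import numpy as np

def fit_line_lstsq(sample):
    """Fit y = m*x + q by least squares to a non-minimal sample (s > 2 points)."""
    x, y = sample[:, 0], sample[:, 1]
    A = np.column_stack([x, np.ones_like(x)])
    (m, q), *_ = np.linalg.lstsq(A, y, rcond=None)
    return m, q

# Hypothesis generated from a non-minimal sample of s = 5 points.
sample = np.array([[0.0, 1.1], [1.0, 1.4], [2.0, 2.1], [3.0, 2.4], [4.0, 3.1]])
print(fit_line_lstsq(sample))  # roughly (0.5, 1.02)
```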