The RANdom SAmple Consensus (RANSAC) algorithm is an iterative method for estimating the parameters of a model when the dataset is heavily contaminated by outliers. This algorithm (FB81) is a non-deterministic approach based on the random selection of the elements that generate the model.
RANSAC and all its variants can be viewed as algorithms that iteratively alternate between two phases: the hypothesis generation phase and the hypothesis evaluation phase.
The algorithm, in brief, consists of randomly selecting $s$ samples from the set $S$ of input samples, with $s$ large enough to derive a model (the hypothesis). Once a hypothesis $\theta$ is obtained, the number of elements of $S$ that are close enough to it to belong to it is counted. A sample $x_i$ belongs or does not belong to the hypothetical model (i.e., it is a hypothetical inlier or outlier) according to whether its distance $d(x_i, \theta)$ from the model is less than or greater than a given threshold $\epsilon$, which is typically problem-dependent. Choosing the threshold $\epsilon$ raises practical issues when the additive error is Gaussian, since the error distribution has infinite support. In this case, it is still necessary to fix a probability $\alpha$ of detecting the inliers in order to establish the threshold $\epsilon$.
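As a concrete illustration of this last point (not part of the original derivation), assume the residual of an inlier is Gaussian with known standard deviation $\sigma$; its squared distance from the model then follows $\sigma^2$ times a chi-square distribution, and $\epsilon$ can be taken as the $\alpha$-quantile of that distribution. The sketch below follows this assumption; the names `sigma`, `alpha`, and `dof` are illustrative.

```python
# Sketch: deriving the inlier threshold epsilon from a detection probability alpha,
# assuming Gaussian residuals with standard deviation sigma. The squared distance of
# an inlier is then sigma^2 * chi-square(dof), with dof the dimension of the residual.
from scipy.stats import chi2

def inlier_threshold(sigma, alpha=0.95, dof=1):
    """Return epsilon such that an inlier satisfies d < epsilon with probability alpha."""
    return sigma * chi2.ppf(alpha, df=dof) ** 0.5

# Example: 1-D residuals with sigma = 2.0 and a 95% inlier-detection probability.
print(inlier_threshold(2.0, alpha=0.95, dof=1))  # ~3.92, i.e. about 1.96 * sigma
```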
All the samples that satisfy the hypothesis are called consensus samples.
The set of consensus samples associated with the hypothesis $\theta$ is the consensus set of $\theta$:

$$ C(\theta) = \{\, x_i \in S \;:\; d(x_i, \theta) < \epsilon \,\} \qquad (3.110) $$
Among all the randomly generated models, the one that best satisfies a chosen metric is selected; in the original RANSAC, for instance, it is the model whose consensus set has maximum cardinality.
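To make the two alternating phases concrete, the following minimal sketch (an illustration under simplifying assumptions, not a reference implementation) applies RANSAC to fitting a 2-D line: each iteration draws a minimal sample of $s = 2$ points, generates a line hypothesis, evaluates its consensus set against a threshold `epsilon`, and keeps the hypothesis with maximum consensus cardinality. All names (`fit_line`, `ransac_line`, the data) are illustrative.

```python
import numpy as np

def fit_line(p1, p2):
    """Hypothesis generation: line through two points as (a, b, c), with a*x + b*y + c = 0."""
    (x1, y1), (x2, y2) = p1, p2
    a, b = y2 - y1, x1 - x2
    c = -(a * x1 + b * y1)
    norm = np.hypot(a, b)
    return np.array([a, b, c]) / norm  # normalized so |a*x + b*y + c| is the point-line distance

def ransac_line(points, epsilon, n_iters, seed=0):
    """Basic RANSAC: return the line hypothesis with the largest consensus set."""
    rng = np.random.default_rng(seed)
    best_model, best_consensus = None, np.array([], dtype=int)
    for _ in range(n_iters):
        # Hypothesis generation: minimal sample of s = 2 distinct points.
        i, j = rng.choice(len(points), size=2, replace=False)
        model = fit_line(points[i], points[j])
        # Hypothesis evaluation: the consensus set collects points with distance below epsilon.
        distances = np.abs(points @ model[:2] + model[2])
        consensus = np.flatnonzero(distances < epsilon)
        if len(consensus) > len(best_consensus):
            best_model, best_consensus = model, consensus
    return best_model, best_consensus

# Example: 80 noisy inliers on y = 0.5*x + 1 plus 20 uniform outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 80)
inliers = np.column_stack([x, 0.5 * x + 1 + rng.normal(0, 0.1, 80)])
outliers = rng.uniform(0, 11, (20, 2))
points = np.vstack([inliers, outliers])
line, consensus = ransac_line(points, epsilon=0.3, n_iters=100)
print(line, len(consensus))
```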
One of the challenges is determining how many hypotheses to generate in order to have a good probability of obtaining the correct model.
There exists a statistical relationship between the number of iterations and the probability $p$ of finding at least one solution composed solely of inliers. If $w$ denotes the fraction of inliers and $s$ the number of samples drawn per hypothesis, the probability that at least one of $N$ attempts uses only inliers is $1 - (1 - w^s)^N$. The number of attempts $N$ must therefore satisfy $1 - (1 - w^s)^N \ge p$, which can be expressed as

$$ N \ge \frac{\log(1 - p)}{\log(1 - w^s)} \qquad (3.111) $$
Typically, as an approximation of $\log(1 - w^s)$, one can use $-w^s$ (note 3.3) and therefore

$$ N \approx -\frac{\log(1 - p)}{w^s} \qquad (3.112) $$
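As a worked example of Eqs. (3.111) and (3.112), assuming an inlier fraction $w = 0.5$, a minimal sample size $s = 2$, and a desired success probability $p = 0.99$, the required number of attempts can be computed as follows (the symbol names mirror those above; the approximation is the logarithm expansion used in Eq. (3.112)).

```python
import math

def ransac_iterations(p, w, s):
    """Exact bound from Eq. (3.111): smallest N with 1 - (1 - w**s)**N >= p."""
    return math.ceil(math.log(1 - p) / math.log(1 - w ** s))

def ransac_iterations_approx(p, w, s):
    """Approximate bound from Eq. (3.112), using log(1 - w**s) ~ -w**s."""
    return math.ceil(-math.log(1 - p) / w ** s)

print(ransac_iterations(0.99, 0.5, 2))         # 17 attempts (exact bound)
print(ransac_iterations_approx(0.99, 0.5, 2))  # 19 attempts (conservative approximation)
```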
Typically, $s$ is chosen equal to the minimum number of elements required to instantiate the model; if it exceeds this number, the hypothesis must instead be obtained by numerical regression over the selected constraints. This is useful when the noise variance is high, although it increases the risk of including outliers among the constraints.
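A minimal sketch of this non-minimal case, assuming a least-squares fit of a line $y = m x + q$ to a sample of $s > 2$ points (the function name and data are illustrative):

```python
import numpy as np

def fit_line_lstsq(sample):
    """Fit y = m*x + q by least squares to a non-minimal sample (s > 2 points)."""
    x, y = sample[:, 0], sample[:, 1]
    A = np.column_stack([x, np.ones_like(x)])
    (m, q), *_ = np.linalg.lstsq(A, y, rcond=None)
    return m, q

# Hypothesis generated from a non-minimal sample of s = 5 points.
sample = np.array([[0.0, 1.1], [1.0, 1.4], [2.0, 2.1], [3.0, 2.4], [4.0, 3.1]])
print(fit_line_lstsq(sample))  # roughly (0.5, 1.02)
```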