Another concept with cross-cutting relevance among the themes of computer vision is that of the descriptor (Visual Descriptor). Descriptors are used in many contexts: to compare characteristic points, to generate the disparity map in stereoscopic vision, and to provide a compact representation of a portion of the image that speeds up its identification or retrieval. Because this compact representation preserves a large amount of information, it is also used to build the feature space of classification algorithms.
Depending on the transformations that the image containing the points to be characterized may undergo, the descriptor must satisfy certain invariance properties (for example, invariance to brightness, scale, and rotation).
Before the introduction of the compact descriptor concept, the universally adopted method for comparing two feature points was the correlation between the areas surrounding the two points:
$$\mathrm{NCC}(p, q) = \frac{\sum_{(i,j) \in W} \bigl(I_1(p_x+i, p_y+j) - \bar{I}_1\bigr)\bigl(I_2(q_x+i, q_y+j) - \bar{I}_2\bigr)}{\sqrt{\sum_{(i,j) \in W} \bigl(I_1(p_x+i, p_y+j) - \bar{I}_1\bigr)^2 \sum_{(i,j) \in W} \bigl(I_2(q_x+i, q_y+j) - \bar{I}_2\bigr)^2}} \tag{6.1}$$

where $I_1$ and $I_2$ are the two images, $W$ is the set of offsets of the window centred on the points $p$ and $q$, and $\bar{I}_1$, $\bar{I}_2$ are the mean intensities of the two windows.
An approach similar to correlation, which is not invariant to brightness but is computationally cheaper, is the SAD (Sum of Absolute Differences):
$$\mathrm{SAD}(p, q) = \sum_{(i,j) \in W} \bigl|I_1(p_x+i, p_y+j) - I_2(q_x+i, q_y+j)\bigr| \tag{6.2}$$
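As an illustration, the following Python sketch computes a zero-mean normalized correlation score and a SAD score between two equally sized windows, in the spirit of (6.1) and (6.2). The function names and the synthetic test images are purely illustrative.

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation between two equally sized patches.

    Values close to 1 indicate a strong match; subtracting the mean makes the
    score insensitive to brightness changes."""
    a = patch_a.astype(np.float64) - patch_a.mean()
    b = patch_b.astype(np.float64) - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def sad(patch_a, patch_b):
    """Sum of absolute differences: cheaper, but not brightness invariant."""
    return np.abs(patch_a.astype(np.int32) - patch_b.astype(np.int32)).sum()

# Example: compare the 7x7 neighbourhoods of the same point in two images,
# the second being a brightness-shifted copy of the first.
rng = np.random.default_rng(0)
img1 = rng.integers(0, 256, (100, 100), dtype=np.uint8)
img2 = (img1 * 0.8 + 20).astype(np.uint8)
y, x, r = 50, 50, 3
w1 = img1[y - r:y + r + 1, x - r:x + r + 1]
w2 = img2[y - r:y + r + 1, x - r:x + r + 1]
print(ncc(w1, w2), sad(w1, w2))   # NCC stays close to 1, SAD grows
```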
It is also worth noting that comparing pixel neighbourhoods between images remains computationally demanding: performing these comparisons point by point requires significant processing and many memory accesses. Modern solutions aim to overcome this limitation by extracting from the neighbourhood of the point a descriptor that is smaller than the number of pixels it represents, while maximizing the information it retains.
Both SIFT (section 5.3) and SURF (section 5.4) extract their descriptors by leveraging scale and rotation information derived from the image (this information can be extracted independently, so the same mechanism can be applied to any class of descriptors to make them invariant to scale and rotation). The descriptors obtained from SIFT and SURF are different versions of the same concept, namely the histogram of oriented gradients (section 6.2), which serves as an example of how to compress the variability around a point into a reduced-dimensional space.
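The sketch below illustrates the histogram-of-oriented-gradients idea in a minimal form: one magnitude-weighted orientation histogram per cell of a 4x4 grid, concatenated into a 128-value vector. It deliberately omits the Gaussian weighting, interpolation, and rotation normalization of the actual SIFT/SURF descriptors, so it should be read as an illustration of the concept rather than a faithful implementation.

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Histogram of gradient orientations for a grayscale patch, with each
    pixel weighted by its gradient magnitude."""
    p = patch.astype(np.float64)
    gy, gx = np.gradient(p)                         # simple finite differences
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)     # orientations in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist        # normalize for contrast changes

# A SIFT-like layout splits the neighbourhood into a 4x4 grid of cells and
# concatenates one histogram per cell: 4 * 4 * 8 = 128 values.
rng = np.random.default_rng(1)
patch = rng.integers(0, 256, (16, 16), dtype=np.uint8)
descriptor = np.concatenate([
    orientation_histogram(patch[i:i + 4, j:j + 4])
    for i in range(0, 16, 4) for j in range(0, 16, 4)
])
print(descriptor.shape)   # (128,)
```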
None of the descriptors in current use employs the image pixels directly, yet it is easy to see that a sufficiently well-distributed subset of pixels is enough to obtain an accurate description of the point. In (RD05), a descriptor is built from the 16 pixels lying on the discrete circle of radius 3 around the point. This description can be made even more compact by switching to the binary form of the Local Binary Patterns described later, or by not being constrained to the circle, as in Census or BRIEF.
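A minimal sketch of this binary idea follows: it thresholds the 16 pixels of the radius-3 circle against the centre, packs the result into a 16-bit word, and compares descriptors with the Hamming distance. The descriptors of (RD05), LBP, Census, and BRIEF each differ from this toy version in their sampling patterns and thresholds.

```python
import numpy as np

# Offsets of the 16 pixels on the discrete circle of radius 3 around a point.
CIRCLE_16 = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
             (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def binary_circle_descriptor(img, y, x):
    """Census/LBP-style binary descriptor: one bit per circle pixel, set when
    that pixel is brighter than the centre."""
    centre = int(img[y, x])
    bits = 0
    for k, (dy, dx) in enumerate(CIRCLE_16):
        if int(img[y + dy, x + dx]) > centre:
            bits |= 1 << k
    return bits   # 16-bit integer

def hamming(a, b):
    """Number of differing bits: the natural distance for binary descriptors."""
    return bin(a ^ b).count("1")

rng = np.random.default_rng(2)
img = rng.integers(0, 256, (64, 64), dtype=np.uint8)
d1 = binary_circle_descriptor(img, 20, 20)
d2 = binary_circle_descriptor(img, 40, 40)
print(f"{d1:016b} {d2:016b} hamming={hamming(d1, d2)}")
```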
Another approach is to appropriately sample the kernel space (GZS11), extracting, at coordinates around the keypoint, the values produced by convolving the original image with horizontal and vertical Sobel kernels, so as to build a descriptor consisting of only a small number of values.
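The exact sampling scheme of (GZS11) is not detailed here, so the following sketch only illustrates the general idea under assumed parameters: the Sobel responses are computed once per image and then sampled on a hypothetical 3x3 grid of offsets around the keypoint, yielding a short normalized vector.

```python
import numpy as np
from scipy.ndimage import sobel

def sampled_gradient_descriptor(gx, gy, y, x, offsets):
    """Sample the horizontal and vertical Sobel responses at a fixed pattern of
    offsets around the keypoint (y, x) and concatenate them into a short vector."""
    desc = []
    for dy, dx in offsets:
        desc.extend((gx[y + dy, x + dx], gy[y + dy, x + dx]))
    d = np.asarray(desc, dtype=np.float64)
    n = np.linalg.norm(d)
    return d / n if n > 0 else d

rng = np.random.default_rng(3)
img = rng.integers(0, 256, (64, 64), dtype=np.uint8).astype(np.float64)
gx = sobel(img, axis=1)   # horizontal Sobel response, computed once per image
gy = sobel(img, axis=0)   # vertical Sobel response
# A coarse 3x3 grid of sample positions: 9 positions x 2 responses = 18 values,
# far fewer than the pixels of the neighbourhood they summarize.
offsets = [(dy, dx) for dy in (-4, 0, 4) for dx in (-4, 0, 4)]
print(sampled_gradient_descriptor(gx, gy, 32, 32, offsets).shape)   # (18,)
```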
It is noteworthy that, for purely computational reasons related to resource reuse, a specific descriptor extractor is often associated with each particular feature point extractor.
From this introduction it is clear that describing a keypoint with a smaller, yet sufficiently descriptive, set of values is a useful approach, especially in the context of classification. The concept of a descriptor arises from the attempt to extract local information from the image while preserving as much of its content as possible. In this way it becomes possible to perform (relatively) fast comparisons between points across images, or to use such descriptors as features for training classifiers.