Principles of covariant detection

The goals of a covariant detector were discussed in Covariant detectors fundamentals. This page introduces a few general principles that underlie most covariant detection algorithms. Consider an input image $\ell$ and a two-dimensional continuous and invertible warp $w$. The warped image $w[\ell]$ is defined to be

$w[\ell] = \ell \circ w^{-1},$

or, equivalently,

$w[\ell](x,y) = \ell(w^{-1}(x,y)), \qquad \forall (x,y)\in\real^2.$

Note that, while $w$ pushes pixels forward, from the original to the transformed image domain, defining the transformed image $\ell'$ requires inverting the warp and composing $\ell$ with $w^{-1}$.
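The role of $w^{-1}$ can be illustrated on a sampled image. The following is a minimal sketch, assuming an affine warp $w(x) = Ax + t$, pixel coordinates ordered as $(x,y)$, and nearest-neighbour sampling; the function name `warp_image` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def warp_image(image, A, t, fill=0.0):
    """Return w[image] for the affine warp w(x) = A @ x + t.

    Assumptions: `image` is a 2-D array indexed as image[y, x], and
    values are read with nearest-neighbour sampling.
    """
    height, width = image.shape
    A_inv = np.linalg.inv(np.asarray(A, dtype=float))
    t = np.asarray(t, dtype=float).reshape(2, 1)

    # Coordinates (x, y) of every pixel of the output (warped) image.
    ys, xs = np.mgrid[0:height, 0:width]
    out_pts = np.stack([xs.ravel(), ys.ravel()]).astype(float)  # shape (2, N)

    # Pull each output pixel back through w^{-1} to find where to read the input.
    src = A_inv @ (out_pts - t)
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)

    # Output pixels whose pre-image falls outside the input keep the fill value.
    inside = (sx >= 0) & (sx < width) & (sy >= 0) & (sy < height)
    out = np.full(height * width, fill, dtype=float)
    out[inside] = image[sy[inside], sx[inside]]
    return out.reshape(height, width)
```

Iterating over the output pixels and reading the input at $w^{-1}(x,y)$ assigns every output pixel exactly one value, which is why the inverse warp appears in the definition.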

The goal of a covariant detector is to extract the same local features regardless of image transformations. The detector is said to be covariant or equivariant with a class of warps $w\in\mathcal{W}$ if, when the feature $R$ is detected in image $\ell$, then the transformed feature $w[R]$ is detected in the transformed image $w[\ell]$.

The net effect is that a covariant feature detector appears to “track” image transformations; however, it is important to note that a detector is not a tracker because it processes images individually rather than jointly as part of a sequence.

An intuitive way to construct a covariant feature detector is to extract features at image structures that are easily identifiable even after a transformation. Examples of such structures include dots, corners, and blobs. These will be generically referred to as corners in what follows.

A covariant detector faces two challenges. First, corners have, in practice, an infinite variety of individual appearances and the detector must be able to capture them to be of general applicability. Second, the way corners are identified and detected must remain stable under transformations of the image. These two problems are addressed in Local maxima of a cornerness measure and Covariant detection by normalization respectively.

# Detection using a cornerness measure

One way to decide whether an image region $R$ contains a corner is to compare the local appearance to a model or template of the corner; the result of this comparison is a cornerness score at that location. This page describes general theoretical properties of the cornerness and of the detection process. Concrete examples of cornerness are given in Cornerness measures.

A cornerness measure associates a score to all possible feature locations in an image $\ell$. As described in Feature geometry and feature frames, the location or, more generally, the pose $u$ of a feature $R$ is the warp that maps the canonical feature frame $R_0$ to $R$:

$R = u[R_0].$

The goal of a cornerness measure is to associate a score $F(u;\ell)$ to all possible feature poses $u$ and use this score to extract a finite number of covariant features from any image.

## Local maxima of a cornerness measure

Given the cornerness of each candidate feature, the detector must extract a finite number of them. However, the cornerness of features with nearly identical pose must be similar (otherwise the cornerness measure would be unstable). As such, simply thresholding $F(u;\ell)$ would detect an infinite number of nearly identical features rather than a finite number.

The solution is to detect features at the local maxima of the cornerness measure:

$\{u_1,\dots,u_n\} = \operatorname{localmax}_{u\in\mathcal{W}} F(u;\ell).$

This also means that a feature is never detected in isolation: its score must be compared with the scores of the features in a neighborhood of its pose.
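The difference between thresholding and local maxima extraction can be sketched on a sampled score. The example below is a simplified illustration, assuming that the pose is a translation only, that the score has been sampled on a pixel grid, and that neighborhoods are 3x3 windows; the function name `local_maxima` is hypothetical.

```python
import numpy as np

def local_maxima(score, threshold=0.0):
    """Return (row, col) positions that are strict maxima of their 3x3 neighbourhood.

    Assumption: `score` is a 2-D array of cornerness values sampled over
    translations; positions at or below `threshold` are discarded.
    """
    score = np.asarray(score, dtype=float)
    height, width = score.shape
    # Pad with -inf so that border positions only compete with real neighbours.
    padded = np.pad(score, 1, mode="constant", constant_values=-np.inf)

    is_max = score > threshold
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy : 1 + dy + height, 1 + dx : 1 + dx + width]
            is_max &= score > neighbour
    return [(int(r), int(c)) for r, c in np.argwhere(is_max)]

# Toy score map: two peaks, one of which has a weaker neighbour above threshold.
score = np.zeros((5, 5))
score[1, 1] = 2.0
score[3, 3] = 3.0
score[3, 2] = 1.0

above_threshold = int((score > 0.5).sum())  # thresholding alone keeps the weaker neighbour too
peaks = local_maxima(score, threshold=0.5)  # local maxima keep one position per peak
```

In this toy case thresholding retains every position near a peak that exceeds the threshold, while the local maxima retain a single position per peak.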

## Covariant detection by normalization

The next difficulty is to guarantee that detection is covariant with image transformations. That is, if $u$ is the pose of a feature extracted from image $\ell$, then the transformed pose $u' = w[u]$ must be detected in the transformed image $\ell' = w[\ell]$.

Since features are extracted in correspondence of the local maxima of the cornerness score, a sufficient condition is that corresponding features attain the same score in the two images:

$\forall u\in\mathcal{W}: \quad F(u;\ell) = F(w[u];w[\ell]), \qquad\text{or}\qquad F(u;\ell) = F(w \circ u ;\ell \circ w^{-1}).$

One simple way to satisfy this equation is to compute a cornerness score after normalizing the image by the inverse of the candidate feature pose warp $u$, as follows:

$F(u;\ell) = F(1;u^{-1}[\ell]) = F(1;\ell \circ u) = \mathcal{F}(\ell \circ u),$

where $1 = u^{-1} \circ u$ is the identity transformation and $\mathcal{F}$ is an arbitrary functional. Intuitively, covariant detection is obtained by checking whether the appearance of the feature resembles a corner only after normalization. Formally:

\begin{align*} F(w[u];w[\ell]) &= F(w \circ u ;\ell \circ w^{-1}) \\ &= F(1; \ell \circ w^{-1} \circ w \circ u) \\ &= \mathcal{F}(\ell\circ u) \\ &= F(u;\ell). \end{align*}

Concrete examples of the functional $\mathcal{F}$ are given in Cornerness measures.
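The identity above can be checked numerically. The following is a minimal sketch, assuming affine warps, images represented as functions of continuous coordinates, and a toy functional $\mathcal{F}$ (the center value minus the mean value on the unit circle) chosen only for illustration; the names `Affine`, `warp`, `cornerness`, and `blob` are hypothetical.

```python
import numpy as np

class Affine:
    """Affine warp x -> A @ x + t acting on arrays of 2-D points of shape (2, N)."""

    def __init__(self, A, t):
        self.A = np.asarray(A, dtype=float)
        self.t = np.asarray(t, dtype=float)

    def __call__(self, pts):
        return self.A @ pts + self.t[:, None]

    def compose(self, other):
        """Return the warp 'self after other' (other is applied first)."""
        return Affine(self.A @ other.A, self.A @ other.t + self.t)

    def inverse(self):
        A_inv = np.linalg.inv(self.A)
        return Affine(A_inv, -A_inv @ self.t)

def warp(w, image):
    """Warped image w[image]: the image composed with the inverse of w."""
    w_inv = w.inverse()
    return lambda pts: image(w_inv(pts))

# Fixed sample points in the canonical frame: the origin and 16 points on the unit circle.
_angles = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
_CENTER = np.zeros((2, 1))
_RING = np.stack([np.cos(_angles), np.sin(_angles)])

def cornerness(u, image):
    """F(u; image): the toy functional evaluated on the image composed with u."""
    return float(image(u(_CENTER))[0] - image(u(_RING)).mean())

def blob(pts):
    """Toy image: an isotropic Gaussian blob centred at (3, 2)."""
    d = pts - np.array([[3.0], [2.0]])
    return np.exp(-0.5 * (d ** 2).sum(axis=0))

u = Affine([[1.5, 0.2], [0.0, 1.2]], [3.0, 2.0])    # candidate feature pose
w = Affine([[0.8, -0.6], [0.6, 0.8]], [5.0, -1.0])  # image warp (rotation and translation)

before = cornerness(u, blob)                      # F(u; l)
after = cornerness(w.compose(u), warp(w, blob))   # F(w o u; w[l])
```

Under these assumptions `before` and `after` agree up to floating-point rounding, because the warped image is sampled at exactly the points that normalization maps back to the canonical frame.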

## Locality of the detected features

In the definition above, the cornerness functional $\mathcal{F}$ is an arbitrary functional of the entire normalized image $u^{-1}[\ell]$. In practice, one is always interested in detecting local features (at the very least because the image extent is finite).

This is easily obtained by considering a cornerness $\mathcal{F}$ which only looks in a small region of the normalized image, usually corresponding to the extent of the canonical feature $R_0$ (e.g. a unit disc centered at the origin).

In this case the extent of the local feature in the original image is simply given by $R = u[R_0]$.
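As an illustration of $R = u[R_0]$, the sketch below tests whether image points belong to the extent of a feature, assuming an affine pose $u(x) = Ax + t$ and a unit disc as canonical frame $R_0$; the function name `in_feature` is hypothetical.

```python
import numpy as np

def in_feature(points, A, t):
    """Test which points lie in R = u[R_0].

    Assumptions: u(x) = A @ x + t is an affine pose and R_0 is the unit
    disc centred at the origin. `points` is a sequence of (x, y) pairs.
    """
    pts = np.asarray(points, dtype=float).T                # shape (2, N)
    t = np.asarray(t, dtype=float)[:, None]
    # A point belongs to R exactly when its pre-image under u lies in R_0.
    canonical = np.linalg.inv(np.asarray(A, dtype=float)) @ (pts - t)
    return (canonical ** 2).sum(axis=0) <= 1.0

# An axis-aligned ellipse with semi-axes 4 and 2, centred at (10, 5).
A = np.diag([4.0, 2.0])
t = (10.0, 5.0)
inside = in_feature([(13, 5), (10, 6.5), (10, 8), (15, 5)], A, t)
```

With an affine pose the unit disc is mapped to an ellipse, so the first two points fall inside the feature and the last two fall outside.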

# Partial and iterated normalization

Practical detectors implement variants of the ideas above. Very often, for instance, detection is an iterative process, in which successive parameters of the pose of a feature are determined. For instance, it is typical to first detect the location and scale of a feature using a rotation-invariant cornerness score $\mathcal{F}$. Once these two parameters are known, the rotation can be determined using a different score, sensitive to the orientation of the local image structures.
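The second stage of such a scheme can be sketched as follows. The example assumes that location and scale are already known and that the image has been resampled into a normalized patch; it then estimates the rotation from the direction of the summed image gradient. This quantity is an illustrative choice and not the score used by any particular detector, and the function name `dominant_orientation` is hypothetical.

```python
import numpy as np

def dominant_orientation(patch):
    """Return the angle (radians) of the summed gradient of a normalized patch.

    Assumption: `patch` is a 2-D array indexed as patch[y, x], already
    normalized for location and scale.
    """
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(np.asarray(patch, dtype=float))
    return float(np.arctan2(gy.sum(), gx.sum()))

# Two toy patches: intensity increasing along x, and increasing along y.
ys, xs = np.mgrid[0:9, 0:9]
angle_x = dominant_orientation(xs)
angle_y = dominant_orientation(ys)
```

The first patch yields an angle of zero and the second an angle of a quarter turn, so the estimate is sensitive to orientation, unlike a rotation-invariant score.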

Certain detectors (such as Harris-Laplace and Hessian-Laplace) use even more sophisticated schemes, in which different scores are used to estimate jointly (rather than in succession) different parameters of the pose of a feature, such as its translation and scale. While a formal treatment of these cases is possible as well, we refer the reader to the original papers.