Covariant feature detectors

This tutorial introduces the vl_covdet VLFeat command, which implements a number of co-variant feature detectors and corresponding descriptors. This family of detectors includes SIFT as well as multi-scale corner (Harris-Laplace) and blob (Hessian-Laplace and Hessian-Hessian) detectors. For example applications, see also the SIFT tutorial.

Extracting frames and descriptors

The first example shows how to use vl_covdet to compute and visualize co-variant features. First, let us load an example image and visualize it:

im = vl_impattern('roofs1') ;
figure(1) ; clf ;
image(im) ; axis image off ;
An example input image.

The image must be converted to grayscale and single precision. Then vl_covdet can be called in order to extract features (by default this uses the DoG cornerness measure, similar to SIFT).

imgs = im2single(rgb2gray(im)) ;
frames = vl_covdet(imgs, 'verbose') ;

The verbose option is not necessary, but it produces some useful information:

vl_covdet: doubling image: yes
vl_covdet: detector: DoG
vl_covdet: peak threshold: 0.01, edge threshold: 10
vl_covdet: detected 3518 features
vl_covdet: kept 3413 inside the boundary margin (2)
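
The peak and edge thresholds reported above control how many features are detected, and can be changed by means of the PeakThreshold and EdgeThreshold options. The following is a minimal sketch raising the peak threshold to keep fewer, stronger features (the value 0.02 is an arbitrary choice for illustration):

frames = vl_covdet(imgs, 'peakThreshold', 0.02, ...
                         'edgeThreshold', 10) ;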

The vl_plotframe command can then be used to plot these features

hold on ;
vl_plotframe(frames) ;

which results in the following image:

The default features detected by vl_covdet use the DoG cornerness measure (like SIFT).

In addition to the DoG detector, vl_covdet supports a number of other ones:

  • The Difference of Gaussian operator (also known as the trace of the Hessian operator or Laplacian operator) uses the local extrema of the multiscale Laplacian operator to detect features in scale and space (as in SIFT).
  • The Hessian operator uses the local extrema of the multiscale determinant of Hessian operator.
  • The Hessian Laplace detector uses the extrema of the multiscale determinant of Hessian operator for localisation in space, and the extrema of the multiscale Laplacian operator for localisation in scale.
  • Harris Laplace uses the multiscale Harris cornerness measure instead of the determinant of the Hessian for localisation in space, and is otherwise identical to the previous detector.
  • Hessian Multiscale detects features spatially at multiple scales by using the multiscale determinant of Hessian operator, but does not attempt to estimate their scale.
  • Harris Multiscale is like the previous one, but uses the multiscale Harris measure instead.

For example, to use the Hessian-Laplace operator instead of DoG, use the code:

frames = vl_covdet(imgs, 'method', 'HessianLaplace') ;

The following figure shows examples of the output of these detectors:

Different detectors can produce a fairly different set of features.
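
To reproduce a comparison like the one in the figure, one can loop over the detectors. The following is a minimal sketch; the method names are the values accepted by the method option:

methods = {'DoG', 'Hessian', 'HessianLaplace', ...
           'HarrisLaplace', 'MultiscaleHessian', 'MultiscaleHarris'} ;
figure(2) ; clf ;
for i = 1:numel(methods)
  % detect features with the i-th method and plot them
  frames = vl_covdet(imgs, 'method', methods{i}) ;
  subplot(2,3,i) ;
  image(im) ; axis image off ; hold on ;
  vl_plotframe(frames) ;
  title(methods{i}) ;
end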

Understanding feature frames

To understand the rest of the tutorial, it is important to understand the geometric meaning of a feature frame. Features computed by vl_covdet are oriented ellipses, defined by a translation $T$ and a linear map $A$ (a $2\times 2$ matrix) which can be extracted as follows:

T = frame(1:2) ;
A = reshape(frame(3:6),2,2) ;

The map $(A,T)$ moves pixels from the feature frame (also called normalised patch domain) to the image frame. The feature is represented as a circle of unit radius centered at the origin in the feature reference frame, and this is transformed into an image ellipse by $(A,T)$.

In terms of extent, the normalised patch domain is a square box centered at the origin, whereas the image domain uses the standard MATLAB convention and starts at (1,1). The Y axis points downward and the X axis to the right. These notions are important in the computation of normalised patches and descriptors (see later).
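
As an illustration of this geometry, the following sketch maps the unit circle from the normalised patch domain to the corresponding image ellipse, assuming frames holds the six-parameter oriented ellipses described above:

frame = frames(:,1) ;                  % pick the first feature
T = frame(1:2) ;
A = reshape(frame(3:6),2,2) ;
theta = linspace(0, 2*pi, 64) ;
circle = [cos(theta) ; sin(theta)] ;   % unit circle in the feature frame
ellipse = A * circle + repmat(T, 1, numel(theta)) ;
figure(2) ; clf ; image(im) ; axis image off ; hold on ;
plot(ellipse(1,:), ellipse(2,:), 'g', 'LineWidth', 2) ;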

Affine adaptation

Affine adaptation is the process of estimating the affine shape of an image region in order to construct an affinely co-variant feature frame. This is useful to compensate for image deformations such as slant, arising for example from small perspective distortions.

To switch on affine adaptation, use the EstimateAffineShape option:

frames = vl_covdet(imgs, 'EstimateAffineShape', true) ;

which detects the following features:

Affinely adapted features.

Feature orientation

The detection methods discussed so far are rotationally invariant. This means that they detect the same circular or elliptical regions regardless of image rotation, but they do not allow one to fix and normalise the rotation in the feature frame. Instead, features are estimated to be upright by default (formally, this means that the affine transformation $(A,T)$ maps the vertical axis $(0,1)$ to itself).
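
This upright property can be checked numerically. A small sketch, assuming frames are the six-parameter oriented ellipses described earlier (the tolerance is an arbitrary choice):

A = reshape(frames(3:6,1),2,2) ;      % linear part of the first frame
v = A * [0 ; 1] ;                     % image of the vertical axis
assert(abs(v(1)) < 1e-6 * norm(v)) ;  % no horizontal component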

Estimating and removing the effect of rotation from a feature frame is needed in order to compute rotationally invariant descriptors. This can be obtained by specifying the EstimateOrientation option:

frames = vl_covdet(imgs, 'EstimateOrientation', true, 'verbose') ;

which results in the following features being detected:

Features with orientation detection.

The method used is the same as the one proposed by D. Lowe: the orientation is given by the dominant gradient direction. Intuitively, this means that, in the normalised frame, brighter stuff should appear on the right, or that there should be a left-to-right dark-to-bright pattern.

In practice, this method may result in an ambiguous detection of the orientation; in this case, up to four different orientations may be assigned to the same frame, resulting in multiple copies of it.

Computing descriptors

vl_covdet can also compute descriptors. Three are supported so far: SIFT, LIOP and raw patches (from which any other descriptor can be computed). To use this functionality simply add an output argument:

[frames, descrs] = vl_covdet(imgs) ;

This will compute SIFT descriptors for all the features. Each column of descrs is a 128-dimensional descriptor vector in single precision. Alternatively, to compute LIOP descriptors use:

[frames, descrs] = vl_covdet(imgs, 'descriptor', 'liop') ;

Using default settings, each column will be a 144-dimensional descriptor vector in single precision. If you wish to change the settings, use the arguments described in the LIOP tutorial. To compute raw patches instead, use:

[frames, patches] = vl_covdet(imgs, 'descriptor', 'patch') ;

In this case each column of patches is a stacked patch. To visualize the first 100 patches, one can use for example:

w = sqrt(size(patches,1)) ;
vl_imarraysc(reshape(patches(:,1:10*10), w,w,[])) ;
Patches extracted with the standard detectors (left) and adding affine adaptation (right).

There are several parameters affecting the patches associated to features. First, PatchRelativeExtent can be used to control how large a patch is relative to the feature scale. The extent is half of the side of the patch domain, a square in the normalised patch frame. Since most detectors latch onto image structures (e.g. blobs) that, in the normalised frame of reference, have a size comparable to a circle of radius one, setting PatchRelativeExtent to 6 makes the patch about six times larger than the size of the corner structure. This is approximately the default extent of SIFT feature descriptors.

A second important parameter is PatchRelativeSigma, which expresses the amount of smoothing applied to the image in the normalised patch frame. By default this is set to 1.0, but it can be reduced to obtain sharper patches. Of course, the amount of smoothing is bounded below by the resolution of the input image: a smoothing of, say, less than half a pixel cannot be recovered due to the limited sampling rate of the latter. Moreover, the patch must be sampled finely enough to avoid aliasing (see below).

The last parameter is PatchResolution. If this is equal to $w$, then the patch has a side of $2w+1$ pixels (hence the sampling step in the normalised frame is given by PatchRelativeExtent/PatchResolution). Extracting higher resolution patches may be needed for larger extents and smaller smoothing. A good setting for this parameter may be PatchRelativeExtent/PatchRelativeSigma.
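
The three parameters can be combined in a single call. A sketch with illustrative values (extent 6, smoothing 1.0, resolution 20, i.e. 41 x 41 pixel patches):

[frames, patches] = vl_covdet(imgs, ...
                              'descriptor', 'patch', ...
                              'patchRelativeExtent', 6, ...
                              'patchRelativeSigma', 1.0, ...
                              'patchResolution', 20) ;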

Custom frames

Finally, it is possible to use vl_covdet to compute descriptors on custom feature frames, or to apply affine adaptation and/or orientation estimation to these.

For example

delta = 30 ;
xr = delta:delta:size(im,2)-delta+1 ;
yr = delta:delta:size(im,1)-delta+1 ;
[x,y] = meshgrid(xr,yr) ;
frames = [x(:)'; y(:)'] ;
frames(end+1,:) = delta/2 ;

[frames, patches] = vl_covdet(imgs, ...
                              'frames', frames, ...
                              'estimateAffineShape', true, ...
                              'estimateOrientation', true) ;

computes affinely adapted and oriented features on a grid:

Custom frame (on a grid) after affine adaptation.

Getting the scale spaces

vl_covdet can return additional information about the features, including the scale spaces and scores for each detected feature. To do so use the syntax:

[frames, descrs, info] = vl_covdet(imgs) ;

This will return a structure info:

info =

                    gss: [1x1 struct]
                    css: [1x1 struct]
             peakScores: [1x351 single]
             edgeScores: [1x351 single]
       orientationScore: [1x351 single]
    laplacianScaleScore: [1x351 single]

The last four fields are the peak, edge, orientation, and Laplacian scale scores of the detected features. The first two were discussed before, and the last two are the scores associated to a specific orientation during orientation assignment and to a specific scale during Laplacian scale estimation.
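
These scores can be used, for instance, to retain only the strongest features. A minimal sketch (the threshold 0.02 is an arbitrary choice for illustration):

sel = info.peakScores > 0.02 ;   % keep features with a strong peak score
strongFrames = frames(:, sel) ;
strongDescrs = descrs(:, sel) ;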

The first two fields are the Gaussian scale space and the cornerness measure scale space, which can be plotted by means of vl_plotss (a sketch is given at the end of this section). The following is a rendition of the Gaussian scale space for our example image:

Gaussian scale space.

The following is an example of the corresponding cornerness measure:

Cornerness scale space (Difference of Gaussians).
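
The two renderings above can be obtained with vl_plotss. A minimal sketch:

figure(2) ; clf ; vl_plotss(info.gss) ;   % Gaussian scale space
figure(3) ; clf ; vl_plotss(info.css) ;   % cornerness measure scale space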