sift.h File Reference


Detailed Description

Author:
Andrea Vedaldi
The Scale Invariant Feature Transform (SIFT) bundles a feature detector and a feature descriptor. This program implements a “SIFT filter”: a reusable object that can be used to extract SIFT features from multiple images of the same size.

The SIFT detector determines the geometry of a SIFT feature. Geometrically, the feature is an oriented disk (also called a feature frame or keypoint) and has a center $ (x,y) $, a scale $ \sigma $, and an orientation $ \theta $. The SIFT detector works by identifying blob-like structures in an image and attaching oriented disks to them.

The SIFT descriptor describes compactly the appearance of the image region corresponding to a SIFT frame. The SIFT descriptor works by extracting a Histogram of Oriented Gradients (HOG), which is a statistic of the gradient orientations inside the image region.

Using the SIFT filter

The code provided in this module can be used in different ways. You can instantiate and use a SIFT filter to extract both SIFT keypoints and descriptors from one or multiple images. Alternatively, you can use a lower level function to run only a part of the SIFT algorithm (for instance, to compute the SIFT descriptors of custom keypoints).

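To use the SIFT filter:

  • Create the filter with vl_sift_new(). The same filter can be reused for every image of the same size.
  • Start processing an image with vl_sift_process_first_octave(), then move to each following octave with vl_sift_process_next_octave(), stopping when VL_ERR_EOF is returned.
  • For each octave, run vl_sift_detect() and retrieve the keypoints with vl_sift_get_nkeypoints() and vl_sift_get_keypoints().
  • For each keypoint, call vl_sift_calc_keypoint_orientations() and then, for each orientation returned, vl_sift_calc_keypoint_descriptor().
  • Delete the filter with vl_sift_delete().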

To compute SIFT descriptors of custom keypoints, use vl_sift_calc_raw_descriptor().

The scale space

The SIFT detector searches for image blobs at multiple scales. In order to do this, it first computes a Gaussian pyramid by gradually smoothing the image and reducing its scale (resolution). Then, it looks for blobs at all possible locations and scales.

Scales are sampled by octaves and by sublevels within each octave. This is controlled by three parameters: the starting octave $o_{\mathrm{min}}$, the number of octaves O, and the number of subdivisions for each octave S. While O is usually set to its maximum value, $o_{\mathrm{min}}$ can be set to either 0 (native resolution), -1 (subpixel resolution), or a larger value (coarser resolution). The effect of the number of subdivisions S is more subtle, and we recommend reading Lowe's original paper.

Parameters controlling the scale space
parameter            alt. name   default value        set by
$O$                  O           as big as possible   vl_sift_new()
$o_{\mathrm{min}}$   o_min       -1                   vl_sift_new()
$S$                  S           3                    vl_sift_new()

Scale space details

In addition to the Gaussian scale space, SIFT uses a Difference of Gaussians (DoG) scale space, obtained by subtracting successive scales of the Gaussian scale space. The ensemble of the smoothed images and their differences are organized as follows:

sift-ss.png

The black vertical segments represent images of the Gaussian Scale Space (GSS), arranged by increasing scale $\sigma$. The image at scale $\sigma$ is equal to the image at scale 1 smoothed with a Gaussian kernel of that variance.

The input image is assumed to be pre-smoothed at scale $\sigma_n$ due to pixel aliasing (hence the image at scale 1 is not really available).

Scales are sampled at logarithmically spaced points $\sigma(o,s)$. The levels are indexed by o (octave index) and s (level index). There are O octaves and S levels per octave. Images are downsampled at the octave boundaries; this is represented by the length of the black segments, which is proportional to the resolution of the corresponding image.

The octave index o starts at $o_{\mathrm{min}}$ and ends at $o_{\mathrm{min}}+O-1$. The level index starts at $s_{\mathrm{min}}=-1$ and ends at $s_{\mathrm{max}} = S+2$. From the picture it is apparent that a few scale levels are represented twice, at two different resolutions. This is necessary to initialize the computation of the next octave and the computation of the DoG scale space.

The DoG scale space is obtained by subtracting contiguous GSS levels. The scale of a DoG level obtained in such a way can be thought of as sitting in between the scales of the two images being subtracted. Pictorially, the DoG levels are represented as vertical blue segments and sit in between the smoothed images (black segments).
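The subtraction of contiguous levels can be sketched as follows. This is an illustration only: the function name dog_level and the flat float buffers are assumptions of the example, not the library's internal layout.

```c
#include <stddef.h>

/* Difference of two contiguous Gaussian levels of the same octave.
 * gss_cur is the level at the smaller scale, gss_next the level at the
 * next larger scale; both hold npixels samples at the same resolution. */
void dog_level(const float *gss_cur, const float *gss_next,
               float *dog, size_t npixels)
{
    for (size_t i = 0; i < npixels; ++i) {
        dog[i] = gss_next[i] - gss_cur[i];
    }
}
```

An octave with N Gaussian levels yields N-1 DoG levels when the function is applied to every pair of neighbors.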

The SIFT detector extracts local extrema of the DoG scale space (along the x, y and $ \sigma $ directions). To compute such local extrema it is necessary to sample the DoG scale space in a 3x3x3 neighborhood. This means that local extrema cannot be extracted at the first and last levels of an octave, which is why two redundant levels are computed for each octave.
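The 3x3x3 test can be sketched as below. The memory layout (x fastest, then y, then the level index) and the function name is_local_extremum are assumptions made for the example; the sketch also requires strict extrema, which may differ from what the library does.

```c
#include <stddef.h>

/* Returns 1 if the DoG sample (x, y, s) is strictly larger or strictly
 * smaller than all of its 26 neighbors, 0 otherwise.
 * ASSUMED layout: dog[x + width * (y + height * s)]. */
int is_local_extremum(const float *dog, int width, int height, int nlevels,
                      int x, int y, int s)
{
    /* Border samples lack a full 3x3x3 neighborhood and are rejected. */
    if (x < 1 || y < 1 || s < 1 ||
        x > width - 2 || y > height - 2 || s > nlevels - 2) {
        return 0;
    }

    size_t plane = (size_t)width * (size_t)height;
    float v = dog[(size_t)x + (size_t)width * (size_t)y + plane * (size_t)s];
    int is_max = 1;
    int is_min = 1;

    for (int ds = -1; ds <= 1; ++ds) {
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                if (ds == 0 && dy == 0 && dx == 0) continue;
                float n = dog[(size_t)(x + dx)
                              + (size_t)width * (size_t)(y + dy)
                              + plane * (size_t)(s + ds)];
                if (n >= v) is_max = 0;
                if (n <= v) is_min = 0;
            }
        }
    }
    return is_max || is_min;
}
```

The early return for the first and last level is the reason the extra levels are needed: without them, no extremum could be reported there.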

The detector

The SIFT frames (keypoints) are extracted based on peaks (local extrema) of the DoG scale space. Peaks are searched in a neighborhood of 3x3x3 samples (in space and scale). The previous figure shows the scale levels involved in this search (they are the ones at the intersection of two green arrows). Peaks are then quadratically interpolated. Finally, they are filtered and their orientations are computed as explained in the next sections.

Peak threshold

Peaks which are too short may have been generated by noise and are discarded. This is done by comparing the absolute value of the DoG scale space at the peak with the peak threshold $t_p$ and discarding the peak if its value is below the threshold.

Edge threshold

Peaks which are too flat are often generated by edges and do not yield stable features. These peaks are detected and removed as follows. Given a peak $(x,y,\sigma)$, the algorithm evaluates the Hessian $G(x,y)$ of the $x,y$ slice of the DoG scale space at the scale $\sigma$. Then the following score (similar to the Harris function) is computed:

\[ \frac{(\mathrm{tr}\,G(x,y))^2}{\det G(x,y)} \]

This score has a minimum (equal to 4) when both eigenvalues of the Hessian are equal (curved peak) and increases as one of the eigenvalues grows and the other stays small. Peaks are retained if the score is below the quantity $(t_e+1)^2/t_e$, where $t_e$ is the edge threshold. Notice that this quantity has a minimum equal to 4 when $t_e=1$ and grows thereafter. Therefore the range of the edge threshold is $[1,\infty)$.
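A minimal sketch of the score and of the two threshold tests follows. It estimates the second derivatives of one DoG level with finite differences; the function names, the row-major layout and the choice to reject peaks with a non-positive determinant are assumptions of the example rather than a description of the library code.

```c
#include <math.h>

/* Edge score tr(G)^2 / det(G) at the interior pixel (x, y) of a DoG level
 * stored row by row (ASSUMED layout: dog[x + width * y]). Returns HUGE_VAL
 * when det(G) <= 0, so that such peaks fail any finite edge threshold. */
double edge_score(const float *dog, int width, int x, int y)
{
    const float *p = dog + x + width * y;
    double v   = p[0];
    double dxx = p[1] + p[-1] - 2.0 * v;
    double dyy = p[width] + p[-width] - 2.0 * v;
    double dxy = 0.25 * (p[width + 1] + p[-width - 1]
                         - p[-width + 1] - p[width - 1]);
    double tr  = dxx + dyy;
    double det = dxx * dyy - dxy * dxy;

    if (det <= 0.0) return HUGE_VAL;
    return tr * tr / det;
}

/* A peak is retained if its score is below (t_e + 1)^2 / t_e. */
int passes_edge_test(double score, double t_e)
{
    return score < (t_e + 1.0) * (t_e + 1.0) / t_e;
}

/* A peak is retained if |DoG value| is not below the peak threshold t_p. */
int passes_peak_test(double value, double t_p)
{
    return fabs(value) >= t_p;
}
```

For an isotropic blob the score equals 4, so with the default $t_e = 10$ (bound 12.1) the peak is kept; for a straight ridge one second derivative vanishes and the peak is rejected.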

Orientations

A peak in the DoG scale space fixes two attributes of the keypoint: its position and its scale. It remains to choose an orientation. To do this, SIFT computes a histogram of the gradient orientations in a Gaussian window whose standard deviation is 1.5 times the scale $\sigma$ of the keypoint.

sift-orient.png

This histogram is then smoothed and the maximum is selected. In addition to the biggest mode, up to three other modes whose amplitude is at least 80% of the biggest mode are retained and returned as additional orientations.
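The mode selection can be sketched as below, working on an already smoothed circular histogram. The function name, the use of bin indices instead of interpolated angles and the strict local-maximum test are assumptions of the example.

```c
/* Selects the biggest mode of a circular orientation histogram plus the
 * other local maxima whose value is at least 80% of it. Writes at most
 * max_modes bin indices to modes (biggest mode first) and returns how many
 * were written. Returns 0 for an empty or all-zero histogram. */
int select_orientation_modes(const double *hist, int nbins,
                             int *modes, int max_modes)
{
    if (nbins <= 0 || max_modes <= 0) return 0;

    int best = 0;
    for (int i = 1; i < nbins; ++i) {
        if (hist[i] > hist[best]) best = i;
    }
    if (hist[best] <= 0.0) return 0;

    int n = 0;
    modes[n++] = best;

    for (int i = 0; i < nbins && n < max_modes; ++i) {
        if (i == best) continue;
        double prev = hist[(i + nbins - 1) % nbins];  /* wrap around */
        double next = hist[(i + 1) % nbins];
        if (hist[i] >= 0.8 * hist[best] && hist[i] > prev && hist[i] > next) {
            modes[n++] = i;
        }
    }
    return n;
}
```

Calling it with max_modes equal to 4 matches the limit of one main orientation plus three additional ones.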

parameter   alt. name     default value   set by
$t_e$       edge_thresh   10.0            vl_sift_set_edge_thresh()
$t_p$       peak_thresh   0               vl_sift_set_peak_thresh()

The descriptor

The SIFT descriptor is a three dimensional histogram $h(\theta,x,y)$ of the orientation $\theta$ and position $(x,y)$ of the gradient inside a patch surrounding the keypoint. The figure illustrates the layout of the histogram:

sift-bins.png

The SIFT descriptor $h(\theta,x,y)$ is a 3-D array, but it is usually presented as a vector. This vector is obtained by stacking the 3-D array (with $\theta$ as the fastest-varying index).

Bins are bilinearly interpolated and partially overlap. There are $B_p$ bins along the x and y directions and $B_o$ along the $\theta$ direction.
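The stacking of the 3-D array into a vector can be illustrated with an index function. The text only states that $\theta$ varies fastest; the order of the two spatial indices (x before y) and the name descr_index are assumptions of this sketch.

```c
/* Position of bin (t, x, y) in the stacked descriptor vector, with the
 * orientation index t varying fastest. ASSUMPTION: x varies next, then y.
 * Bo is the number of orientation bins, Bp the number of spatial bins
 * along each direction; the vector has Bo * Bp * Bp entries. */
int descr_index(int t, int x, int y, int Bo, int Bp)
{
    return t + Bo * (x + Bp * y);
}
```

With the standard values $B_o = 8$ and $B_p = 4$ the vector has 128 entries, and the last bin (7, 3, 3) maps to index 127.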

The spatial extent of the bins depends on the keypoint scale $\sigma$ and a magnification factor m. A bin extends for $m\sigma$ pixels (ignoring bilinear smoothing). In the picture, the small blue circle has radius proportional to $\sigma$ and m is equal to 3 (the standard value).

Since there are $B_p$ bins along each spatial direction the descriptor support extends for $m\sigma B_p$ pixels in each direction. Accounting for the extra half bin used by bilinear smoothing, the actual support extends for $m\sigma (B_p + 1)$ pixels. The picture shows the arrangement of the bins for $B_p = 4$.

When added to the histogram, gradients are weighted by their magnitude and by a Gaussian window centered at the keypoint. This Gaussian window has standard deviation equal to $m \sigma B_p / 2$. In the picture, the Gaussian window is represented by the larger blue circle.
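The quantities above are simple products of $m$, $\sigma$ and $B_p$. The sketch below collects them in one place; the struct and function names are hypothetical and exist only for this example.

```c
/* Sizes, in pixels, that follow from the keypoint scale sigma, the
 * magnification factor m and the number of spatial bins Bp. */
typedef struct {
    double bin_size;        /* m * sigma                 */
    double support;         /* m * sigma * Bp            */
    double support_interp;  /* m * sigma * (Bp + 1)      */
    double window;          /* m * sigma * Bp / 2        */
} DescrGeometry;

DescrGeometry descr_geometry(double sigma, double m, int Bp)
{
    DescrGeometry g;
    g.bin_size       = m * sigma;
    g.support        = g.bin_size * Bp;
    g.support_interp = g.bin_size * (Bp + 1);
    g.window         = g.support / 2.0;
    return g;
}
```

For example, a keypoint of scale $\sigma = 2$ with $m = 3$ and $B_p = 4$ has bins of 6 pixels, a support of 24 pixels (30 with the extra half bins) and a Gaussian window parameter of 12 pixels.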

Remarks:
In practice, the descriptor is computed by scanning a rectangular area that covers its support (scaled to match the resolution of the corresponding image in the GSS scale space). Since the descriptor can be rotated, this area extends for $\sqrt{2}\, m\sigma (B_p+1)/2$ pixels on each side of the keypoint center (see also the picture). This remark has significance only for the implementation.
The following table summarizes the descriptor parameters along with their standard values.

parameter   alt. name   default value
$B_p$       BP          4
$B_o$       BO          8
$m$         magnif      3

The keypoint coordinates $(x,y)$ are expressed in the standard image convention (y axis pointing down). This also establishes the convention for expressing the angle $\theta$ of a vector v (here v could be either the gradient of the image or the direction of the keypoint). To slightly complicate the matter, however, the index $\theta$ of the descriptor $h(\theta,x,y)$ follows the opposite convention (this is for compatibility with Lowe's original SIFT implementation), as shown by the figure:

sift-angle.png

Extensions

Norm threshold

Near-uniform patches do not yield stable features. By default, all descriptors are computed; when the norm threshold is set, descriptors whose norm before scaling is below the threshold are explicitly set to the null vector. This is useful to remove low-contrast patches.

The norm of a descriptor is defined as the sum of the gradient magnitudes accumulated into each of the bins.
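The effect of the norm threshold can be sketched as follows, using the definition of the norm given above (sum of the accumulated magnitudes). The function name is hypothetical, and how the library evaluates the norm internally is not shown here.

```c
#include <stddef.h>

/* Sets the unnormalized descriptor to the null vector when the sum of its
 * bins is below norm_thresh. Returns 1 if the descriptor was zeroed and
 * 0 if it was left untouched. */
int apply_norm_threshold(float *descr, size_t n, double norm_thresh)
{
    double norm = 0.0;
    for (size_t i = 0; i < n; ++i) {
        norm += descr[i];
    }
    if (norm < norm_thresh) {
        for (size_t i = 0; i < n; ++i) {
            descr[i] = 0.0f;
        }
        return 1;
    }
    return 0;
}
```

The test has to run before the descriptor is rescaled, because after normalization every non-null descriptor has the same norm.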

Acknowledgments

  • Thanks to Wei Dong for finding a memory leak.
  • Thanks to Brian, Loic, Giuseppe, Liu, Erwin, P. Ivanov, Q.S. Luo for finding bugs in old versions of this program.

Definition in file sift.h.

#include <stdio.h>
#include "generic.h"


Data Structures

struct  _VlSiftKeypoint
 SIFT filter keypoint.
struct  _VlSiftFilt
 SIFT filter.

Typedefs

typedef float vl_sift_pix
 SIFT filter pixel type.

Functions

Create and destroy
VL_EXPORT VlSiftFilt * vl_sift_new (int width, int height, int noctaves, int nlevels, int o_min)
 Create a new SIFT filter.
VL_EXPORT void vl_sift_delete (VlSiftFilt *f)
 Delete SIFT filter.
Process data
VL_EXPORT int vl_sift_process_first_octave (VlSiftFilt *f, vl_sift_pix const *im)
 Start processing a new image.
VL_EXPORT int vl_sift_process_next_octave (VlSiftFilt *f)
 Process next octave.
VL_EXPORT void vl_sift_detect (VlSiftFilt *f)
 Detect keypoints.
VL_EXPORT int vl_sift_calc_keypoint_orientations (VlSiftFilt *f, double angles[4], VlSiftKeypoint const *k)
 Calculate the keypoint orientation(s).
VL_EXPORT void vl_sift_calc_keypoint_descriptor (VlSiftFilt *f, vl_sift_pix *descr, VlSiftKeypoint const *k, double angle)
 Compute the descriptor of a keypoint.
VL_EXPORT void vl_sift_calc_raw_descriptor (VlSiftFilt const *f, vl_sift_pix const *image, vl_sift_pix *descr, int width, int height, double x, double y, double s, double angle0)
 Run the SIFT descriptor on raw data.
VL_EXPORT void vl_sift_keypoint_init (VlSiftFilt const *f, VlSiftKeypoint *k, double x, double y, double sigma)
 Initialize a keypoint from its position and scale.
Retrieve data and parameters
VL_INLINE int vl_sift_get_octave_index (VlSiftFilt const *f)
 Get current octave index.
VL_INLINE int vl_sift_get_noctaves (VlSiftFilt const *f)
 Get number of octaves.
VL_INLINE int vl_sift_get_octave_first (VlSiftFilt const *f)
 Get first octave.
VL_INLINE int vl_sift_get_octave_width (VlSiftFilt const *f)
 Get current octave width.
VL_INLINE int vl_sift_get_octave_height (VlSiftFilt const *f)
 Get current octave height.
VL_INLINE int vl_sift_get_nlevels (VlSiftFilt const *f)
 Get number of levels per octave.
VL_INLINE int vl_sift_get_nkeypoints (VlSiftFilt const *f)
 Get number of keypoints.
VL_INLINE double vl_sift_get_peak_thresh (VlSiftFilt const *f)
 Get peaks threshold.
VL_INLINE double vl_sift_get_edge_thresh (VlSiftFilt const *f)
 Get edges threshold.
VL_INLINE double vl_sift_get_norm_thresh (VlSiftFilt const *f)
 Get norm threshold.
VL_INLINE double vl_sift_get_magnif (VlSiftFilt const *f)
 Get the magnification factor.
VL_INLINE vl_sift_pix * vl_sift_get_octave (VlSiftFilt const *f, int s)
 Get current octave data.
VL_INLINE VlSiftKeypoint const * vl_sift_get_keypoints (VlSiftFilt const *f)
 Get keypoints.
Set parameters
VL_INLINE void vl_sift_set_peak_thresh (VlSiftFilt *f, double t)
 Set peaks threshold.
VL_INLINE void vl_sift_set_edge_thresh (VlSiftFilt *f, double t)
 Set edges threshold.
VL_INLINE void vl_sift_set_norm_thresh (VlSiftFilt *f, double t)
 Set norm threshold.
VL_INLINE void vl_sift_set_magnif (VlSiftFilt *f, double m)
 Set the magnification factor.

Function Documentation

VL_EXPORT void vl_sift_calc_keypoint_descriptor ( VlSiftFilt *  f,
vl_sift_pix *  descr,
VlSiftKeypoint const *  k,
double  angle0 
)

Parameters:
f SIFT filter.
descr SIFT descriptor (output)
k keypoint.
angle0 keypoint direction.
The function computes the SIFT descriptor of the keypoint k of orientation angle0. The function fills the buffer descr which must be large enough to hold the descriptor.

The function assumes that the keypoint is on the current octave. If not, it does not do anything.

Definition at line 1543 of file sift.c.

References fast_expn(), _VlSiftFilt::grad, normalize_histogram(), _VlSiftKeypoint::o, _VlSiftFilt::o_cur, _VlSiftFilt::s_max, _VlSiftFilt::s_min, update_gradient(), vl_abs_f(), vl_floor_f(), VL_MAX, VL_MIN, vl_mod_2pi_f(), and VL_PI.

VL_EXPORT int vl_sift_calc_keypoint_orientations ( VlSiftFilt *  f,
double  angles[4],
VlSiftKeypoint const *  k 
)

Parameters:
f SIFT filter.
angles orientations (output).
k keypoint.
The function computes the orientation(s) of the keypoint k. The function returns the number of orientations found (up to four). The orientations themselves are written to the vector angles.

Remarks:
The function requires the keypoint octave k->o to be equal to the filter's current octave, as returned by vl_sift_get_octave_index(). If this is not the case, the function returns zero orientations.

The function requires the keypoint scale level k->s to be in the range from s_min+1 to s_max-2 (where usually s_min=-1 and s_max=S+2). If this is not the case, the function returns zero orientations.

Returns:
number of orientations found.

Definition at line 1177 of file sift.c.

References fast_expn(), _VlSiftKeypoint::o, _VlSiftFilt::o_cur, _VlSiftFilt::s_max, _VlSiftFilt::s_min, update_gradient(), vl_floor_d(), VL_MAX, VL_MIN, and VL_PI.

VL_EXPORT void vl_sift_calc_raw_descriptor ( VlSiftFilt const *  f,
vl_sift_pix const *  grad,
vl_sift_pix *  descr,
int  width,
int  height,
double  x,
double  y,
double  sigma,
double  angle0 
)

Parameters:
f SIFT filter.
grad image gradients.
descr SIFT descriptor (output).
width image width.
height image height.
x keypoint x coordinate.
y keypoint y coordinate.
sigma keypoint scale.
angle0 keypoint orientation.
The function runs the SIFT descriptor on raw data. Here grad is a 2 x width x height array (by convention, the memory layout is such that the first index is the fastest-varying one). The first width x height layer of the array contains the gradient magnitude and the second the gradient angle (in radians, between 0 and $ 2\pi $). x, y and sigma give the keypoint center and scale respectively.

In order to be equivalent to a standard SIFT descriptor, the image gradient must be computed at a smoothing level equal to the scale of the keypoint. In practice, the actual SIFT algorithm makes the following additional approximations, which influence the result:

  • Scale is discretized in S levels.
  • The image is downsampled once for each octave (if you do this, the parameters x, y and sigma must be scaled too).

Definition at line 1379 of file sift.c.

References fast_expn(), normalize_histogram(), vl_abs_f(), VL_EPSILON_D, vl_floor_f(), VL_MAX, VL_MIN, vl_mod_2pi_f(), and VL_PI.

VL_EXPORT void vl_sift_delete ( VlSiftFilt *  f  ) 

Parameters:
f SIFT filter to delete.
The function frees the resources allocated by vl_sift_new().

Definition at line 565 of file sift.c.

References vl_free().

VL_EXPORT void vl_sift_detect ( VlSiftFilt *  f  ) 

The function detects keypoints in the current octave, filling the internal keypoint buffer. Keypoints can be retrieved with vl_sift_get_keypoints().

Parameters:
f SIFT filter.

Definition at line 771 of file sift.c.

References _VlSiftFilt::keys, _VlSiftFilt::keys_res, _VlSiftFilt::nkeys, _VlSiftFilt::o_cur, _VlSiftFilt::S, _VlSiftFilt::sigma0, vl_abs_d(), vl_malloc(), vl_realloc(), and vl_sift_get_octave().

VL_INLINE double vl_sift_get_edge_thresh ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
threshold.

Definition at line 301 of file sift.h.

VL_INLINE VlSiftKeypoint const * vl_sift_get_keypoints ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
pointer to the keypoints list.

Definition at line 277 of file sift.h.

VL_INLINE double vl_sift_get_magnif ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
magnification factor.

Definition at line 325 of file sift.h.

VL_INLINE int vl_sift_get_nkeypoints ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
number of keypoints.

Definition at line 265 of file sift.h.

VL_INLINE int vl_sift_get_nlevels ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
number of levels per octave.

Definition at line 253 of file sift.h.

VL_INLINE int vl_sift_get_noctaves ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
number of octaves.

Definition at line 185 of file sift.h.

VL_INLINE double vl_sift_get_norm_thresh ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
threshold.

Definition at line 313 of file sift.h.

VL_INLINE vl_sift_pix * vl_sift_get_octave ( VlSiftFilt const *  f,
int  s 
)

Parameters:
f SIFT filter.
s level index.
The level index s ranges from s_min = -1 to s_max = S + 2, where S is the number of levels per octave.

Returns:
pointer to the octave data for level s.

Definition at line 239 of file sift.h.

References _VlSiftFilt::octave, _VlSiftFilt::s_min, vl_sift_get_octave_height(), and vl_sift_get_octave_width().

Referenced by update_gradient(), vl_sift_detect(), vl_sift_process_first_octave(), and vl_sift_process_next_octave().

VL_INLINE int vl_sift_get_octave_first ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
index of the first octave.

Definition at line 197 of file sift.h.

VL_INLINE int vl_sift_get_octave_height ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
current octave height.

Definition at line 221 of file sift.h.

Referenced by update_gradient(), vl_sift_get_octave(), and vl_sift_process_next_octave().

VL_INLINE int vl_sift_get_octave_index ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
index of the current octave.

Definition at line 173 of file sift.h.

VL_INLINE int vl_sift_get_octave_width ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
current octave width.

Definition at line 209 of file sift.h.

Referenced by update_gradient(), vl_sift_get_octave(), and vl_sift_process_next_octave().

VL_INLINE double vl_sift_get_peak_thresh ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
threshold.

Definition at line 289 of file sift.h.

VL_EXPORT void vl_sift_keypoint_init ( VlSiftFilt const *  f,
VlSiftKeypoint *  k,
double  x,
double  y,
double  sigma 
)

Parameters:
f SIFT filter.
k SIFT keypoint (output).
x x coordinate of the center.
y y coordinate of the center.
sigma scale.
The function initializes the structure k from the location x and y and scale sigma of the keypoint.

Definition at line 1732 of file sift.c.

References _VlSiftFilt::O, _VlSiftFilt::o_min, _VlSiftFilt::S, _VlSiftFilt::s_max, _VlSiftFilt::s_min, _VlSiftKeypoint::sigma, _VlSiftFilt::sigma0, vl_floor_d(), VL_MAX, and VL_MIN.

VL_EXPORT VlSiftFilt* vl_sift_new ( int  width,
int  height,
int  noctaves,
int  nlevels,
int  o_min 
)

Parameters:
width image width.
height image height.
noctaves number of octaves.
nlevels number of levels per octave.
o_min first octave index.
The function allocates and returns a new SIFT filter for the specified image and scale space geometry.

Setting noctaves to a negative value sets the number of octaves O to the maximum possible value, which depends on the size of the image.

Returns:
the new SIFT filter.
See also:
vl_sift_delete().

Definition at line 493 of file sift.c.

References fast_expn_init(), _VlSiftFilt::s_max, _VlSiftFilt::s_min, _VlSiftFilt::sigma0, _VlSiftFilt::sigmak, vl_malloc(), VL_MAX, VL_MIN, and VL_SHIFT_LEFT.

VL_EXPORT int vl_sift_process_first_octave ( VlSiftFilt *  f,
vl_sift_pix const *  im 
)

Parameters:
f SIFT filter.
im image data.
The function starts processing a new image by computing its Gaussian scale space at the lowest octave. It also empties the internal keypoint buffer.

Returns:
error code. The function returns VL_ERR_EOF if there are no more octaves to process.
See also:
vl_sift_process_next_octave().

Definition at line 595 of file sift.c.

References copy_and_downsample(), copy_and_upsample_rows(), _VlSiftFilt::height, _VlSiftFilt::nkeys, _VlSiftFilt::O, _VlSiftFilt::o_cur, VL_ERR_EOF, VL_ERR_OK, vl_imsmooth_f(), VL_SHIFT_LEFT, vl_sift_get_octave(), and _VlSiftFilt::width.

VL_EXPORT int vl_sift_process_next_octave ( VlSiftFilt *  f  ) 

Parameters:
f SIFT filter.
The function computes the next octave of the Gaussian scale space. Notice that this clears the record of any feature detected in the previous octave.

Returns:
error code. The function returns the error VL_ERR_EOF when there are no more octaves to process.
See also:
vl_sift_process_first_octave().

Definition at line 701 of file sift.c.

References copy_and_downsample(), _VlSiftFilt::height, _VlSiftFilt::o_cur, VL_ERR_EOF, VL_ERR_OK, vl_imsmooth_f(), VL_MIN, VL_SHIFT_LEFT, vl_sift_get_octave(), vl_sift_get_octave_height(), vl_sift_get_octave_width(), and _VlSiftFilt::width.

VL_INLINE void vl_sift_set_edge_thresh ( VlSiftFilt *  f,
double  t 
)

Parameters:
f SIFT filter.
t threshold.

Definition at line 350 of file sift.h.

VL_INLINE void vl_sift_set_magnif ( VlSiftFilt *  f,
double  m 
)

Parameters:
f SIFT filter.
m magnification factor.

Definition at line 374 of file sift.h.

VL_INLINE void vl_sift_set_norm_thresh ( VlSiftFilt *  f,
double  t 
)

Parameters:
f SIFT filter.
t threshold.

Definition at line 362 of file sift.h.

VL_INLINE void vl_sift_set_peak_thresh ( VlSiftFilt *  f,
double  t 
)

Parameters:
f SIFT filter.
t threshold.

Definition at line 338 of file sift.h.