sift.h File Reference
Detailed Description
- Credits:
- Many people have contributed suggestions and bug reports. Although the following list is certainly incomplete, we would like to thank: Brian Fulkerson, Wei Dong, Loic, Giuseppe, Liu, Erwin, P. Ivanov, and Q. S. Luo.
Scale Invariant Feature Transform
This library module implements a SIFT filter object, a reusable object to extract SIFT features from one or multiple images of the same size.
Overview
A SIFT feature is a selected image region (also called keypoint) with an associated descriptor. Keypoints are extracted by the SIFT detector and their descriptors are computed by the SIFT descriptor. It is also common to use the SIFT detector on its own (i.e. computing the keypoints without descriptors) or the SIFT descriptor on its own (i.e. computing descriptors of custom keypoints).
SIFT detector
A SIFT keypoint is a circular image region with an orientation. It is described by a geometric frame of four parameters: the keypoint center coordinates x and y, its scale (the radius of the region), and its orientation (an angle expressed in radians). The SIFT detector uses as keypoints image structures which resemble “blobs”. By searching for blobs at multiple scales and positions, the SIFT detector is invariant (or, more accurately, covariant) to translation, rotation, and rescaling of the image.
The keypoint orientation is also determined from the local image appearance and is covariant to image rotations. Depending on the symmetry of the keypoint appearance, determining the orientation can be ambiguous. In this case, the SIFT detector returns a list of up to four possible orientations, constructing up to four frames (differing only by their orientation) for each detected image blob.

SIFT keypoints are circular image regions with an orientation.
- Number of octaves. Increasing the scale by an octave means doubling the size of the smoothing kernel, whose effect is roughly equivalent to halving the image resolution. By default, the scale space spans as many octaves as possible (i.e. roughly log2(min(width,height))), which has the effect of searching for keypoints of all possible sizes.
- First octave index. By convention, the octave of index 0 starts with the image at full resolution. Specifying an index greater than 0 starts the scale space at a lower resolution (e.g. 1 halves the resolution). Similarly, specifying a negative index starts the scale space at a higher resolution, and can be useful to extract very small features (since this is obtained by interpolating the input image, it does not make much sense to go past -1).
- Number of levels per octave. Each octave is sampled at this given number of intermediate scales (by default 3). Increasing this number might in principle return more refined keypoints, but in practice can make their selection unstable due to noise (see [1]).
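The relation between image size, octave count and octave resolution can be sketched as follows. This is a minimal illustration of the rough bound log2(min(width,height)) quoted above, not the library's own computation; `max_octaves` and `octave_side` are hypothetical helper names that do not exist in sift.h, and the exact default chosen by vl_sift_new() may differ from this rough bound.

```c
#include <assert.h>

/* Rough upper bound on the number of octaves:
   floor(log2(min(width, height))), computed with integer halving.
   Hypothetical helper, not part of sift.h. */
int max_octaves(int width, int height)
{
    assert(width > 0 && height > 0);
    int side = width < height ? width : height;
    int n = 0;
    while (side > 1) {      /* count how many times the side can be halved */
        side >>= 1;
        ++n;
    }
    return n;
}

/* Side length of the image at octave index o: halved for each positive
   octave, doubled (by interpolation) for each negative one.
   Hypothetical helper, not part of sift.h. */
int octave_side(int full_side, int o)
{
    assert(full_side > 0);
    return o >= 0 ? full_side >> o : full_side << -o;
}
```

For a 640 by 480 image this bound gives 8 octaves, and a first octave index of -1 corresponds to working on a 1280 by 960 interpolated image.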
Keypoints are further refined by eliminating those that are likely to be unstable, either because they are selected near an image edge rather than an image blob, or because they are found on image structures with low contrast. Filtering is controlled by the following parameters:
- Peak threshold. This is the minimum amount of contrast required to accept a keypoint. It is set by configuring the SIFT filter object with vl_sift_set_peak_thresh().
- Edge threshold. This is the edge rejection threshold. It is set by configuring the SIFT filter object with vl_sift_set_edge_thresh().
Parameter | See also | Controlled by | Comment |
number of octaves | SIFT detector | vl_sift_new | |
first octave index | SIFT detector | vl_sift_new | set to -1 to extract very small features |
number of scale levels per octave | SIFT detector | vl_sift_new | can affect the number of extracted keypoints |
edge threshold | SIFT detector | vl_sift_set_edge_thresh | decrease to eliminate more keypoints |
peak threshold | SIFT detector | vl_sift_set_peak_thresh | increase to eliminate more keypoints |
SIFT Descriptor
- See also:
- Descriptor technical details

The SIFT descriptor is a spatial histogram of the image gradient.
- Magnification factor. The descriptor size is determined by multiplying the keypoint scale by this factor. It is set by vl_sift_set_magnif().
- Gaussian window size. The descriptor support is determined by a Gaussian window, which discounts gradient contributions farther away from the descriptor center. The standard deviation of this window is set by vl_sift_set_window_size() and expressed in units of bins.
The VLFeat SIFT descriptor uses the following convention. The y axis points downwards and angles are measured clockwise (to be consistent with the standard image convention). The 3-D histogram (consisting of \( 8 \times 4 \times 4 \) bins) is stacked as a single 128-dimensional vector, where the fastest varying dimension is the orientation and the slowest the y spatial coordinate. This is illustrated by the following figure.

VLFeat conventions
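The stacking order described above can be made concrete with a small indexing sketch. It assumes the usual 8 orientation bins and 4 by 4 spatial bins, and that the x bin index sits between the orientation (fastest) and the y bin index (slowest); `descriptor_index` is a hypothetical helper, not a function of sift.h.

```c
#include <assert.h>

enum { N_ORIENT = 8, N_X = 4, N_Y = 4 };   /* 8 x 4 x 4 = 128 bins */

/* Position of histogram bin (t, i, j) in the 128-dimensional vector:
   the orientation index t varies fastest, then the x bin index i,
   and the y bin index j varies slowest. */
int descriptor_index(int t, int i, int j)
{
    assert(0 <= t && t < N_ORIENT);
    assert(0 <= i && i < N_X);
    assert(0 <= j && j < N_Y);
    return t + N_ORIENT * (i + N_X * j);
}
```

With this layout, the eight orientation bins of one spatial cell are contiguous, and moving one cell down in y skips 32 entries.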
- Note:
- D. Lowe's SIFT implementation uses a slightly different convention for keypoints (frames): the y axis points upwards and angles are measured counter-clockwise.

D. Lowe's SIFT implementation conventions
Parameter | See also | Controlled by | Comment |
magnification factor | SIFT Descriptor | vl_sift_set_magnif | increase this value to enlarge the image region described |
Gaussian window size | SIFT Descriptor | vl_sift_set_window_size | smaller values let the center of the descriptor count more |
Extensions
Eliminating low-contrast descriptors. Near-uniform patches do not yield stable keypoints or descriptors. vl_sift_set_norm_thresh() can be used to set a threshold on the average norm of the local gradient to zero out descriptors that correspond to very low contrast regions. By default, the threshold is equal to zero, which means that no descriptor is zeroed. Normally this option is useful only with custom keypoints, as detected keypoints are implicitly selected at high-contrast image regions.
Using the SIFT filter object
The code provided in this module can be used in different ways. You can instantiate and use a SIFT filter to extract both SIFT keypoints and descriptors from one or multiple images. Alternatively, you can use one of the low level functions to run only a part of the SIFT algorithm (for instance, to compute the SIFT descriptors of custom keypoints).
To use a SIFT filter object:
- Initialize a SIFT filter object with vl_sift_new(). The filter can be reused for multiple images of the same size (e.g. for an entire video sequence).
- For each octave in the scale space:
  - Compute the next octave of the DoG scale space using either vl_sift_process_first_octave() or vl_sift_process_next_octave() (stop processing if VL_ERR_EOF is returned).
  - Run the SIFT detector with vl_sift_detect() to get the keypoints.
  - For each keypoint:
    - Use vl_sift_calc_keypoint_orientations() to get the keypoint orientation(s).
    - For each orientation:
      - Use vl_sift_calc_keypoint_descriptor() to get the keypoint descriptor.
- Delete the SIFT filter with vl_sift_delete().
To compute SIFT descriptors of custom keypoints, use vl_sift_calc_raw_descriptor().
Technical details
Scale space
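In order to search for image blobs at multiple scales, the SIFT detector constructs a scale space, defined as follows. Let \( I_0(\mathbf{x}) \) denote an idealized infinite-resolution image and consider the Gaussian kernel
\[ g_\sigma(\mathbf{x}) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{1}{2} \frac{\mathbf{x}^\top \mathbf{x}}{\sigma^2} \right). \]
The Gaussian scale space is the collection of smoothed images
\[ I_\sigma = g_\sigma * I_0, \quad \sigma \geq 0. \]
The image at infinite resolution \( I_0 \) is useful conceptually, but is not available to us; instead, the input image \( I_{\sigma_n} \) is assumed to be pre-smoothed at a nominal level \( \sigma_n \) to account for the finite resolution of the pixels. Thus in practice the scale space is computed by
\[ I_\sigma = g_{\sqrt{\sigma^2 - \sigma_n^2}} * I_{\sigma_n}, \quad \sigma \geq \sigma_n. \]
Scales are sampled at logarithmic steps given by
\[ \sigma(o,s) = \sigma_0 2^{o + s/S}, \quad s = 0, \dots, S-1, \quad o = o_{\min}, \dots, o_{\min} + O - 1, \]
where \( \sigma_0 \) is the base scale, \( o_{\min} \) is the first octave index, \( O \) the number of octaves and \( S \) the number of scales per octave.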
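As a worked example of the logarithmic sampling, assume the rule has the form \( \sigma(o,s) = \sigma_0 2^{o+s/S} \) and take the illustrative values \( S = 3 \) and \( o_{\min} = 0 \) (both are assumptions chosen for the example; \( \sigma_0 \) is left symbolic). The first few sampled scales are then

\[
\sigma(0,0) = \sigma_0, \quad
\sigma(0,1) = 2^{1/3}\sigma_0 \approx 1.26\,\sigma_0, \quad
\sigma(0,2) = 2^{2/3}\sigma_0 \approx 1.59\,\sigma_0, \quad
\sigma(1,0) = 2\,\sigma_0 .
\]

Each step multiplies the scale by the constant factor \( 2^{1/S} \), so that \( S \) steps double the scale and lead into the next octave.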
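Blobs are detected as local extrema of the Difference of Gaussians (DoG) scale space, obtained by subtracting successive scales of the Gaussian scale space:
\[ \mathrm{DoG}_{\sigma(o,s)} = I_{\sigma(o,s+1)} - I_{\sigma(o,s)}, \]
where \( \sigma(o,s) \) denotes the scale of level \( s \) in octave \( o \). At each successive octave, the resolution of the images is halved to save computation. The images composing the Gaussian and DoG scale space can then be arranged as in the following figure:

GSS and DoG scale space structures.

Within each octave the scale level index \( s \) varies in a slightly redundant range (see vl_sift_get_octave()). This simplifies gluing together different octaves and extracting DoG maxima (required by the SIFT detector).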
Detector
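The SIFT frames (keypoints) are extracted based on local extrema (peaks) of the DoG scale space. Numerically, local extrema are elements whose \( 3 \times 3 \times 3 \) neighbors (in space and scale) all have smaller (maxima) or larger (minima) values.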
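The neighborhood test can be sketched as below. This is a simplified illustration, assuming the DoG levels of one octave are stored one after another in row-major order and that an extremum must be strict; `local_extremum` is a hypothetical helper, and the library's detector may apply further checks that are not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Returns +1 if the sample at (x, y, s) is a strict maximum of its
   3 x 3 x 3 neighbourhood in space and scale, -1 if it is a strict
   minimum, and 0 otherwise. dog holds nlevels images of size
   width x height, stored one after another in row-major order. */
int local_extremum(float const *dog, int width, int height, int nlevels,
                   int x, int y, int s)
{
    assert(1 <= x && x < width - 1);
    assert(1 <= y && y < height - 1);
    assert(1 <= s && s < nlevels - 1);

    ptrdiff_t const row = width;
    ptrdiff_t const plane = (ptrdiff_t)width * height;
    float const *center = dog + x + y * row + s * plane;
    float const v = *center;
    int is_max = 1, is_min = 1;

    for (int ds = -1; ds <= 1; ++ds) {
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                if (ds == 0 && dy == 0 && dx == 0) continue;
                float const n = center[dx + dy * row + ds * plane];
                if (n >= v) is_max = 0;   /* a neighbour is not smaller */
                if (n <= v) is_min = 0;   /* a neighbour is not larger */
            }
        }
    }
    return is_max ? 1 : (is_min ? -1 : 0);
}
```

The function only inspects interior samples, which is one reason a scale space with redundant boundary levels is convenient.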
Eliminating low contrast responses
Peaks which are too short may have been generated by noise and are discarded. This is done by comparing the absolute value of the DoG scale space at the peak with the peak threshold \( t_p \) and discarding the peak if its value is below the threshold.
Eliminating edge responses
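Peaks which are too flat are often generated by edges and do not yield stable features. These peaks are detected and removed as follows. Given a peak \( (x, y, \sigma) \), the algorithm evaluates the x,y Hessian \( D \) of the DoG scale space at the scale \( \sigma \). Then the following score (similar to the Harris function) is computed:
\[ \frac{(\mathrm{tr}\, D(x,y,\sigma))^2}{\det D(x,y,\sigma)}, \quad D = \begin{bmatrix} \frac{\partial^2 \mathrm{DoG}}{\partial x^2} & \frac{\partial^2 \mathrm{DoG}}{\partial x \partial y} \\ \frac{\partial^2 \mathrm{DoG}}{\partial x \partial y} & \frac{\partial^2 \mathrm{DoG}}{\partial y^2} \end{bmatrix}. \]
This score has a minimum (equal to 4) when both eigenvalues of the Hessian are equal (curved peak) and increases as one of the eigenvalues grows and the other stays small. Peaks are retained if the score is below the quantity \( (t_e+1)^2/t_e \), where \( t_e \) is the edge threshold. Notice that this quantity has a minimum equal to 4 when \( t_e = 1 \) and grows thereafter. Therefore the range of the edge threshold is \( [1, \infty) \).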
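The two filtering tests can be sketched as follows. The sketch assumes that the edge score is the squared trace over the determinant of the 2 by 2 spatial Hessian of the DoG, and that a peak is kept when the score is below (t_e + 1)^2 / t_e; both function names are hypothetical and the Hessian entries are taken as inputs rather than estimated from the image.

```c
#include <assert.h>

/* Contrast test: keep the peak if the absolute DoG value at the peak
   reaches the peak threshold. */
int passes_peak_test(double dog_value, double peak_thresh)
{
    double const magnitude = dog_value < 0 ? -dog_value : dog_value;
    return magnitude >= peak_thresh;
}

/* Edge test on the spatial Hessian [dxx dxy; dxy dyy] of the DoG at
   the peak: keep it if tr^2 / det < (te + 1)^2 / te. The inequality is
   cross-multiplied to avoid dividing by a small determinant. */
int passes_edge_test(double dxx, double dyy, double dxy, double edge_thresh)
{
    assert(edge_thresh >= 1.0);
    double const tr = dxx + dyy;
    double const det = dxx * dyy - dxy * dxy;
    if (det <= 0.0) {
        return 0;   /* curvatures of opposite sign, or degenerate */
    }
    return tr * tr * edge_thresh
         < (edge_thresh + 1.0) * (edge_thresh + 1.0) * det;
}
```

A blob with equal curvatures in both directions has score 4 and passes for any threshold above 1, while an elongated structure with one dominant curvature is rejected.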
Orientation assignment
A peak in the DoG scale space fixes two parameters of the keypoint: the position and the scale. It remains to choose an orientation. In order to do this, SIFT computes a histogram of the gradient orientations in a Gaussian window with a standard deviation which is 1.5 times bigger than the scale \( \sigma \) of the keypoint.

This histogram is then smoothed and the maximum is selected. In addition to the biggest mode, up to three other modes whose amplitude is within 80% of the biggest mode are retained and returned as additional orientations.
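The mode selection step can be sketched as follows, assuming the orientation histogram has already been accumulated and smoothed. The helper name, the strict local-maximum test and the scan in bin order are assumptions of this sketch; any sub-bin refinement of the peak position is omitted.

```c
#include <assert.h>

enum { MAX_ORIENTATIONS = 4 };

/* Select up to four bins of a circular orientation histogram that are
   local maxima and whose amplitude exceeds 80% of the largest bin.
   Bin indices are written to modes in increasing order; the return
   value is the number of modes found. */
int select_orientation_modes(double const *hist, int nbins,
                             int modes[MAX_ORIENTATIONS])
{
    assert(nbins >= 3);

    double peak = hist[0];
    for (int b = 1; b < nbins; ++b) {
        if (hist[b] > peak) peak = hist[b];
    }
    if (peak <= 0.0) {
        return 0;   /* empty histogram: no orientation */
    }

    int count = 0;
    for (int b = 0; b < nbins && count < MAX_ORIENTATIONS; ++b) {
        double const prev = hist[(b + nbins - 1) % nbins];  /* wraps around */
        double const next = hist[(b + 1) % nbins];
        if (hist[b] > 0.8 * peak && hist[b] > prev && hist[b] > next) {
            modes[count++] = b;
        }
    }
    return count;
}
```

A bin index b can be converted to an angle as 2*pi*b/nbins, since the histogram covers the full circle.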
Descriptor
A SIFT descriptor of a local region (keypoint) is a 3-D spatial histogram of the image gradients. The gradient at each pixel is regarded as a sample of a three-dimensional elementary feature vector, formed by the pixel location and the gradient orientation. Samples are weighted by the gradient norm and accumulated in a 3-D histogram h, which (up to normalization and clamping) forms the SIFT descriptor of the region. An additional Gaussian weighting function is applied to give less importance to gradients farther away from the keypoint center.
Construction in the canonical frame
Denote the gradient vector field computed at the scale \( \sigma \) by
\[ J(x,y) = \nabla I_\sigma(x,y) = \begin{bmatrix} \frac{\partial I_\sigma}{\partial x} & \frac{\partial I_\sigma}{\partial y} \end{bmatrix}. \]
The descriptor is a 3-D spatial histogram capturing the distribution of \( J(x,y) \). It is convenient to describe its construction in the canonical frame. In this frame, the image and descriptor axes coincide and each spatial bin has side 1. The histogram has \( N_\theta \times N_x \times N_y \) bins (usually \( 8 \times 4 \times 4 \)), as in the following figure:

Canonical SIFT descriptor and spatial binning functions
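Bins are indexed by a triplet of indexes t, i, j and their centers are given by
\begin{align*}
\theta_t &= \frac{2\pi}{N_\theta} t, & t &= 0, \dots, N_\theta - 1, \\
x_i &= i - \frac{N_x - 1}{2}, & i &= 0, \dots, N_x - 1, \\
y_j &= j - \frac{N_y - 1}{2}, & j &= 0, \dots, N_y - 1.
\end{align*}
The histogram is computed by using trilinear interpolation, i.e. by weighting contributions by the binning functions
\begin{align*}
w(z) &= \max(0, 1 - |z|), \\
w_\mathrm{ang}(z) &= \sum_{k=-\infty}^{+\infty} w\left( \frac{N_\theta}{2\pi} z + N_\theta k \right).
\end{align*}
The gradient vector field \( J(x,y) \), with orientation \( \angle J(x,y) \), is transformed into a three-dimensional density map of weighted contributions
\[ f(\theta, x, y) = |J(x,y)| \, \delta(\theta - \angle J(x,y)). \]
The histogram is localized in the keypoint support by a Gaussian window \( g_{\sigma_\mathrm{win}} \) of standard deviation \( \sigma_\mathrm{win} \). The histogram is then given by
\begin{align*}
h(t,i,j) &= \int g_{\sigma_\mathrm{win}}(x,y) \, w_\mathrm{ang}(\theta - \theta_t) \, w(x - x_i) \, w(y - y_j) \, f(\theta, x, y) \, d\theta \, dx \, dy \\
&= \int g_{\sigma_\mathrm{win}}(x,y) \, w_\mathrm{ang}(\angle J(x,y) - \theta_t) \, w(x - x_i) \, w(y - y_j) \, |J(x,y)| \, dx \, dy.
\end{align*}
In post processing, the histogram is \( l^2 \) normalized, then clamped at 0.2, and \( l^2 \) normalized again.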
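The spatial part of the trilinear interpolation can be sketched in a few lines. The sketch assumes the usual linear interpolation kernel w(z) = max(0, 1 - |z|) and bin centers x_i = i - (N_x - 1)/2 in the canonical frame, where bins have side 1; both helper names are hypothetical.

```c
#include <assert.h>

/* Linear interpolation kernel w(z) = max(0, 1 - |z|). */
double bin_weight(double z)
{
    double const a = z < 0 ? -z : z;
    return a < 1.0 ? 1.0 - a : 0.0;
}

/* Center of spatial bin i (out of n) along one axis of the canonical
   frame: x_i = i - (n - 1) / 2, so the grid is centered on the keypoint. */
double spatial_bin_center(int i, int n)
{
    assert(0 <= i && i < n);
    return i - (n - 1) / 2.0;
}
```

With four bins per axis the centers are -1.5, -0.5, 0.5 and 1.5. A gradient sample at canonical coordinate 0.25 then contributes to the two nearest bins with weights 0.25 and 0.75, which sum to one.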
Calculation in the image frame
Invariance to similarity transformations is attained by attaching descriptors to SIFT keypoints (or other similarity-covariant frames). Then projecting the image in the canonical descriptor frame has the effect of undoing the image deformation.
In practice, however, it is convenient to compute the descriptor directly in the image frame. To do this, denote with a hat quantities relative to the canonical frame and without a hat quantities relative to the image frame (so for instance \( \hat x \) is the x-coordinate in the canonical frame and \( x \) the x-coordinate in the image frame). Assume that the canonical and image frames are related by an affinity:
\[ \mathbf{x} = A \hat{\mathbf{x}} + T, \qquad \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \hat{\mathbf{x}} = \begin{bmatrix} \hat x \\ \hat y \end{bmatrix}. \]

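Then all quantities can be computed in the image frame directly. For instance, the images at infinite resolution in the two frames are related by
\[ \hat I_0(\hat{\mathbf{x}}) = I_0(\mathbf{x}), \qquad \mathbf{x} = A \hat{\mathbf{x}} + T. \]
The canonized image at scale \( \hat\sigma \) is in relation with the scaled image
\[ \hat I_{\hat\sigma}(\hat{\mathbf{x}}) = I_{A\hat\sigma}(\mathbf{x}), \qquad \mathbf{x} = A \hat{\mathbf{x}} + T, \]
where, by generalizing the previous definitions, we have
\[ I_{A\hat\sigma}(\mathbf{x}) = (g_{A\hat\sigma} * I_0)(\mathbf{x}), \quad g_{A\hat\sigma}(\mathbf{x}) = \frac{1}{2\pi |A| \hat\sigma^2} \exp\left( -\frac{1}{2} \frac{\mathbf{x}^\top A^{-\top} A^{-1} \mathbf{x}}{\hat\sigma^2} \right). \]
Differentiating shows that the gradient fields are in relation
\[ \hat J(\hat{\mathbf{x}}) = J(\mathbf{x}) A, \quad J(\mathbf{x}) = (\nabla I_{A\hat\sigma})(\mathbf{x}), \qquad \mathbf{x} = A \hat{\mathbf{x}} + T. \]
So we can compute the descriptor either in the canonical or in the image frame as:
\begin{align*}
h(t,i,j) &= \int g_{\hat\sigma_\mathrm{win}}(\hat{\mathbf{x}}) \, w_\mathrm{ang}(\angle \hat J(\hat{\mathbf{x}}) - \theta_t) \, w_{ij}(\hat{\mathbf{x}}) \, |\hat J(\hat{\mathbf{x}})| \, d\hat{\mathbf{x}} \\
&= \int g_{A\hat\sigma_\mathrm{win}}(\mathbf{x} - T) \, w_\mathrm{ang}(\angle (J(\mathbf{x}) A) - \theta_t) \, w_{ij}(A^{-1}(\mathbf{x} - T)) \, |J(\mathbf{x}) A| \, d\mathbf{x},
\end{align*}
where we defined the product of the two spatial binning functions
\[ w_{ij}(\hat{\mathbf{x}}) = w(\hat x - \hat x_i) \, w(\hat y - \hat y_j). \]
In the actual implementation, this integral is computed by visiting a rectangular area of the image that fully contains the keypoint grid (along with half a bin border to fully include the bin windowing function). Since the descriptor can be rotated, this area is a rectangle large enough to contain the grid under any rotation (see also the illustration).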
Standard SIFT descriptor
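For a SIFT-detected keypoint of center \( T \), scale \( \sigma \) and orientation \( \theta \), the affine transformation \( (A, T) \) reduces to the similarity transformation
\[ \mathbf{x} = m \sigma R(\theta) \hat{\mathbf{x}} + T, \]
where \( R(\theta) \) is a counter-clockwise rotation of \( \theta \) radians, and m is the descriptor magnification factor which expresses how much larger the descriptor window is compared to the scale of the keypoint (a common value is m = 3). Moreover, the standard SIFT descriptor computes the image gradient at the scale of the keypoints, which in the canonical frame is equivalent to a smoothing of \( \hat\sigma = 1/m \). Finally, the Gaussian window is set to have standard deviation \( \hat\sigma_\mathrm{win} = 2 \). This yields the formula
\begin{align*}
h(t,i,j) &= m\sigma \int g_{\sigma_\mathrm{win}}(\mathbf{x} - T) \, w_\mathrm{ang}(\angle J(\mathbf{x}) - \theta - \theta_t) \, w_{ij}\left( \frac{R(\theta)^\top (\mathbf{x} - T)}{m\sigma} \right) |J(\mathbf{x})| \, d\mathbf{x}, \\
\sigma_\mathrm{win} &= m \sigma \hat\sigma_\mathrm{win}, \\
J(\mathbf{x}) &= \nabla (g_{m\sigma\hat\sigma} * I_0)(\mathbf{x}) = \nabla (g_\sigma * I_0)(\mathbf{x}) = \nabla I_\sigma(\mathbf{x}).
\end{align*}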
Definition in file sift.h.
#include <stdio.h>
#include "generic.h"
Data Structures | |
struct | _VlSiftKeypoint |
SIFT filter keypoint. | |
struct | _VlSiftFilt |
SIFT filter. | |
Typedefs | |
typedef float | vl_sift_pix |
SIFT filter pixel type. | |
Functions | |
Create and destroy | |
VL_EXPORT VlSiftFilt * | vl_sift_new (int width, int height, int noctaves, int nlevels, int o_min) |
Create a new SIFT filter. | |
VL_EXPORT void | vl_sift_delete (VlSiftFilt *f) |
Delete SIFT filter. | |
Process data | |
VL_EXPORT int | vl_sift_process_first_octave (VlSiftFilt *f, vl_sift_pix const *im) |
Start processing a new image. | |
VL_EXPORT int | vl_sift_process_next_octave (VlSiftFilt *f) |
Process next octave. | |
VL_EXPORT void | vl_sift_detect (VlSiftFilt *f) |
Detect keypoints. | |
VL_EXPORT int | vl_sift_calc_keypoint_orientations (VlSiftFilt *f, double angles[4], VlSiftKeypoint const *k) |
Calculate the keypoint orientation(s). | |
VL_EXPORT void | vl_sift_calc_keypoint_descriptor (VlSiftFilt *f, vl_sift_pix *descr, VlSiftKeypoint const *k, double angle) |
Compute the descriptor of a keypoint. | |
VL_EXPORT void | vl_sift_calc_raw_descriptor (VlSiftFilt const *f, vl_sift_pix const *image, vl_sift_pix *descr, int width, int height, double x, double y, double s, double angle0) |
Run the SIFT descriptor on raw data. | |
VL_EXPORT void | vl_sift_keypoint_init (VlSiftFilt const *f, VlSiftKeypoint *k, double x, double y, double sigma) |
Initialize a keypoint from its position and scale. | |
Retrieve data and parameters | |
VL_INLINE int | vl_sift_get_octave_index (VlSiftFilt const *f) |
Get current octave index. | |
VL_INLINE int | vl_sift_get_noctaves (VlSiftFilt const *f) |
Get number of octaves. | |
VL_INLINE int | vl_sift_get_octave_first (VlSiftFilt const *f) |
Get first octave. | |
VL_INLINE int | vl_sift_get_octave_width (VlSiftFilt const *f) |
Get current octave width. | |
VL_INLINE int | vl_sift_get_octave_height (VlSiftFilt const *f) |
Get current octave height. | |
VL_INLINE int | vl_sift_get_nlevels (VlSiftFilt const *f) |
Get number of levels per octave. | |
VL_INLINE int | vl_sift_get_nkeypoints (VlSiftFilt const *f) |
Get number of keypoints. | |
VL_INLINE double | vl_sift_get_peak_thresh (VlSiftFilt const *f) |
Get peaks threshold. | |
VL_INLINE double | vl_sift_get_edge_thresh (VlSiftFilt const *f) |
Get edges threshold. | |
VL_INLINE double | vl_sift_get_norm_thresh (VlSiftFilt const *f) |
Get norm threshold. | |
VL_INLINE double | vl_sift_get_magnif (VlSiftFilt const *f) |
Get the magnification factor. | |
VL_INLINE double | vl_sift_get_window_size (VlSiftFilt const *f) |
Get the Gaussian window size. | |
VL_INLINE vl_sift_pix * | vl_sift_get_octave (VlSiftFilt const *f, int s) |
Get current octave data. | |
VL_INLINE VlSiftKeypoint const * | vl_sift_get_keypoints (VlSiftFilt const *f) |
Get keypoints. | |
Set parameters | |
VL_INLINE void | vl_sift_set_peak_thresh (VlSiftFilt *f, double t) |
Set peaks threshold. | |
VL_INLINE void | vl_sift_set_edge_thresh (VlSiftFilt *f, double t) |
Set edges threshold. | |
VL_INLINE void | vl_sift_set_norm_thresh (VlSiftFilt *f, double t) |
Set norm threshold. | |
VL_INLINE void | vl_sift_set_magnif (VlSiftFilt *f, double m) |
Set the magnification factor. | |
VL_INLINE void | vl_sift_set_window_size (VlSiftFilt *f, double x) |
Set the Gaussian window size. |
Function Documentation
VL_EXPORT void vl_sift_calc_keypoint_descriptor (VlSiftFilt *f, vl_sift_pix *descr, VlSiftKeypoint const *k, double angle0)
- Parameters:
  - f: SIFT filter.
  - descr: SIFT descriptor (output).
  - k: keypoint.
  - angle0: keypoint direction.
The function assumes that the keypoint is on the current octave. If not, it does not do anything.
Definition at line 1881 of file sift.c.
References fast_expn(), _VlSiftFilt::grad, normalize_histogram(), _VlSiftKeypoint::o, _VlSiftFilt::o_cur, _VlSiftFilt::s_max, _VlSiftFilt::s_min, update_gradient(), vl_abs_f(), vl_floor_f(), VL_MAX, VL_MIN, vl_mod_2pi_f(), VL_PI, and _VlSiftFilt::windowSize.
VL_EXPORT int vl_sift_calc_keypoint_orientations (VlSiftFilt *f, double angles[4], VlSiftKeypoint const *k)
- Parameters:
  - f: SIFT filter.
  - angles: orientations (output).
  - k: keypoint.
- Remarks:
- The function requires the keypoint octave k->o to be equal to the filter current octave vl_sift_get_octave_index(). If this is not the case, the function returns zero orientations.
The function requires the keypoint scale level k->s to be in the range from s_min+1 to s_max-2 (where usually s_min=0 and s_max=S+2). If this is not the case, the function returns zero orientations.
- Returns:
- number of orientations found.
Definition at line 1517 of file sift.c.
References fast_expn(), _VlSiftKeypoint::o, _VlSiftFilt::o_cur, _VlSiftFilt::s_max, _VlSiftFilt::s_min, update_gradient(), vl_floor_d(), VL_MAX, VL_MIN, and VL_PI.
VL_EXPORT void vl_sift_calc_raw_descriptor (VlSiftFilt const *f, vl_sift_pix const *grad, vl_sift_pix *descr, int width, int height, double x, double y, double sigma, double angle0)
- Parameters:
  - f: SIFT filter.
  - grad: image gradients.
  - descr: SIFT descriptor (output).
  - width: image width.
  - height: image height.
  - x: keypoint x coordinate.
  - y: keypoint y coordinate.
  - sigma: keypoint scale.
  - angle0: keypoint orientation.

In order to be equivalent to a standard SIFT descriptor, the image gradient must be computed at a smoothing level equal to the scale of the keypoint. In practice, the actual SIFT algorithm makes the following additional approximations, which influence the result:
- Scale is discretized in S levels.
- The image is downsampled once for each octave (if you do this, the parameters x, y and sigma must be scaled too).
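The rescaling mentioned in the last item can be sketched as follows. The sketch assumes that a pixel of octave o spans 2^o full-resolution pixels and that coordinates scale by a plain factor of 2^(-o) with no half-pixel offset; the `Frame` type and `frame_at_octave` are hypothetical and not part of sift.h.

```c
#include <assert.h>

/* Hypothetical helper type holding a keypoint center and scale. */
typedef struct { double x, y, sigma; } Frame;

/* Express a frame given in full-resolution pixel units in the pixel
   units of octave o, where each pixel spans 2^o full-resolution
   pixels (assumed convention, no half-pixel offset). */
Frame frame_at_octave(Frame full, int o)
{
    assert(full.sigma > 0.0);
    double factor = 1.0;
    for (int k = 0; k < o; ++k) factor *= 0.5;   /* o > 0: downsampled */
    for (int k = 0; k > o; --k) factor *= 2.0;   /* o < 0: upsampled */
    Frame scaled = { full.x * factor, full.y * factor, full.sigma * factor };
    return scaled;
}
```

For example, a keypoint at (100, 60) with scale 8 in the full-resolution image becomes (25, 15) with scale 2 when expressed in octave 2.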
Definition at line 1712 of file sift.c.
References fast_expn(), normalize_histogram(), vl_abs_f(), VL_EPSILON_D, vl_floor_f(), VL_MAX, VL_MIN, vl_mod_2pi_f(), VL_PI, and _VlSiftFilt::windowSize.
VL_EXPORT void vl_sift_delete (VlSiftFilt *f)
- Parameters:
  - f: SIFT filter to delete.
Definition at line 905 of file sift.c.
References vl_free().
VL_EXPORT void vl_sift_detect (VlSiftFilt *f)
The function detects keypoints in the current octave, filling the internal keypoint buffer. Keypoints can be retrieved by vl_sift_get_keypoints().
- Parameters:
  - f: SIFT filter.
Definition at line 1111 of file sift.c.
References _VlSiftFilt::keys, _VlSiftFilt::keys_res, _VlSiftFilt::nkeys, _VlSiftFilt::o_cur, _VlSiftFilt::S, _VlSiftFilt::sigma0, vl_abs_d(), vl_malloc(), vl_realloc(), and vl_sift_get_octave().
VL_INLINE double vl_sift_get_edge_thresh (VlSiftFilt const *f)
VL_INLINE VlSiftKeypoint const * vl_sift_get_keypoints (VlSiftFilt const *f)
VL_INLINE double vl_sift_get_magnif (VlSiftFilt const *f)
VL_INLINE int vl_sift_get_nkeypoints (VlSiftFilt const *f)
VL_INLINE int vl_sift_get_nlevels (VlSiftFilt const *f)
VL_INLINE int vl_sift_get_noctaves (VlSiftFilt const *f)
VL_INLINE double vl_sift_get_norm_thresh (VlSiftFilt const *f)
VL_INLINE vl_sift_pix * vl_sift_get_octave (VlSiftFilt const *f, int s)
- Parameters:
  - f: SIFT filter.
  - s: level index.
The level index s ranges from s_min = -1 to s_max = S + 2, where S is the number of levels per octave.
- Returns:
- pointer to the octave data for level s.
Definition at line 244 of file sift.h.
References _VlSiftFilt::octave, _VlSiftFilt::s_min, vl_sift_get_octave_height(), and vl_sift_get_octave_width().
Referenced by update_gradient(), vl_sift_detect(), vl_sift_process_first_octave(), and vl_sift_process_next_octave().
VL_INLINE int vl_sift_get_octave_first (VlSiftFilt const *f)
VL_INLINE int vl_sift_get_octave_height (VlSiftFilt const *f)
- Parameters:
  - f: SIFT filter.
- Returns:
- current octave height.
Definition at line 226 of file sift.h.
Referenced by update_gradient(), vl_sift_get_octave(), and vl_sift_process_next_octave().
VL_INLINE int vl_sift_get_octave_index (VlSiftFilt const *f)
VL_INLINE int vl_sift_get_octave_width (VlSiftFilt const *f)
- Parameters:
  - f: SIFT filter.
- Returns:
- current octave width.
Definition at line 214 of file sift.h.
Referenced by update_gradient(), vl_sift_get_octave(), and vl_sift_process_next_octave().
VL_INLINE double vl_sift_get_peak_thresh (VlSiftFilt const *f)
VL_INLINE double vl_sift_get_window_size (VlSiftFilt const *f)
VL_EXPORT void vl_sift_keypoint_init (VlSiftFilt const *f, VlSiftKeypoint *k, double x, double y, double sigma)
- Parameters:
  - f: SIFT filter.
  - k: SIFT keypoint (output).
  - x: x coordinate of the center.
  - y: y coordinate of the center.
  - sigma: scale.
Definition at line 2068 of file sift.c.
References _VlSiftFilt::O, _VlSiftFilt::o_min, _VlSiftFilt::S, _VlSiftFilt::s_max, _VlSiftFilt::s_min, _VlSiftKeypoint::sigma, _VlSiftFilt::sigma0, vl_floor_d(), VL_MAX, and VL_MIN.
VL_EXPORT VlSiftFilt* vl_sift_new (int width, int height, int noctaves, int nlevels, int o_min)
- Parameters:
  - width: image width.
  - height: image height.
  - noctaves: number of octaves.
  - nlevels: number of levels per octave.
  - o_min: first octave index.
Setting noctaves (the number of octaves O) to a negative value sets the number of octaves to the maximum possible value depending on the size of the image.
- Returns:
- the new SIFT filter.
- See also:
- vl_sift_delete().
Definition at line 832 of file sift.c.
References fast_expn_init(), _VlSiftFilt::s_max, _VlSiftFilt::s_min, _VlSiftFilt::sigma0, _VlSiftFilt::sigmak, vl_malloc(), VL_MAX, VL_MIN, and VL_SHIFT_LEFT.
VL_EXPORT int vl_sift_process_first_octave (VlSiftFilt *f, vl_sift_pix const *im)
- Parameters:
  - f: SIFT filter.
  - im: image data.
- Returns:
- error code. The function returns VL_ERR_EOF if there are no more octaves to process.
- See also:
- vl_sift_process_next_octave().
Definition at line 935 of file sift.c.
References copy_and_downsample(), copy_and_upsample_rows(), _VlSiftFilt::height, _VlSiftFilt::nkeys, _VlSiftFilt::O, _VlSiftFilt::o_cur, VL_ERR_EOF, VL_ERR_OK, vl_imsmooth_f(), VL_SHIFT_LEFT, vl_sift_get_octave(), and _VlSiftFilt::width.
VL_EXPORT int vl_sift_process_next_octave (VlSiftFilt *f)
- Parameters:
  - f: SIFT filter.
- Returns:
- error code. The function returns the error VL_ERR_EOF when there are no more octaves to process.
- See also:
- vl_sift_process_first_octave().
Definition at line 1041 of file sift.c.
References copy_and_downsample(), _VlSiftFilt::height, _VlSiftFilt::o_cur, VL_ERR_EOF, VL_ERR_OK, vl_imsmooth_f(), VL_MIN, VL_SHIFT_LEFT, vl_sift_get_octave(), vl_sift_get_octave_height(), vl_sift_get_octave_width(), and _VlSiftFilt::width.
VL_INLINE void vl_sift_set_edge_thresh (VlSiftFilt *f, double t)
VL_INLINE void vl_sift_set_magnif (VlSiftFilt *f, double m)
VL_INLINE void vl_sift_set_norm_thresh (VlSiftFilt *f, double t)
VL_INLINE void vl_sift_set_peak_thresh (VlSiftFilt *f, double t)
VL_INLINE void vl_sift_set_window_size (VlSiftFilt *f, double x)