sift.h File Reference
Detailed Description
The Scale Invariant Feature Transform (SIFT) bundles a feature detector and a feature descriptor. This program implements a “SIFT filter”. This is a reusable object that can be used to extract SIFT features from multiple images of the same size.
The SIFT detector determines the geometry of a SIFT feature. Geometrically, the feature is an oriented disk (also called a feature frame or keypoint) and has a center $(x, y)$, a scale $\sigma$, and an orientation $\theta$. The SIFT detector works by identifying blob-like structures in an image and attaching oriented disks to them.
The SIFT descriptor compactly describes the appearance of the image region corresponding to a SIFT frame. The SIFT descriptor works by extracting a Histogram of Oriented Gradients (HOG), which is a statistic of the gradient orientations inside the image region.
Using the SIFT filter
The code provided in this module can be used in different ways. You can instantiate and use a SIFT filter to extract both SIFT keypoints and descriptors from one or multiple images. Alternatively, you can use a lower-level function to run only a part of the SIFT algorithm (for instance, to compute the SIFT descriptors of custom keypoints).
To use the SIFT filter:
- Initialize the SIFT filter with vl_sift_new(). The filter can be reused as long as the image size does not change.
- For each octave:
  - Compute the DoG scale space using either vl_sift_process_first_octave() or vl_sift_process_next_octave() (stop if VL_ERR_EOF is returned).
  - Run the SIFT detector with vl_sift_detect() to get the keypoints.
  - For each keypoint:
    - Use vl_sift_calc_keypoint_orientations() to get the keypoint orientation(s).
    - For each orientation:
      - Use vl_sift_calc_keypoint_descriptor() to get the keypoint descriptor.
- Delete the SIFT filter with vl_sift_delete().
To compute SIFT descriptors of custom keypoints, use vl_sift_calc_raw_descriptor().
The scale space
The SIFT detector searches for image blobs at multiple scales. In order to do this, it first computes a Gaussian pyramid by gradually smoothing the image and reducing its scale (resolution). Then, it looks for blobs at all possible locations and scales.
Scales are sampled by octaves and by sublevels within each octave. This is controlled by three parameters: the starting octave $o_{\min}$, the number of octaves $O$, and the number of subdivisions per octave $S$. While $O$ is usually set to its maximum value, $o_{\min}$ can be set to 0 (native resolution), -1 (subpixel resolution, i.e. the image is upsampled by a factor of 2), or a larger value (coarser resolution). The effect of the number of subdivisions $S$ is more subtle, and we recommend reading Lowe's original paper.
parameter | alt. name | default value | set by |
$O$ | O | as big as possible | vl_sift_new() |
$o_{\min}$ | o_min | -1 | vl_sift_new() |
$S$ | S | 3 | vl_sift_new() |
Scale space details
In addition to the Gaussian scale space, SIFT uses a Difference of Gaussians (DoG) scale space, obtained by subtracting successive scales of the Gaussian scale space. The ensemble of the smoothed images and their differences is organized as follows:

The black vertical segments represent images of the Gaussian Scale Space (GSS), arranged by increasing scale $\sigma$. The image at scale $\sigma$ is equal to the image at scale 1 smoothed with a Gaussian kernel of that variance.
The input image is assumed to be pre-smoothed at scale $\sigma_n = 0.5$ due to pixel aliasing (hence the image at scale 1 is not really available).
Scales are sampled at points logarithmically spaced. The levels are indexed by o (octave index) and s (level index). There are O octaves and S levels per octave. Images are downsampled at the octave boundaries; this is represented by the length of the black segments, which are proportional to the resolution of the corresponding image.
The octave index $o$ starts at $o_{\min}$ and ends at $o_{\min} + O - 1$. The level index $s$ starts at $s_{\min} = -1$ and ends at $s_{\max} = S + 2$. From the picture it is apparent that a few scale levels are represented twice, at two different resolutions. This is necessary to initialize the computation of the next octave and the computation of the DoG scale space.
The DoG scale space is obtained by subtracting contiguous GSS levels. The scale of a DoG level obtained in this way can be thought of as sitting between the scales of the two images being subtracted. In the picture, the DoG levels are represented as vertical blue segments sitting between the smoothed images (black segments).
The SIFT detector extracts local extrema of the DoG scale space (in $x$, $y$ and $\sigma$). To compute such local extrema it is necessary to sample the DoG scale space in a 3x3x3 neighborhood. This means that local extrema cannot be extracted at the first and last levels of an octave, which is why two redundant levels are computed for each octave.
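The sketch below shows the two steps just described. First, a DoG level is formed as the difference of two consecutive GSS images. Second, a sample is tested for being a strict extremum of its 3x3x3 neighbourhood. It works on plain row-major float arrays, and the function names are hypothetical. The real detector additionally applies the thresholds and the quadratic interpolation described below.

```c
/* DoG level from two consecutive GSS images of size w x h (higher minus lower). */
static void dog_level(const float *gss_lo, const float *gss_hi, float *dog,
                      int w, int h)
{
  for (int i = 0; i < w * h; ++i)
    dog[i] = gss_hi[i] - gss_lo[i];
}

/* Return 1 if dog1 at (x, y) is a strict maximum or strict minimum among its
   26 neighbours in the DoG levels below (dog0), at (dog1) and above (dog2).
   (x, y) must be at least one pixel away from the image border. */
static int is_dog_extremum(const float *dog0, const float *dog1,
                           const float *dog2, int w, int x, int y)
{
  const float *levels[3] = { dog0, dog1, dog2 };
  float v = dog1[y * w + x];
  int is_max = 1, is_min = 1;
  for (int l = 0; l < 3; ++l)
    for (int dy = -1; dy <= 1; ++dy)
      for (int dx = -1; dx <= 1; ++dx) {
        if (l == 1 && dx == 0 && dy == 0) continue; /* skip the centre sample */
        float n = levels[l][(y + dy) * w + (x + dx)];
        if (n >= v) is_max = 0;
        if (n <= v) is_min = 0;
      }
  return is_max || is_min;
}
```

Because the test needs a level above and a level below, it can only run on the inner DoG levels of an octave.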
The detector
The SIFT frames (keypoints) are extracted based on peaks (local extrema) of the DoG scale space. Peaks are searched in a neighborhood of 3x3x3 samples (in space and scale). The previous figure shows the scale levels involved in this search (they are the ones at the intersection of two green arrows). Peaks are then quadratically interpolated. Finally, they are filtered and their orientation(s) are computed as explained in the next sections.
Peak threshold
Peaks which are too low may have been generated by noise and are discarded. This is done by comparing the absolute value $|D(x,y,\sigma)|$ of the DoG scale space at the peak with the peak threshold $t_p$, and discarding the peak if the value is below the threshold.
Edge threshold
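Peaks which are too flat are often generated by edges and do not yield stable features. These peaks are detected and removed as follows. Given a peak $(x,y,\sigma)$, the algorithm evaluates the $x,y$ Hessian of the DoG scale space $D$ at scale $\sigma$,
$$H = \begin{bmatrix} \partial^2 D/\partial x^2 & \partial^2 D/\partial x\,\partial y \\ \partial^2 D/\partial x\,\partial y & \partial^2 D/\partial y^2 \end{bmatrix},$$
and computes the following score (similar to the Harris function):
$$\frac{(\operatorname{tr} H)^2}{\det H}.$$
This score has a minimum (equal to 4) when both eigenvalues of the Hessian are equal (curved peak) and increases as one of the eigenvalues grows and the other stays small. Peaks are retained if the score is below the quantity $(t_e+1)^2/t_e$, where $t_e$ is the edge threshold. Notice that this quantity has a minimum equal to 4 when $t_e = 1$ and grows thereafter. Therefore the range of the edge threshold is $[1, \infty)$.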
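A minimal sketch of the peak and edge tests follows. It estimates the Hessian of a DoG image with standard finite differences and then applies $|D| \ge t_p$ and $(\operatorname{tr} H)^2/\det H < (t_e+1)^2/t_e$. Rejecting peaks with $\det H \le 0$ (saddles and degenerate ridges) is an added safeguard assumed here, since the text does not discuss that case. The helper names are hypothetical.

```c
#include <math.h>

/* Finite-difference Hessian of a row-major DoG image of width w at (x, y). */
static void dog_hessian(const float *dog, int w, int x, int y,
                        double *dxx, double *dyy, double *dxy)
{
#define AT(u, v) ((double)dog[(v) * w + (u)])
  *dxx = AT(x + 1, y) + AT(x - 1, y) - 2.0 * AT(x, y);
  *dyy = AT(x, y + 1) + AT(x, y - 1) - 2.0 * AT(x, y);
  *dxy = 0.25 * (AT(x + 1, y + 1) + AT(x - 1, y - 1)
               - AT(x - 1, y + 1) - AT(x + 1, y - 1));
#undef AT
}

/* Return 1 if a peak with DoG value `value` passes both thresholds. */
static int keep_peak(double value, double dxx, double dyy, double dxy,
                     double peak_thresh, double edge_thresh)
{
  double tr = dxx + dyy;
  double det = dxx * dyy - dxy * dxy;
  if (fabs(value) < peak_thresh) return 0; /* too low: likely noise */
  if (det <= 0.0) return 0;                /* ASSUMPTION: reject saddles/ridges */
  return tr * tr / det
         < (edge_thresh + 1.0) * (edge_thresh + 1.0) / edge_thresh;
}
```

With the default $t_e = 10$ the score must stay below 12.1. For example, an elongated peak with eigenvalues -10 and -0.5 scores 22.05 and is rejected, but it would pass with $t_e = 30$.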
Orientations
A peak in the DoG scale space fixes 2 parameters of the keypoint: the position and the scale. It remains to choose an orientation. To do this, SIFT computes a histogram of the gradient orientations in a Gaussian window with a standard deviation 1.5 times the scale $\sigma$ of the keypoint.
This histogram is then smoothed and its maximum is selected. In addition to the biggest mode, up to three other modes whose amplitude is at least 80% of the biggest mode are retained and returned as additional orientations.
parameter | alt. name | default value | set by |
$t_e$ | edge_thresh | 10.0 | vl_sift_set_edge_thresh() |
$t_p$ | peak_thresh | 0 | vl_sift_set_peak_thresh() |
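The sketch below illustrates the orientation selection rule described above. Several details are assumptions for illustration: the number of bins (36), the smoothing scheme (six passes of a circular [1 1 1]/3 box filter), and the parabolic refinement of each peak. The text itself states only that the histogram is smoothed and that up to four modes above 80% of the maximum are kept. Building the histogram (Gaussian-weighted gradient magnitudes in a window of standard deviation $1.5\sigma$) is omitted.

```c
#define ORI_NBINS 36 /* ASSUMPTION: number of orientation bins */
#define ORI_MAX 4    /* the biggest mode plus up to three others */

static const double ORI_PI = 3.14159265358979323846;

/* Smooth a circular histogram in place with a [1 1 1]/3 box filter,
   applied `passes` times. */
static void smooth_circular(double *hist, int n, int passes)
{
  for (int p = 0; p < passes; ++p) {
    double first = hist[0];    /* original hist[0], needed at the wrap-around */
    double prev = hist[n - 1]; /* original left neighbour of the current bin */
    for (int i = 0; i < n; ++i) {
      double cur = hist[i];
      double next = (i + 1 < n) ? hist[i + 1] : first;
      hist[i] = (prev + cur + next) / 3.0;
      prev = cur;
    }
  }
}

/* Smooth hist (modified in place) and write up to ORI_MAX orientations, in
   radians, to angles[]. Local maxima above 80% of the global maximum are
   reported in bin order, refined by fitting a parabola through three bins.
   Returns the number of orientations found. */
static int pick_orientations(double *hist, int n, double angles[ORI_MAX])
{
  double maxh = 0.0;
  int count = 0;
  smooth_circular(hist, n, 6);
  for (int i = 0; i < n; ++i)
    if (hist[i] > maxh) maxh = hist[i];
  for (int i = 0; i < n && count < ORI_MAX; ++i) {
    double h0 = hist[i];
    double hm = hist[(i + n - 1) % n];
    double hp = hist[(i + 1) % n];
    if (h0 > 0.8 * maxh && h0 > hm && h0 > hp) {
      double di = -0.5 * (hp - hm) / (hp + hm - 2.0 * h0); /* peak offset */
      angles[count++] = 2.0 * ORI_PI * (i + di + 0.5) / n;
    }
  }
  return count;
}
```

A histogram with two separated modes of heights 10 and 9 yields two orientations. If the second mode is only 5, it falls below 80% of the maximum and only one orientation is returned.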
The descriptor
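The SIFT descriptor is a three-dimensional histogram $h(\theta,x,y)$ of the orientation $\theta$ and position $(x,y)$ of the gradients inside a patch surrounding the keypoint.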
The SIFT descriptor is a 3-D array, but it is usually presented as a vector. This vector is obtained by stacking the 3-D array, with $\theta$ being the fastest-varying index.
Bins are bilinearly interpolated and partially overlap. There are $B_p$ bins along the $x$ and $y$ directions and $B_o$ bins along the $\theta$ direction.
The spatial extent of the bins depends on the keypoint scale $\sigma$ and a magnification factor $m$. A bin extends for $m\sigma$ pixels (ignoring bilinear smoothing). In the picture, the small blue circle has radius proportional to $\sigma$ and $m$ is equal to 3 (the standard value).
Since there are $B_p$ bins along each spatial direction, the descriptor support extends for $m\sigma B_p$ pixels in each direction. Accounting for the extra half bin used by bilinear smoothing, the actual support extends for $m\sigma(B_p+1)$ pixels. The picture shows the arrangement of the bins for $B_p = 4$.
When added to the histogram, gradients are weighted by their magnitude and by a Gaussian window centered at the keypoint. This Gaussian window has standard deviation equal to $m\sigma B_p/2$, i.e. half the width of the descriptor support. In the picture, the Gaussian window is represented by the larger blue circle.
- Remarks:
- In practice, the descriptor is computed by scanning a rectangular area that covers its support (scaled to match the resolution of the corresponding image in the GSS). Since the descriptor can be rotated, this area has extension $\sqrt{2}\, m\sigma(B_p+1)$ (see also the picture). This remark is relevant only for the implementation.
parameter | alt. name | default value |
$B_p$ | BP | 4 |
$B_o$ | BO | 8 |
$m$ | magnif | 3 |
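The descriptor geometry formulas above can be collected in one place. The sketch computes them for a keypoint scale $\sigma$, magnification $m$ and $B_p$ spatial bins, and gives the position of bin $(\theta, x, y)$ in the stacked vector. The Gaussian window standard deviation $m\sigma B_p/2$ and the order of $x$ and $y$ after $\theta$ (here $x$ before $y$) are assumptions. The struct and function names are hypothetical.

```c
#include <math.h>

/* Descriptor support geometry, in pixels of the keypoint's octave image. */
typedef struct {
  double bin_size;     /* m * sigma: extent of one spatial bin */
  double support;      /* m * sigma * Bp: nominal support */
  double full_support; /* m * sigma * (Bp + 1): including the half bins */
  double window_sigma; /* m * sigma * Bp / 2: Gaussian weighting window (assumed) */
  double scan_extent;  /* sqrt(2) * m * sigma * (Bp + 1): covers any rotation */
} DescrGeometry;

static DescrGeometry descr_geometry(double sigma, double m, int Bp)
{
  DescrGeometry g;
  g.bin_size = m * sigma;
  g.support = g.bin_size * Bp;
  g.full_support = g.bin_size * (Bp + 1);
  g.window_sigma = g.support / 2.0;
  g.scan_extent = sqrt(2.0) * g.full_support;
  return g;
}

/* Index of bin (t, x, y) in the stacked descriptor vector with the
   orientation index t varying fastest. ASSUMPTION: x varies faster than y. */
static int descr_index(int t, int x, int y, int Bp, int Bo)
{
  return t + Bo * (x + Bp * y);
}
```

With the defaults ($B_p = 4$, $B_o = 8$, $m = 3$), the descriptor has $4 \cdot 4 \cdot 8 = 128$ entries. For $\sigma = 2$, each bin spans 6 pixels and the support including the half bins spans 30 pixels.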
The keypoint coordinates $(x,y)$ are expressed in the standard image convention ($y$ axis pointing down). This also establishes the convention for expressing the angle $\theta$ of a vector $v$ (here $v$ could be either the gradient of the image or the direction of the keypoint). To complicate matters slightly, however, the index $\theta$ of the descriptor $h(\theta,x,y)$ follows the opposite convention (this is for compatibility with Lowe's original SIFT implementation), as shown by the figure:

Extensions
Norm threshold
Near-uniform patches do not yield stable features. By default, all descriptors are computed. When the norm threshold is set (vl_sift_set_norm_thresh()), descriptors whose norm before normalization is below this value are set to the null vector. This is useful to remove low-contrast patches. The norm of a descriptor is defined as the sum of the gradient magnitude accumulated into each of the bins.
Acknowledgments
- Thanks to Wei Dong for finding a memory leak.
- Thanks to Brian, Loic, Giuseppe, Liu, Erwin, P. Ivanov, Q.S. Luo for finding bugs in old versions of this program.
Definition in file sift.h.
#include <stdio.h>
#include "generic.h"
Data Structures | |
struct | _VlSiftKeypoint |
SIFT filter keypoint. | |
struct | _VlSiftFilt |
SIFT filter. | |
Typedefs | |
typedef float | vl_sift_pix |
SIFT filter pixel type. | |
Functions | |
Create and destroy | |
VL_EXPORT VlSiftFilt * | vl_sift_new (int width, int height, int noctaves, int nlevels, int o_min) |
Create a new SIFT filter. | |
VL_EXPORT void | vl_sift_delete (VlSiftFilt *f) |
Delete SIFT filter. | |
Process data | |
VL_EXPORT int | vl_sift_process_first_octave (VlSiftFilt *f, vl_sift_pix const *im) |
Start processing a new image. | |
VL_EXPORT int | vl_sift_process_next_octave (VlSiftFilt *f) |
Process next octave. | |
VL_EXPORT void | vl_sift_detect (VlSiftFilt *f) |
Detect keypoints. | |
VL_EXPORT int | vl_sift_calc_keypoint_orientations (VlSiftFilt *f, double angles[4], VlSiftKeypoint const *k) |
Calculate the keypoint orientation(s). | |
VL_EXPORT void | vl_sift_calc_keypoint_descriptor (VlSiftFilt *f, vl_sift_pix *descr, VlSiftKeypoint const *k, double angle) |
Compute the descriptor of a keypoint. | |
VL_EXPORT void | vl_sift_calc_raw_descriptor (VlSiftFilt const *f, vl_sift_pix const *image, vl_sift_pix *descr, int width, int height, double x, double y, double s, double angle0) |
Run the SIFT descriptor on raw data. | |
VL_EXPORT void | vl_sift_keypoint_init (VlSiftFilt const *f, VlSiftKeypoint *k, double x, double y, double sigma) |
Initialize a keypoint from its position and scale. | |
Retrieve data and parameters | |
VL_INLINE int | vl_sift_get_octave_index (VlSiftFilt const *f) |
Get current octave index. | |
VL_INLINE int | vl_sift_get_noctaves (VlSiftFilt const *f) |
Get number of octaves. | |
VL_INLINE int | vl_sift_get_octave_first (VlSiftFilt const *f) |
Get first octave. | |
VL_INLINE int | vl_sift_get_octave_width (VlSiftFilt const *f) |
Get current octave width. | |
VL_INLINE int | vl_sift_get_octave_height (VlSiftFilt const *f) |
Get current octave height. | |
VL_INLINE int | vl_sift_get_nlevels (VlSiftFilt const *f) |
Get number of levels per octave. | |
VL_INLINE int | vl_sift_get_nkeypoints (VlSiftFilt const *f) |
Get number of keypoints. | |
VL_INLINE double | vl_sift_get_peak_thresh (VlSiftFilt const *f) |
Get peak threshold. | |
VL_INLINE double | vl_sift_get_edge_thresh (VlSiftFilt const *f) |
Get edge threshold. | |
VL_INLINE double | vl_sift_get_norm_thresh (VlSiftFilt const *f) |
Get norm threshold. | |
VL_INLINE double | vl_sift_get_magnif (VlSiftFilt const *f) |
Get the magnification factor. | |
VL_INLINE vl_sift_pix * | vl_sift_get_octave (VlSiftFilt const *f, int s) |
Get current octave data. | |
VL_INLINE VlSiftKeypoint const * | vl_sift_get_keypoints (VlSiftFilt const *f) |
Get keypoints. | |
Set parameters | |
VL_INLINE void | vl_sift_set_peak_thresh (VlSiftFilt *f, double t) |
Set peak threshold. | |
VL_INLINE void | vl_sift_set_edge_thresh (VlSiftFilt *f, double t) |
Set edge threshold. | |
VL_INLINE void | vl_sift_set_norm_thresh (VlSiftFilt *f, double t) |
Set norm threshold. | |
VL_INLINE void | vl_sift_set_magnif (VlSiftFilt *f, double m) |
Set the magnification factor. |
Function Documentation
VL_EXPORT void vl_sift_calc_keypoint_descriptor (VlSiftFilt *f, vl_sift_pix *descr, VlSiftKeypoint const *k, double angle0)
- Parameters:
  - f: SIFT filter.
  - descr: SIFT descriptor (output).
  - k: keypoint.
  - angle0: keypoint direction.
The function assumes that the keypoint is on the current octave. If not, it does not do anything.
Definition at line 1543 of file sift.c.
References fast_expn(), _VlSiftFilt::grad, normalize_histogram(), _VlSiftKeypoint::o, _VlSiftFilt::o_cur, _VlSiftFilt::s_max, _VlSiftFilt::s_min, update_gradient(), vl_abs_f(), vl_floor_f(), VL_MAX, VL_MIN, vl_mod_2pi_f(), and VL_PI.
VL_EXPORT int vl_sift_calc_keypoint_orientations (VlSiftFilt *f, double angles[4], VlSiftKeypoint const *k)
- Parameters:
  - f: SIFT filter.
  - angles: orientations (output).
  - k: keypoint.
- Remarks:
- The function requires the keypoint octave k->o to be equal to the current octave of the filter (as returned by vl_sift_get_octave_index()). If this is not the case, the function returns zero orientations.
- The function requires the keypoint scale level k->s to be in the range from s_min+1 to s_max-2 (where usually s_min=0 and s_max=S+2). If this is not the case, the function returns zero orientations.
- Returns:
- number of orientations found.
Definition at line 1177 of file sift.c.
References fast_expn(), _VlSiftKeypoint::o, _VlSiftFilt::o_cur, _VlSiftFilt::s_max, _VlSiftFilt::s_min, update_gradient(), vl_floor_d(), VL_MAX, VL_MIN, and VL_PI.
VL_EXPORT void vl_sift_calc_raw_descriptor (VlSiftFilt const *f, vl_sift_pix const *grad, vl_sift_pix *descr, int width, int height, double x, double y, double sigma, double angle0)
- Parameters:
  - f: SIFT filter.
  - grad: image gradients.
  - descr: SIFT descriptor (output).
  - width: image width.
  - height: image height.
  - x: keypoint x coordinate.
  - y: keypoint y coordinate.
  - sigma: keypoint scale.
  - angle0: keypoint orientation.

In order to be equivalent to a standard SIFT descriptor, the image gradient must be computed at a smoothing level equal to the scale of the keypoint. In practice, the actual SIFT algorithm makes the following additional approximations, which influence the result:
- Scale is discretized in S levels.
- The image is downsampled once for each octave (if you do this, the parameters x, y and sigma must be scaled too).
Definition at line 1379 of file sift.c.
References fast_expn(), normalize_histogram(), vl_abs_f(), VL_EPSILON_D, vl_floor_f(), VL_MAX, VL_MIN, vl_mod_2pi_f(), and VL_PI.
VL_EXPORT void vl_sift_delete (VlSiftFilt *f)
- Parameters:
  - f: SIFT filter to delete.
Definition at line 565 of file sift.c.
References vl_free().
VL_EXPORT void vl_sift_detect (VlSiftFilt *f)
The function detects keypoints in the current octave, filling the internal keypoint buffer. Keypoints can be retrieved by vl_sift_get_keypoints().
- Parameters:
  - f: SIFT filter.
Definition at line 771 of file sift.c.
References _VlSiftFilt::keys, _VlSiftFilt::keys_res, _VlSiftFilt::nkeys, _VlSiftFilt::o_cur, _VlSiftFilt::S, _VlSiftFilt::sigma0, vl_abs_d(), vl_malloc(), vl_realloc(), and vl_sift_get_octave().
VL_INLINE double vl_sift_get_edge_thresh (VlSiftFilt const *f)
VL_INLINE VlSiftKeypoint const * vl_sift_get_keypoints (VlSiftFilt const *f)
VL_INLINE double vl_sift_get_magnif (VlSiftFilt const *f)
VL_INLINE int vl_sift_get_nkeypoints (VlSiftFilt const *f)
VL_INLINE int vl_sift_get_nlevels (VlSiftFilt const *f)
VL_INLINE int vl_sift_get_noctaves (VlSiftFilt const *f)
VL_INLINE double vl_sift_get_norm_thresh (VlSiftFilt const *f)
VL_INLINE vl_sift_pix * vl_sift_get_octave (VlSiftFilt const *f, int s)
- Parameters:
  - f: SIFT filter.
  - s: level index. The level index s ranges from s_min = -1 to s_max = S + 2, where S is the number of levels per octave.
- Returns:
- pointer to the octave data for level s.
Definition at line 239 of file sift.h.
References _VlSiftFilt::octave, _VlSiftFilt::s_min, vl_sift_get_octave_height(), and vl_sift_get_octave_width().
Referenced by update_gradient(), vl_sift_detect(), vl_sift_process_first_octave(), and vl_sift_process_next_octave().
VL_INLINE int vl_sift_get_octave_first (VlSiftFilt const *f)
VL_INLINE int vl_sift_get_octave_height (VlSiftFilt const *f)
- Parameters:
  - f: SIFT filter.
- Returns:
- current octave height.
Definition at line 221 of file sift.h.
Referenced by update_gradient(), vl_sift_get_octave(), and vl_sift_process_next_octave().
VL_INLINE int vl_sift_get_octave_index (VlSiftFilt const *f)
VL_INLINE int vl_sift_get_octave_width (VlSiftFilt const *f)
- Parameters:
  - f: SIFT filter.
- Returns:
- current octave width.
Definition at line 209 of file sift.h.
Referenced by update_gradient(), vl_sift_get_octave(), and vl_sift_process_next_octave().
VL_INLINE double vl_sift_get_peak_thresh (VlSiftFilt const *f)
VL_EXPORT void vl_sift_keypoint_init (VlSiftFilt const *f, VlSiftKeypoint *k, double x, double y, double sigma)
- Parameters:
  - f: SIFT filter.
  - k: SIFT keypoint (output).
  - x: x coordinate of the center.
  - y: y coordinate of the center.
  - sigma: scale.
Definition at line 1732 of file sift.c.
References _VlSiftFilt::O, _VlSiftFilt::o_min, _VlSiftFilt::S, _VlSiftFilt::s_max, _VlSiftFilt::s_min, _VlSiftKeypoint::sigma, _VlSiftFilt::sigma0, vl_floor_d(), VL_MAX, and VL_MIN.
VL_EXPORT VlSiftFilt* vl_sift_new (int width, int height, int noctaves, int nlevels, int o_min)
- Parameters:
  - width: image width.
  - height: image height.
  - noctaves: number of octaves.
  - nlevels: number of levels per octave.
  - o_min: first octave index.
Setting noctaves (O) to a negative value sets the number of octaves to the maximum possible value given the size of the image.
- Returns:
- the new SIFT filter.
- See also:
- vl_sift_delete().
Definition at line 493 of file sift.c.
References fast_expn_init(), _VlSiftFilt::s_max, _VlSiftFilt::s_min, _VlSiftFilt::sigma0, _VlSiftFilt::sigmak, vl_malloc(), VL_MAX, VL_MIN, and VL_SHIFT_LEFT.
VL_EXPORT int vl_sift_process_first_octave (VlSiftFilt *f, vl_sift_pix const *im)
- Parameters:
  - f: SIFT filter.
  - im: image data.
- Returns:
- error code. The function returns VL_ERR_EOF if there are no more octaves to process.
- See also:
- vl_sift_process_next_octave().
Definition at line 595 of file sift.c.
References copy_and_downsample(), copy_and_upsample_rows(), _VlSiftFilt::height, _VlSiftFilt::nkeys, _VlSiftFilt::O, _VlSiftFilt::o_cur, VL_ERR_EOF, VL_ERR_OK, vl_imsmooth_f(), VL_SHIFT_LEFT, vl_sift_get_octave(), and _VlSiftFilt::width.
VL_EXPORT int vl_sift_process_next_octave (VlSiftFilt *f)
- Parameters:
  - f: SIFT filter.
- Returns:
- error code. The function returns the error VL_ERR_EOF when there are no more octaves to process.
- See also:
- vl_sift_process_first_octave().
Definition at line 701 of file sift.c.
References copy_and_downsample(), _VlSiftFilt::height, _VlSiftFilt::o_cur, VL_ERR_EOF, VL_ERR_OK, vl_imsmooth_f(), VL_MIN, VL_SHIFT_LEFT, vl_sift_get_octave(), vl_sift_get_octave_height(), vl_sift_get_octave_width(), and _VlSiftFilt::width.
VL_INLINE void vl_sift_set_edge_thresh (VlSiftFilt *f, double t)
VL_INLINE void vl_sift_set_magnif (VlSiftFilt *f, double m)
VL_INLINE void vl_sift_set_norm_thresh (VlSiftFilt *f, double t)
VL_INLINE void vl_sift_set_peak_thresh (VlSiftFilt *f, double t)