The Scale-Invariant Feature Transform (SIFT) bundles a feature detector and a feature descriptor. The detector extracts from an image a number of frames (attributed regions) in a way which is consistent with (some) variations of the illumination, viewpoint and other viewing conditions. The descriptor associates to the regions a signature which identifies their appearance compactly and robustly.
Extracting frames and descriptors
Both the detector and descriptor are accessible by the vl_sift MATLAB command (there is a similar command line utility). Open MATLAB and load a test image
pfx = fullfile(vl_root,'data','a.jpg') ;
I = imread(pfx) ;
image(I) ;

The vl_sift command requires a gray scale image in single precision. It also expects the range to be normalized to the [0,255] interval (while this is not strictly required, the default values of some internal thresholds are tuned for this case). The image I is converted to the appropriate format by
I = single(rgb2gray(I)) ;
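The conversion can be made a little more defensive. The following is a minimal sketch, not part of VLFeat: it uses a synthetic image in place of the test file, assumes that integer images are uint8 and that floating-point images are stored in the [0,1] range, and brings both to single precision in [0,255].

```matlab
% Synthetic RGB image standing in for the test image (assumption: uint8 input).
I = uint8(randi([0 255], 20, 30, 3)) ;

% Collapse colour to gray scale only when there are three channels.
if size(I,3) == 3
  I = rgb2gray(I) ;
end

% Bring the image to single precision in the [0,255] range.
if isfloat(I)
  % Assumption: floating-point images are stored in [0,1].
  I = single(255 * I) ;
else
  % uint8 data already spans [0,255].
  I = single(I) ;
end
```

After this step I is a two-dimensional single-precision array, which is the input format described above.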
We compute the SIFT frames (keypoints) and descriptors by
[f,d] = vl_sift(I) ;
The matrix f has a column for each frame. A frame is a disk of center f(1:2), scale f(3) and orientation f(4). We visualize a random selection of 50 features by:
perm = randperm(size(f,2)) ;
sel = perm(1:50) ;
h1 = vl_plotframe(f(:,sel)) ;
h2 = vl_plotframe(f(:,sel)) ;
set(h1,'color','y','linewidth',3) ;
set(h2,'color','k','linewidth',1) ;

We can also overlay the descriptors by
h3 = vl_plotsiftdescriptor(d(:,sel),f(:,sel)) ;
set(h3,'color','g') ;
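To make the layout of f concrete, the sketch below unpacks a small hand-written frame matrix (the values are hypothetical, not detector output) into centers, scales and orientations. It also computes the end point of each frame's orientation radius, assuming the image convention in which the y axis points downward.

```matlab
% Hypothetical frames, one per column: [x ; y ; scale ; orientation].
f = [100 200 ;
      50  80 ;
      10   4 ;
     pi/2  0] ;

centers = f(1:2,:) ;   % 2 x N frame centers
scales  = f(3,:) ;     % 1 x N frame scales
angles  = f(4,:) ;     % 1 x N orientations in radians

% Unit direction of each frame. With y pointing downward, a positive
% angle rotates the x axis towards the y axis.
dirs = [cos(angles) ; sin(angles)] ;

% End point of the orientation radius: center + scale * direction.
tips = centers + [scales ; scales] .* dirs ;
disp(tips) ;
```

The same indexing applies to the output of vl_sift, where the number of columns equals the number of detected frames.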

Detector parameters
The SIFT detector is controlled mainly by two parameters: the peak threshold and the (non) edge threshold.
The peak threshold filters peaks of the DoG scale space that are too small (in absolute value). For instance, consider a test image of Gaussian blobs whose intensity increases from left to right:
I = double(rand(100,500) <= .005) ;
I = (ones(100,1) * linspace(0,1,500)) .* I ;
I(:,1) = 0 ; I(:,end) = 0 ;
I(1,:) = 0 ; I(end,:) = 0 ;
I = 2*pi*4^2 * vl_imsmooth(I,4) ;
I = single(255 * I) ;

We run the detector with peak threshold x by
f = vl_sift(I, 'PeakThresh', x) ;
obtaining fewer and fewer features as x increases.
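The effect can be checked by sweeping the threshold and counting the frames returned. This is a sketch that assumes VLFeat is on the MATLAB path (for example after running vl_setup). The threshold values are arbitrary choices for illustration, and the test image is rebuilt here so that the snippet runs on its own.

```matlab
% Rebuild the test image of Gaussian blobs (needs VLFeat's vl_imsmooth).
I = double(rand(100,500) <= .005) ;
I = (ones(100,1) * linspace(0,1,500)) .* I ;
I(:,1) = 0 ; I(:,end) = 0 ;
I(1,:) = 0 ; I(end,:) = 0 ;
I = 2*pi*4^2 * vl_imsmooth(I,4) ;
I = single(255 * I) ;

% Illustrative threshold values (an assumption, not taken from the text).
thresholds = [0 10 20 30] ;
counts = zeros(size(thresholds)) ;
for k = 1:numel(thresholds)
  f = vl_sift(I, 'PeakThresh', thresholds(k)) ;
  counts(k) = size(f,2) ;   % one column per detected frame
  fprintf('PeakThresh = %2d: %d frames\n', thresholds(k), counts(k)) ;
end
```

Since the blobs are fainter on the left of the image, the frames on that side should be the first to disappear as the threshold grows.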




The edge threshold eliminates peaks of the DoG scale space whose curvature is too small (such peaks yield badly localized frames). For instance, consider the test image
I = zeros(100,500) ;
for i=[10 20 30 40 50 60 70 80 90]
  I(50-round(i/3):50+round(i/3),i*5) = 1 ;
end
I = 2*pi*8^2 * vl_imsmooth(I,8) ;
I = single(255 * I) ;

We run the detector with edge threshold x by
f = vl_sift(I, 'EdgeThresh', x) ;
obtaining more and more features as x increases.
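One common way to express this kind of test, following Lowe's original paper, compares the squared trace of the 2x2 spatial Hessian H of the DoG response with its determinant: a peak is kept when tr(H)^2 / det(H) < (x+1)^2 / x. The sketch below applies that criterion to two hand-picked Hessians. That vl_sift uses exactly this form is an assumption, and the matrices and helper names are hypothetical.

```matlab
% Edge score and acceptance limit. Assumption: Lowe-style criterion.
edgeScore = @(H) trace(H)^2 / det(H) ;
edgeLimit = @(x) (x + 1)^2 / x ;
isKept    = @(H, x) det(H) > 0 && edgeScore(H) < edgeLimit(x) ;

Hblob = [4 0 ; 0 3] ;    % similar curvature in both directions (blob-like)
Hedge = [20 0 ; 0 1] ;   % nearly flat along one direction (edge-like)

for x = [5 10 25]
  fprintf('x = %2d: blob kept = %d, edge kept = %d\n', ...
          x, isKept(Hblob, x), isKept(Hedge, x)) ;
end
```

Because (x+1)^2 / x grows with x for x > 1, raising the threshold admits progressively more elongated peaks, which matches the behaviour described above.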




Custom frames
The MATLAB command vl_sift (and the command line utility) can bypass the detector and run the descriptor on custom frames by means of the Frames option.
For instance, we can compute the descriptor of a SIFT frame centered at position (100,100), of scale 10 and orientation pi/8 by
fc = [100;100;10;pi/8] ;
[f,d] = vl_sift(I,'frames',fc) ;
Multiple frames fc can be specified as well. In this case they are re-ordered by increasing scale. The Orientations option instructs the program to use the custom position and scale but to compute the keypoint orientations, as in
fc = [100;100;10;0] ;
[f,d] = vl_sift(I,'frames',fc,'orientations') ;
Notice that, depending on the local appearance, a keypoint may have multiple orientations. Moreover, a keypoint computed on a constant image region (such as one as big as one pixel) has no orientations!
Conventions
In our implementation SIFT frames are expressed in the standard image reference. The only difference between the command line and MATLAB drivers is that the latter assumes that the image origin (top-left corner) has coordinate (1,1) as opposed to (0,0). Lowe's original implementation uses a different reference system, illustrated next:
Our implementation uses the standard image reference system, with the y axis pointing downward. The frame orientation θ and descriptor use the same reference system (i.e. a small positive rotation of the x axis moves it towards the y axis). Recall that each descriptor element is a bin indexed by (θ,x,y); the histogram is vectorized in such a way that θ is the fastest varying index and y the slowest.
By comparison, D. Lowe's implementation (see bottom half of the figure) uses a slightly different convention: frame centers are expressed relative to the standard image reference system, but the frame orientation and the descriptor assume that the y axis points upward. Consequently, to map from our convention to D. Lowe's, frame orientations need to be negated and the descriptor elements must be re-arranged.
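The (θ,x,y) vectorization order described above maps directly onto MATLAB's column-major reshape. The sketch below assumes the standard SIFT layout of 8 orientation bins over a 4x4 spatial grid (128 elements in total; these sizes are not stated above) and uses a stand-in descriptor whose entries equal their own linear index.

```matlab
% Stand-in descriptor: element k holds the value k.
d = (1:128)' ;

% Column-major reshape: the first index (theta) varies fastest,
% the last index (y) varies slowest.
h = reshape(d, 8, 4, 4) ;   % h(t, i, j): t = theta bin, i = x bin, j = y bin

% Linear index of the bin (t, i, j), all 1-based.
t = 3 ; i = 2 ; j = 4 ;
k = t + 8*(i-1) + 8*4*(j-1) ;
fprintf('bin (%d,%d,%d) is descriptor element %d\n', t, i, j, k) ;
```

A single column of d can be reshaped in the same way to inspect individual histogram bins.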