Running the code consists of two main steps:

  1. Extracting the SIFT [3] features from the images of a given dataset. This is done with the sift++ program and a few GNU Make scripts. To extract features as described in [2], we also use the go_randkey.m MATLAB M-file, which generates random keypoints.
  2. Running the classifier. This is done with the go_tree.m and go_classify.m M-files.

The following are detailed instructions guiding you through this process.

Obtaining a dataset

First we download a dataset to experiment with. Here we use the well-known (and relatively simple) Caltech-4. After downloading and decompressing the files (one for each category), move everything into a directory caltech-4 so that you obtain a hierarchy like

caltech-4/
caltech-4/airplanes_side/
caltech-4/cars_brad/
caltech-4/faces/
caltech-4/background/
caltech-4/cars_brad_bg/
caltech-4/motorbikes_side/

Each directory should contain several hundred images.
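
Before starting the (long) feature extraction, it can be worth checking that the hierarchy is in place. The following is a minimal Python sketch, not part of the package: the function name count_images, the list of image extensions and the dataset path in the example are our assumptions, while the category names are the ones listed above.

```python
import os

# Category names taken from the hierarchy listed above.
CATEGORIES = [
    "airplanes_side", "cars_brad", "faces",
    "background", "cars_brad_bg", "motorbikes_side",
]

# Assumption: image files are recognised by these extensions.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".pgm")


def count_images(root):
    """Return {category: number of image files}; a missing directory maps to None."""
    counts = {}
    for category in CATEGORIES:
        path = os.path.join(root, category)
        if not os.path.isdir(path):
            counts[category] = None
            continue
        counts[category] = sum(
            1 for name in os.listdir(path)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


if __name__ == "__main__":
    # Hypothetical location; use the directory you created above.
    root = os.path.expanduser("~/data/caltech-4")
    for category, n in count_images(root).items():
        print(category, "MISSING" if n is None else n)
```

A category reported as MISSING, or with far fewer than a few hundred images, usually means that an archive was not decompressed into the right place.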

Extracting SIFT keypoints and descriptors

To extract SIFT keys, we use the GNU Make scripts in the directory mk. To start with, open mk/caltech-4.mk:

# * caltech-4 *

P_IMAGES := $(HOME)/data/caltech-4
P_SIFTS  := $(HOME)/extra/caltech-4/std-sift
P_TMP    := /tmp/

SIFT     := $(HOME)/src/siftpp/sift
SIFTFLAGS:= --threshold 0 --verbose

include dataset.mk

Make the necessary adjustments: P_IMAGES is the path of the dataset, P_SIFTS is the path where the extracted features are written, and P_TMP is the path of a temporary directory used to store the intermediate PGM images. It is strongly recommended to point the three paths to different directories and to use an empty directory for P_SIFTS. SIFT is the path to the sift++ executable. Once you are done, launch the script:

> cd mk
> make -f caltech-4.mk

After a few hours, the directory P_SIFTS should be full of descriptor files.
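
Since the extraction takes hours, it is useful to check afterwards that no image was skipped. The sketch below is ours and rests on two assumptions that you should verify against your own output: that the feature directory mirrors the category layout of the dataset, and that each feature file is named after its image with the extension .key (both are parameters, so they are easy to change). The function name missing_features is hypothetical.

```python
import os


def missing_features(images_root, sifts_root, feature_ext=".key",
                     image_exts=(".jpg", ".jpeg", ".png")):
    """Return the relative paths of images that have no matching feature file.

    Assumes sifts_root mirrors the directory layout of images_root and that
    a feature file is named <image stem> + feature_ext.
    """
    missing = []
    for dirpath, _dirnames, filenames in os.walk(images_root):
        rel_dir = os.path.relpath(dirpath, images_root)
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext.lower() not in image_exts:
                continue  # not an image
            feature = os.path.join(sifts_root, rel_dir, stem + feature_ext)
            if not os.path.isfile(feature):
                missing.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(missing)


if __name__ == "__main__":
    # Paths corresponding to P_IMAGES and P_SIFTS in mk/caltech-4.mk.
    images = os.path.expanduser("~/data/caltech-4")
    sifts = os.path.expanduser("~/extra/caltech-4/std-sift")
    for path in missing_features(images, sifts):
        print("no features for", path)
```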

For the next step, you need to configure the MATLAB programs to use the same paths that you used in the previous step. Open go_config.m, set which_exp to 1 (indicating that the first set of parameters is going to be used), and then edit the appropriate fields in the corresponding section of the switch statement below. In particular, set the paths

pfx_images     = '~/data/caltech-4/' ;
pfx_sift       = '~/extra/caltech-4/std-sift/' ;
pfx_tree       = '~/extra/caltech-4/std-tree' ;
pfx_classifier = '~/extra/caltech-4/std-class' ;

to the appropriate values.

Using random features

Quite surprisingly, much better performance can be obtained by selecting features randomly rather than by means of the SIFT detector [2]. To do this, we first need to generate a set of random keypoints with the M-file go_randkey.m and then run the appropriate GNU Make script. To use go_randkey.m, first edit go_config.m: set which_exp to 2 and then edit the appropriate fields

pfx_images     = '~/data/caltech-4/' ;
pfx_sift       = '~/extra/caltech-4/rand-sift/' ;
pfx_key        = '~/extra/caltech-4/rand-key/' ;
pfx_tree       = '~/extra/caltech-4/rand-tree' ;
pfx_classifier = '~/extra/caltech-4/rand-class' ;

as you did before. Now you can run go_randkey, which should scan the dataset and populate pfx_key with the random keypoints.
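
To make the idea of random keypoints concrete, here is a small Python sketch of how such keypoints could be drawn for one image. It does not reproduce go_randkey.m: the frame layout (x, y, scale, orientation), the scale range, the log-uniform sampling of the scale and the text format are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def random_keypoints(width, height, count,
                     min_scale=2.0, max_scale=16.0, seed=None):
    """Draw `count` random frames (x, y, scale, orientation) inside an image.

    Positions are uniform over the image, scales are log-uniform between
    min_scale and max_scale, orientations are uniform over a full turn.
    All of these choices are assumptions, not the package's settings.
    """
    rng = random.Random(seed)
    frames = []
    for _ in range(count):
        x = rng.uniform(0, width - 1)
        y = rng.uniform(0, height - 1)
        scale = math.exp(rng.uniform(math.log(min_scale), math.log(max_scale)))
        theta = rng.uniform(0, 2 * math.pi)
        frames.append((x, y, scale, theta))
    return frames


def format_keypoints(frames):
    """One frame per line, whitespace separated (the file layout is assumed)."""
    return "\n".join("%.2f %.2f %.2f %.3f" % frame for frame in frames)


if __name__ == "__main__":
    print(format_keypoints(random_keypoints(640, 480, 5, seed=0)))
```

Fixing the seed makes the keypoints reproducible, which helps when comparing experiments run on the same dataset.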

You still need to generate the descriptors. Make sure that the script mk/caltech-4-rand.mk corresponds to your choice:

# * caltech-4 with random keys *

P_IMAGE  := $(HOME)/data/caltech-4
P_KEY    := $(HOME)/extra/caltech-4/rand-key
P_SIFT   := $(HOME)/extra/caltech-4/rand-sift
P_TMP    := /tmp/

SIFT     := $(HOME)/src/siftpp/sift
SIFTFLAGS:= --threshold 0 --verbose

include dataset-with-keys.mk

Here P_KEY corresponds to pfx_key. Finally issue cd mk ; make -f caltech-4-rand.mk to compute the descriptors.

Building the dictionary and computing the signatures

The K-tree dictionary [1] is generated by means of the go_tree.m M-file. First edit go_config.m and set up the parameters and paths. Then issue

> go_tree
> go_sign
> go_stat

The first command scans the database, collects the features and runs the hierarchical K-means algorithm. The results are then saved to the MAT file specified by the variable pfx_tree. The second command uses the tree to generate the signature of each image in the database [1]. The third command generates a set of statistics about the dataset.
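
The two main operations can be illustrated with a short Python sketch. This is a simplified stand-in written for this guide, not the code in go_tree.m or go_sign.m: it builds the tree with plain Lloyd iterations, and it defines the signature as the normalised histogram of the leaves reached by the descriptors of an image, whereas the package and [1] may count or weight the nodes differently. All function names and parameters are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, rng, iterations=10):
    """Plain Lloyd iterations. Returns (centers, clusters), empty clusters dropped."""
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        clusters = [c for c in clusters if c]
        centers = [mean(c) for c in clusters]
    return centers, clusters


def build_tree(points, k, depth, rng):
    """Hierarchical K-means: split the points into k clusters, then recurse."""
    node = {"children": []}
    if depth == 0 or len(points) < k:
        return node  # leaf
    centers, clusters = kmeans(points, k, rng)
    for center, cluster in zip(centers, clusters):
        child = build_tree(cluster, k, depth - 1, rng)
        child["center"] = center
        node["children"].append(child)
    return node


def assign_leaf_ids(node, next_id=0):
    """Number the leaves from left to right; returns the number of leaves."""
    if not node["children"]:
        node["leaf_id"] = next_id
        return next_id + 1
    for child in node["children"]:
        next_id = assign_leaf_ids(child, next_id)
    return next_id


def signature(tree, n_leaves, descriptors):
    """Normalised histogram of the leaves reached by the descriptors of one image."""
    hist = [0.0] * n_leaves
    for d in descriptors:
        node = tree
        while node["children"]:
            # Descend towards the child whose center is closest to the descriptor.
            node = min(node["children"], key=lambda c: sq_dist(d, c["center"]))
        hist[node["leaf_id"]] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2-D "descriptors"; real SIFT descriptors have 128 dimensions.
    data = [(rng.gauss(c, 0.1), rng.gauss(c, 0.1)) for c in (0, 10) for _ in range(50)]
    tree = build_tree(data, k=2, depth=2, rng=rng)
    n_leaves = assign_leaf_ids(tree)
    print(signature(tree, n_leaves, data[:50]))
```

Quantising a descriptor costs only k distance computations per level, which is what makes a tree-structured dictionary practical for large vocabularies.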

This picture shows the signatures for the images in the database (with random keys):

[Figure: image signatures]

This picture shows the pairwise distances among images in the database:

[Figure: pairwise distances among images]

Running the classifier

We are finally ready to classify our data. To do this run:

> go_classify

This script uses a simple nearest-neighbour voting technique to classify the images in the database. It uses the first class_ntrain images within each category as training images and classifies the rest.
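
As an illustration of this scheme, the following Python sketch splits the signatures of each category into training and test sets and classifies a test signature by majority vote among its nearest training signatures. It is a sketch under assumptions, not the contents of go_classify.m: the L1 distance, the number of neighbours k and all function names are our choices, and only the role of ntrain (the first images of each category are used for training) comes from the description above.

```python
from collections import Counter


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def split_by_category(signatures, ntrain):
    """signatures: {category: [signature, ...]} in dataset order.

    The first `ntrain` signatures of each category become training data,
    the remaining ones test data.
    """
    train, test = [], []
    for label, sigs in signatures.items():
        train += [(s, label) for s in sigs[:ntrain]]
        test += [(s, label) for s in sigs[ntrain:]]
    return train, test


def classify(signature, train, k=3):
    """Majority vote among the k training signatures closest to `signature`."""
    nearest = sorted(train, key=lambda item: l1_distance(signature, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test signatures assigned to their true category."""
    correct = sum(1 for s, label in test if classify(s, train, k) == label)
    return correct / len(test)


if __name__ == "__main__":
    # Toy two-bin signatures, for illustration only.
    sigs = {
        "faces": [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.95, 0.05]],
        "cars_brad": [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.05, 0.95]],
    }
    train, test = split_by_category(sigs, ntrain=3)
    print("accuracy:", accuracy(train, test))
```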

Check out the final results here.