Tutorials>Hierarchical integer k-means

Table of Contents

VLFeat offers a hierarchical version of integer k-means, which recursively applies vl_ikmeans to compute finer and finer partitions. For more details see Hierarchical Integer k-means API reference and the Integer k-means tutorial.

Usage

First, we generate some random data to cluster in [0,255]^2:

data     = uint8(rand(2,10000) * 255) ;
datat    = uint8(rand(2,100000)* 255) ;

To cluster this data, we simply use vl_hikmeans:

K        = 3 ;
nleaves  = 100 ;
[tree,A] = vl_hikmeans(data,K,nleaves) ;

Here nleaves is the desired number of leaf clusters. The algorithm terminates when there are at least nleaves nodes, creating a tree with depth = floor(log(K nleaves))

To assign labels to the new data, we use vl_hikmeanspush:

AT       = vl_hikmeanspush(tree,datat) ;
Hierarchical integer K-means. Left: A depiction of the recursive clusters. Each node is a cluster center. The root note is not depicted (its center would be the mean of the dataset). Right: Clusters are represented as different colors (here are more than 100 clusters, but only three colors are used).

Tree structure

The output tree is a MATLAB structure representing the tree of clusters:

> tree
tree =

          K: 3
      depth: 5
    centers: [2x3 int32]
        sub: [1x3 struct]

The field centers is the matrix of the cluster centers at the root node. If the depth of the tree is larger than 1, then the field sub is a structure array with one entry for each cluster. Each element is in turn a tree:

> tree.sub
ans =

1x3 struct array with fields:
    centers
    sub

with a field centers for its clusters and a field sub for its children. When there are no children, this field is equal to the empty matrix

> tree.sub(1).sub(1).sub(1).sub(1)

ans =

    centers: [2x3 int32]
        sub: []

Elkan

VLFeat supports two different implementations of k-means. While they produce identical output, the Elkan method is sometimes faster. The method parameters controls which method is used. Consider the case when K=10 and our data is now 128 dimensional (e.g. SIFT descriptors):

K=10;
nleaves = 1000;
data = uint8(rand(128,10000) * 255);
tic;
[tree,A] = vl_hikmeans(data,K,nleaves,'method', 'lloyd') ; % default
t_lloyd = toc
tic;
[tree,A] = vl_hikmeans(data,K,nleaves,'method', 'elkan') ;
t_elkan = toc

t_lloyd =

    8.0743

t_elkan =

    3.0427