Tutorials>Integer K-means

Table of Contents

VLFeat includes a basic implementation of k-means clustering and hierarchical k-means clustering. They are designed to be lightweight in order to work on large datasets. In particular, they assume that the data are vectors of unsigned chars (one byte). While this is limiting for some application, it works well for clustering image descriptors, where very high precision is usually unnecessary. For more details, see the Integer k-means API reference.

Usage

Integer k-means (IKM) is run by the command vl_ikmeans. In order to demonstrate the usage of this command, we sample 1000 random points in the [0,255]^2 integer square and use vl_ikmeans to get k=3 clusters:

K = 3 ;
data = uint8(rand(2,1000) * 255) ;
[C,A] = vl_ikmeans(data,K) ;

The program returns both the cluster centers C and the data-to-cluster assignments A. By means of the cluster centers C we can project more data on the same clusters

datat = uint8(rand(2,10000) * 255) ;
AT = vl_ikmeanspush(datat,C) ;

In order to visualize the results, we associate to each cluster a color and we plot the points:

cl = get(gca,'ColorOrder') ;
ncl = size(cl,1) ;
for k=1:K
  sel  = find(A  == k) ;
  selt = find(AT == k) ;
  plot(data(1,sel),  data(2,sel),  '.',...
       'Color',cl(mod(k,ncl)+1,:)) ;
  hold on ;
  plot(datat(1,selt),datat(2,selt),'+',...
       'Color',cl(mod(k,ncl)+1,:)) ;
end
Integer k-means. We show clusters of 2-D points obtained by integer k-means. There are k=3 clusters represented with different colors. The clusters have been estimated from 1000 points (displayed as dots). Then 10000 different points have been projected on the same clusters (displayed as crosses). The three big markers represent the cluster centers.

Elkan

VLFeat supports two different implementations of k-means. While they produce identical output, the Elkan method requires fewer distance computations. The method parameters controls which method is used. Consider the case when K=100 and our data is now 128 dimensional (e.g. SIFT descriptors):

K=100;
data = uint8(rand(128,10000) * 255);
tic;
[C,A] = vl_ikmeans(data,K,'method', 'lloyd') ; % default
t_lloyd = toc
tic;
[C,A] = vl_ikmeans(data,K,'method', 'elkan') ;
t_elkan = toc

t_lloyd =

   10.2884

t_elkan =

    5.1405