**VLFeat** includes a basic implementation of k-means clustering
and hierarchical k-means clustering. They are designed to be
lightweight in order to work on large datasets. In particular, they
assume that the data are vectors of unsigned chars (one byte). While
this is limiting for some application, it works well for clustering
image descriptors, where very high precision is usually unnecessary.
For more details, see the
Integer k-means API
reference.

# Usage

Integer k-means (IKM) is run by the command
`vl_ikmeans`

. In order to demonstrate the usage of this
command, we sample 1000 random points in the `[0,255]^2`

integer square and use `vl_ikmeans`

to get k=3
clusters:

```
K = 3 ;
data = uint8(rand(2,1000) * 255) ;
[C,A] = vl_ikmeans(data,K) ;
```

The program returns both the cluster centers `C`

and the
data-to-cluster assignments `A`

. By means of the cluster
centers
`C`

we can project more data on the same clusters

```
datat = uint8(rand(2,10000) * 255) ;
AT = vl_ikmeanspush(datat,C) ;
```

In order to visualize the results, we associate to each cluster a color and we plot the points:

```
cl = get(gca,'ColorOrder') ;
ncl = size(cl,1) ;
for k=1:K
sel = find(A == k) ;
selt = find(AT == k) ;
plot(data(1,sel), data(2,sel), '.',...
'Color',cl(mod(k,ncl)+1,:)) ;
hold on ;
plot(datat(1,selt),datat(2,selt),'+',...
'Color',cl(mod(k,ncl)+1,:)) ;
end
```

# Elkan

VLFeat supports two different implementations of k-means. While
they produce identical output, the Elkan method requires fewer
distance computations. The `method`

parameters controls
which method is used. Consider the case when `K=100`

and our
data is now 128 dimensional (e.g. SIFT descriptors):

```
K=100;
data = uint8(rand(128,10000) * 255);
tic;
[C,A] = vl_ikmeans(data,K,'method', 'lloyd') ; % default
t_lloyd = toc
tic;
[C,A] = vl_ikmeans(data,K,'method', 'elkan') ;
t_elkan = toc
t_lloyd =
10.2884
t_elkan =
5.1405
```