ikmeans.h File Reference
Detailed Description
Integer K-means (IKM) is an implementation of K-means clustering (or vector quantization, VQ) for integer data. This is particularly useful for clustering large collections of visual descriptors.
Use the function vl_ikm_new() to create an IKM quantizer. Initialize the IKM quantizer with K clusters by calling vl_ikm_init() or a similar function. Use vl_ikm_train() to train the quantizer. Use vl_ikm_push() or vl_ikm_push_one() to quantize new data.
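Given data $x_1, \dots, x_N \in \mathbb{R}^d$ and a number of clusters $K$, the goal is to find assignments $a_i \in \{1, \dots, K\}$ and centers $c_1, \dots, c_K \in \mathbb{R}^d$ so that the expected distortion
$$E(\{a_i, c_j\}) = \frac{1}{N} \sum_{i=1}^{N} d(x_i, c_{a_i})$$
is minimized. Here $d(x_i, c_{a_i})$ is the distortion, i.e. the cost we pay for representing $x_i$ by $c_{a_i}$. IKM uses the squared distortion $d(x, y) = \|x - y\|_2^2$.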
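To make the objective concrete, the following minimal sketch evaluates the expected distortion for a given set of assignments and centers. It is not taken from the VLFeat sources: the names `sq_dist` and `mean_distortion` are hypothetical, the standard `uint8_t` and `int32_t` types stand in for `vl_uint8` and `vl_ikm_acc`, and each datum is assumed to be stored as M consecutive values.

```c
#include <stddef.h>
#include <stdint.h>

/* Squared Euclidean distance between one uint8 datum and one int32 center,
   both of dimension M. (Hypothetical helper, not a VLFeat function.) */
int64_t sq_dist(const uint8_t *x, const int32_t *c, int M)
{
  int64_t acc = 0;
  for (int d = 0; d < M; ++d) {
    int64_t delta = (int64_t)x[d] - c[d];
    acc += delta * delta;
  }
  return acc;
}

/* Expected distortion E = (1/N) * sum_i d(x_i, c_{a_i}).
   data:    N data points of M values each, stored one after another.
   centers: K centers of M values each, stored one after another.
   asgn:    asgn[i] is the zero-based index of the center assigned to x_i.
   N must be at least 1. */
double mean_distortion(const uint8_t *data, const int32_t *centers,
                       const uint32_t *asgn, int M, int N)
{
  int64_t total = 0;
  for (int i = 0; i < N; ++i) {
    total += sq_dist(data + (size_t)i * M,
                     centers + (size_t)asgn[i] * M, M);
  }
  return (double)total / N;
}
```

The sum is accumulated in 64 bits so that the sketch does not overflow for realistic inputs; this is a choice made for the example and says nothing about how the library accumulates.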
Algorithms
Initialization
Most K-means algorithms are iterative and need an initialization in the form of an initial choice of the centers. The following options are available:
- User specified centers (vl_ikm_init);
- Random centers (vl_ikm_init_rand);
- Centers from K randomly selected data points (vl_ikm_init_rand_data).
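The last option can be sketched as follows. This is an illustration under stated assumptions, not the code of vl_ikm_init_rand_data(): the function name `init_centers_from_data` is hypothetical, it uses the C library `rand()` rather than the library's own random generator, and it draws K distinct data points with a partial Fisher-Yates shuffle.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Copy K distinct, randomly chosen data points into centers.
   data:    N data points of M uint8 values each.
   centers: output, K centers of M int32 values each.
   Returns 0 on success, -1 on invalid arguments or allocation failure.
   (Hypothetical helper, not a VLFeat function.) */
int init_centers_from_data(int32_t *centers, const uint8_t *data,
                           int M, int N, int K)
{
  if (M < 1 || N < 1 || K < 1 || K > N) return -1;

  int *perm = malloc((size_t)N * sizeof *perm);
  if (perm == NULL) return -1;
  for (int i = 0; i < N; ++i) perm[i] = i;

  /* Partial Fisher-Yates shuffle: after step k, perm[0..k] holds a random
     sample of k+1 indices drawn without replacement. The modulo introduces
     a slight bias, which is acceptable for a sketch. */
  for (int k = 0; k < K; ++k) {
    int j = k + rand() % (N - k);
    int tmp = perm[k];
    perm[k] = perm[j];
    perm[j] = tmp;

    for (int d = 0; d < M; ++d) {
      centers[(size_t)k * M + d] = data[(size_t)perm[k] * M + d];
    }
  }

  free(perm);
  return 0;
}
```

Sampling without replacement avoids picking the same index twice, although two centers can still coincide if the data itself contains duplicate points.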
Lloyd
The Lloyd (also known as Lloyd-Max and LBG) algorithm iteratively:
- Fixes the centers, optimizing the assignments (minimizing by exhaustive search the association of each data point to the centers);
- Fixes the assignments and optimizes the centers (by descending the distortion error function). For the squared distortion, this step is in closed form.
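This algorithm is not particularly efficient because all data points need to be compared to all centers, for a complexity of $O(dNKT)$, where $d$ is the data dimensionality, $N$ the number of data points, $K$ the number of centers, and $T$ the total number of iterations.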
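The two alternating steps can be sketched as a single iteration in integer arithmetic. This is a minimal illustration, not the library's implementation: the name `lloyd_iteration` is hypothetical, standard fixed-width types stand in for `vl_uint8`, `vl_uint` and `vl_ikm_acc`, and the choices to round the mean down and to leave empty clusters unchanged are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One Lloyd iteration for the squared distortion.
   data:    N data points of M uint8 values each.
   centers: K centers of M int32 values each, updated in place.
   asgn:    N assignments; must hold the previous assignments on entry
            (any valid values for the first call) and is overwritten.
   Returns the number of points whose assignment changed, or -1 if a
   temporary buffer cannot be allocated.
   (Hypothetical helper, not a VLFeat function.) */
int lloyd_iteration(int32_t *centers, uint32_t *asgn,
                    const uint8_t *data, int M, int N, int K)
{
  int64_t *sums = calloc((size_t)K * M, sizeof *sums);
  int *counts = calloc((size_t)K, sizeof *counts);
  int changed = 0;

  if (sums == NULL || counts == NULL) {
    free(sums);
    free(counts);
    return -1;
  }

  /* Step 1: centers fixed, assign each point to its nearest center by
     exhaustive search. This loop nest costs M * N * K operations. */
  for (int i = 0; i < N; ++i) {
    const uint8_t *x = data + (size_t)i * M;
    uint32_t best = 0;
    int64_t best_dist = INT64_MAX;

    for (int k = 0; k < K; ++k) {
      int64_t dist = 0;
      for (int d = 0; d < M; ++d) {
        int64_t delta = (int64_t)x[d] - centers[(size_t)k * M + d];
        dist += delta * delta;
      }
      if (dist < best_dist) {
        best_dist = dist;
        best = (uint32_t)k;
      }
    }

    if (asgn[i] != best) ++changed;
    asgn[i] = best;

    /* Accumulate the data needed by step 2. */
    ++counts[best];
    for (int d = 0; d < M; ++d) sums[(size_t)best * M + d] += x[d];
  }

  /* Step 2: assignments fixed, the optimal center of each cluster is the
     mean of its points (closed form). Integer division rounds down. */
  for (int k = 0; k < K; ++k) {
    if (counts[k] == 0) continue; /* empty cluster: keep the old center */
    for (int d = 0; d < M; ++d) {
      centers[(size_t)k * M + d] =
          (int32_t)(sums[(size_t)k * M + d] / counts[k]);
    }
  }

  free(sums);
  free(counts);
  return changed;
}
```

Calling the function repeatedly until it returns 0, or until a maximum number of iterations is reached, gives the overall loop.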
Elkan
The Elkan algorithm is an optimized variant of Lloyd. By making use of the triangle inequality, many comparisons between data points and centers are avoided, especially at later iterations. Usually 4-5 times fewer comparisons than Lloyd are performed, providing a dramatic speedup in execution time.
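The kind of saving the triangle inequality allows can be illustrated with a nearest-center search. If the distance between the current best center and another center is at least twice the distance from the point to the current best center, the other center cannot be closer and its distance need not be computed. The sketch below shows only this one test; it is not the library's Elkan implementation and does not reproduce the bound bookkeeping that a complete Elkan algorithm maintains across iterations. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Squared distance between a uint8 datum and an int32 center. */
int64_t sq_dist_point(const uint8_t *x, const int32_t *c, int M)
{
  int64_t acc = 0;
  for (int d = 0; d < M; ++d) {
    int64_t delta = (int64_t)x[d] - c[d];
    acc += delta * delta;
  }
  return acc;
}

/* Fill inter (K x K) with the squared distances between all pairs of
   centers. This is done once per iteration, not once per data point. */
void center_distances(int64_t *inter, const int32_t *centers, int M, int K)
{
  for (int j = 0; j < K; ++j) {
    for (int k = 0; k < K; ++k) {
      int64_t acc = 0;
      for (int d = 0; d < M; ++d) {
        int64_t delta = (int64_t)centers[(size_t)j * M + d]
                      - centers[(size_t)k * M + d];
        acc += delta * delta;
      }
      inter[(size_t)j * K + k] = acc;
    }
  }
}

/* Nearest center of x, skipping centers ruled out by the triangle
   inequality. *ncomp receives the number of point-to-center distances
   that were actually computed. */
uint32_t nearest_pruned(const uint8_t *x, const int32_t *centers,
                        const int64_t *inter, int M, int K, int *ncomp)
{
  uint32_t best = 0;
  int64_t best_dist = sq_dist_point(x, centers, M);
  *ncomp = 1;

  for (int k = 1; k < K; ++k) {
    /* ||c_best - c_k|| >= 2 ||x - c_best|| implies
       ||x - c_k|| >= ||x - c_best||. With squared distances the factor
       2 becomes 4. */
    if (inter[(size_t)best * K + k] >= 4 * best_dist) continue;

    int64_t dist = sq_dist_point(x, centers + (size_t)k * M, M);
    ++*ncomp;
    if (dist < best_dist) {
      best_dist = dist;
      best = (uint32_t)k;
    }
  }
  return best;
}
```

The pruning pays off when centers are well separated relative to the spread of the points around them, which is typically the case at later iterations.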
Definition in file ikmeans.h.
#include "generic.h"
Data Structures | |
struct | _VlIKMFilt |
IKM quantizer. | |
Typedefs | |
typedef vl_int32 | vl_ikm_acc |
Enumerations | |
enum | VlIKMAlgorithms { VL_IKM_LLOYD, VL_IKM_ELKAN } |
IKM algorithms. | |
Functions | |
Create and destroy | |
VL_EXPORT VlIKMFilt * | vl_ikm_new (int method) |
Create a new IKM quantizer. | |
VL_EXPORT void | vl_ikm_delete (VlIKMFilt *f) |
Delete IKM quantizer. | |
Process data | |
VL_EXPORT void | vl_ikm_init (VlIKMFilt *f, vl_ikm_acc const *centers, int M, int K) |
Initialize quantizer with centers. | |
VL_EXPORT void | vl_ikm_init_rand (VlIKMFilt *f, int M, int K) |
Initialize quantizer with random centers. | |
VL_EXPORT void | vl_ikm_init_rand_data (VlIKMFilt *f, vl_uint8 const *data, int M, int N, int K) |
Initialize with centers from random data. | |
VL_EXPORT int | vl_ikm_train (VlIKMFilt *f, vl_uint8 const *data, int N) |
Train clusters. | |
VL_EXPORT void | vl_ikm_push (VlIKMFilt *f, vl_uint *asgn, vl_uint8 const *data, int N) |
Project data to clusters. | |
VL_EXPORT vl_uint | vl_ikm_push_one (vl_ikm_acc const *centers, vl_uint8 const *data, int M, int K) |
Project one datum to clusters. | |
Retrieve data and parameters | |
VL_INLINE int | vl_ikm_get_ndims (VlIKMFilt const *f) |
Get data dimensionality. | |
VL_INLINE int | vl_ikm_get_K (VlIKMFilt const *f) |
Get the number of centers K. | |
VL_INLINE int | vl_ikm_get_verbosity (VlIKMFilt const *f) |
Get verbosity level. | |
VL_INLINE int | vl_ikm_get_max_niters (VlIKMFilt const *f) |
Get maximum number of iterations. | |
VL_INLINE vl_ikm_acc const * | vl_ikm_get_centers (VlIKMFilt const *f) |
Get centers. | |
Set parameters | |
VL_INLINE void | vl_ikm_set_verbosity (VlIKMFilt *f, int verb) |
Set verbosity level. | |
VL_INLINE void | vl_ikm_set_max_niters (VlIKMFilt *f, int max_niters) |
Set maximum number of iterations. | |
Typedef Documentation
typedef vl_int32 vl_ikm_acc
Enumeration Type Documentation
enum VlIKMAlgorithms
Function Documentation
VL_EXPORT void vl_ikm_delete (VlIKMFilt *f)
VL_INLINE vl_ikm_acc const * vl_ikm_get_centers (VlIKMFilt const *f)
VL_INLINE int vl_ikm_get_K (VlIKMFilt const *f)
VL_INLINE int vl_ikm_get_max_niters (VlIKMFilt const *f)
VL_INLINE int vl_ikm_get_ndims (VlIKMFilt const *f)
VL_INLINE int vl_ikm_get_verbosity (VlIKMFilt const *f)
VL_EXPORT void vl_ikm_init (VlIKMFilt *f, vl_ikm_acc const *centers, int M, int K)
- Parameters:
  - f: IKM quantizer.
  - centers: centers.
  - M: data dimensionality.
  - K: number of clusters.
Definition at line 71 of file ikmeans_init.tc.
References alloc(), and vl_ikm_init_helper().
VL_EXPORT void vl_ikm_init_rand (VlIKMFilt *f, int M, int K)
- Parameters:
  - f: IKM quantizer.
  - M: data dimensionality.
  - K: number of clusters.
Definition at line 89 of file ikmeans_init.tc.
References alloc(), vl_ikm_init_helper(), and vl_rand_uint32().
VL_EXPORT void vl_ikm_init_rand_data (VlIKMFilt *f, vl_uint8 const *data, int M, int N, int K)
- Parameters:
  - f: IKM quantizer.
  - data: data.
  - M: data dimensionality.
  - N: number of data.
  - K: number of clusters.
Definition at line 115 of file ikmeans_init.tc.
References alloc(), vl_free(), vl_ikm_init_helper(), vl_malloc(), and vl_rand_uint32().
Referenced by xmeans().
VL_EXPORT VlIKMFilt * vl_ikm_new (int method)
- Parameters:
  - method: Clustering algorithm.
method takes one of the values of the enumeration VlIKMAlgorithms.
- Returns:
- new IKM quantizer.
Definition at line 105 of file ikmeans.c.
References vl_malloc().
Referenced by xmeans().
VL_EXPORT void vl_ikm_push (VlIKMFilt *f, vl_uint *asgn, vl_uint8 const *data, int N)
- Parameters:
  - f: IKM quantizer.
  - asgn: Assignments (out).
  - data: data.
  - N: number of data (N >= 1).
Definition at line 176 of file ikmeans.c.
References VL_IKM_ELKAN, and VL_IKM_LLOYD.
Referenced by vl_hikm_push(), and xmeans().
VL_EXPORT vl_uint vl_ikm_push_one (vl_ikm_acc const *centers, vl_uint8 const *data, int M, int K)
- Parameters:
  - centers: centers.
  - data: datum to project.
  - K: number of centers.
  - M: dimensionality of the datum.
- Returns:
- the cluster index.
Definition at line 200 of file ikmeans.c.
Referenced by vl_ikm_push_lloyd().
VL_INLINE void vl_ikm_set_max_niters (VlIKMFilt *f, int max_niters)
VL_INLINE void vl_ikm_set_verbosity (VlIKMFilt *f, int verb)
VL_EXPORT int vl_ikm_train (VlIKMFilt *f, vl_uint8 const *data, int N)
- Parameters:
  - f: IKM quantizer.
  - data: data.
  - N: number of data (N >= 1).
- Returns:
- -1 if an overflow may have occurred.
Definition at line 144 of file ikmeans.c.
References VL_IKM_ELKAN, VL_IKM_LLOYD, and VL_PRINTF.
Referenced by xmeans().