aib.h File Reference


Detailed Description

This provides an implementation of Agglomerative Information Bottleneck (AIB) as first described in:

[Slonim] N. Slonim and N. Tishby. Agglomerative information bottleneck. In Proc. NIPS, 1999

AIB takes a discrete valued feature $x$ and a label $c$ and gradually compresses $x$ by iteratively merging values which minimize the loss in mutual information $I(x,c)$.

While the algorithm is equivalent to the one described in [Slonim], it has some speedups that enable handling much larger datasets. Let N be the number of feature values and C the number of labels. The algorithm of [Slonim] is $O(N^2)$ in space and $O(C N^3)$ in time. This algorithm is $O(N)$ space and $O(C N^2)$ time in common cases ($O(C N^3)$ in the worst case).

Overview

Given a discrete feature $x \in \mathcal{X} = \{x_1,\dots,x_N\}$ and a category label $c = 1,\dots,C$ with joint probability $p(x,c)$, AIB computes a compressed feature $[x]_{ij}$ by merging two values $x_i$ and $x_j$. Among all the pairs $ij$, AIB chooses the one that yields the smallest loss in the mutual information

\[ D_{ij} = I(x,c) - I([x]_{ij},c) = \sum_c p(x_i) \log \frac{p(x_i,c)}{p(x_i)p(c)} + \sum_c p(x_i) \log \frac{p(x_i,c)}{p(x_i)p(c)} - \sum_c (p(x_i)+p(x_j)) \log \frac {p(x_i,c)+p(x_i,c)}{(p(x_i)+p(x_j))p(c)} \]

AIB iterates this procedure until the desired level of compression is achieved.

Algorithm details

Computing $D_{ij}$ requires $O(C)$ operations. For example, in standard AIB we need to calculate

\[ D_{ij} = I(x,c) - I([x]_{ij},c) = \sum_c p(x_i) \log \frac{p(x_i,c)}{p(x_i)p(c)} + \sum_c p(x_i) \log \frac{p(x_i,c)}{p(x_i)p(c)} - \sum_c (p(x_i)+p(x_j)) \log \frac {p(x_i,c)+p(x_i,c)}{(p(x_i)+p(x_j))p(c)} \]

Thus in a basic implementation of AIB, finding the optimal pair $ij$ of feature values requires $O(CN^2)$ operations in total. In order to join all the $N$ values, we repeat this procedure $O(N)$ times, yielding $O(N^3 C)$ time and $O(1)$ space complexity (this does not account for the space need to store the input).

The complexity can be improved by reusing computations. For instance, we can store the matrix $D = [ D_{ij} ]$ (which requires $O(N^2)$ space). Then, after joining $ij$, all of the matrix D except the rows and columns (the matrix is symmetric) of indexes i and j is unchanged. These two rows and columns are deleted and a new row and column, whose computation requires $O(NC)$ operations, are added for the merged value $x_{ij}$. Finding the minimal element of the matrix still requires $O(N^2)$ operations, so the complexity of this algorithm is $O(N^2C + N^3)$ time and $O(N^2)$ space.

We can obtain a much better expected complexity as follows. First, instead of storing the whole matrix D, we store the smallest element (index and value) of each row as $(q_i, D_i)$ (notice that this is also the best element of each column since D is symmetric). This requires $O(N)$ space and finding the minimal element of the matrix requires $O(N)$ operations. After joining $ij$, we have to efficiently update this representation. This is done as follows:

  • The entries $(q_i,D_i)$ and $(q_j,D_j)$ are deleted.
  • A new entry $(q_{ij},D_{ij})$ for the joint value $x_{ij}$ is added. This requires $O(CN)$ operations.
  • We test which other entries $(q_{k},D_{k})$ need to be updated. Recall that $(q_{k},D_{k})$ means that, before the merge, the value closest to $x_k$ was $x_{q_k}$ at a distance $D_k$. Then
    • If $q_k \not = i$, $q_k \not = j$ and $D_{k,ij} \geq D_k$, then $q_k$ is still the closest element and we do not do anything.
    • If $q_k \not = i$, $q_k \not = j$ and $D_{k,ij} < D_k$, then the closest element is $ij$ and we update the entry in constant time.
    • If $q_k = i$ or $q_k = j$, then we need to re-compute the closest element in $O(CN)$ operations.

This algorithm requires only $O(N)$ space and $O(\gamma(N) C N^2)$ time, where $\gamma(N)$ is the expected number of times we fall in the last case. In common cases one has $\gamma(N) \approx \mathrm{const.}$, so the time saving is significant.

Author:
Brian Fulkerson

Andrea Vedaldi

Definition in file aib.h.

#include "generic.h"

Go to the source code of this file.


Data Structures

struct  _VlAIB
 AIB algorithm data. More...

Functions

Create and destroy
VL_EXPORT VlAIBvl_aib_new (double *Pcx, vl_uint nvalues, vl_uint nlabels)
 Allocates and initializes the internal data structure.
VL_EXPORT void vl_aib_delete (VlAIB *aib)
 Deletes AIB data structure.
Process data
VL_EXPORT void vl_aib_process (VlAIB *aib)
 Runs AIB on Pcx.
Retrieve results
VL_INLINE vl_uintvl_aib_get_parents (VlAIB const *aib)
 Get resulting list of parents.
VL_INLINE double * vl_aib_get_costs (VlAIB const *aib)
 Get a list of merge costs.

Function Documentation

VL_EXPORT void vl_aib_delete ( VlAIB aib  ) 

Parameters:
aib data structure to delete.

Definition at line 528 of file aib.c.

References vl_free().

VL_INLINE double * vl_aib_get_costs ( VlAIB const *  aib  ) 

Parameters:
aib AIB filter.
Returns:
An array of costs

Definition at line 108 of file aib.h.

References _VlAIB::costs.

VL_INLINE vl_uint * vl_aib_get_parents ( VlAIB const *  aib  ) 

Parameters:
aib AIB filter.
Returns:
An array of parents

Definition at line 96 of file aib.h.

References _VlAIB::parents.

VL_EXPORT VlAIB* vl_aib_new ( double *  Pcx,
vl_uint  nvalues,
vl_uint  nlabels 
)

Parameters:
Pcx A pointer to a 2D array of probabilities
nvalues The number of rows in the array
nlabels The number of columns in the array
Creates a new VlAIB struct containing pointers to all the data that will be used during the AIB process.

Allocates memory for the following:

  • Px (nvalues*sizeof(double))
  • Pc (nlabels*sizeof(double))
  • nodelist (nvalues*sizeof(vl_uint))
  • which (nvalues*sizeof(vl_uint))
  • beta (nvalues*sizeof(double))
  • bidx (nvalues*sizeof(vl_uint))
  • parents ((2*nvalues-1)*sizeof(vl_uint))
  • costs (nvalues*sizeof(double))

Since it simply copies to pointer to Pcx, the total additional memory requirement is:

(3*nvalues+nlabels)*sizeof(double) + 4*nvalues*sizeof(vl_uint)

Returns:
An allocated and initialized VlAIB pointer

Definition at line 484 of file aib.c.

References _VlAIB::beta, BETA_MAX, _VlAIB::bidx, _VlAIB::costs, _VlAIB::nentries, _VlAIB::nlabels, _VlAIB::nodes, _VlAIB::nvalues, _VlAIB::nwhich, _VlAIB::parents, _VlAIB::Pc, _VlAIB::Pcx, _VlAIB::Px, vl_aib_new_nodelist(), vl_aib_new_Pc(), vl_aib_new_Px(), vl_aib_normalize_P(), vl_malloc(), and _VlAIB::which.

VL_EXPORT void vl_aib_process ( VlAIB aib  ) 

Parameters:
aib AIB object to process
The function runs Agglomerative Information Bottleneck (AIB) on the joint probability table aib->Pcx which has labels along the columns and feature values along the rows. AIB iteratively merges the two values of the feature x that causes the smallest decrease in mutual information between the random variables x and c.

Merge operations are arranged in a binary tree. The nodes of the tree correspond to the original feature values and any other value obtained as a result of a merge operation. The nodes are indexed in breadth-first order, starting from the leaves. The first index is zero. In this way, the leaves correspond directly to the original feature values. In total there are 2*nvalues-1 nodes.

The results may be accessed through vl_aib_get_parents which returns an array with one element per tree node. Each element is the index the parent node. The root parent is equal to zero. The array has 2*nvalues-1 elements.

Feature values with null probability are ignored by the algorithm and their nodes have parents indexing a non-existent tree node (a value bigger than 2*nvalues-1).

Then the function will also compute the information level after each merge. vl_get_costs will return a vector with the information level after each merge. cost has nvalues entries: The first is the value of the cost functional before any merge, and the others are the cost after the nvalues-1 merges.

Definition at line 581 of file aib.c.

References BETA_MAX, _VlAIB::costs, _VlAIB::nodes, _VlAIB::nvalues, _VlAIB::nwhich, _VlAIB::parents, vl_aib_calculate_information(), vl_aib_merge_nodes(), vl_aib_min_beta(), vl_aib_update_beta(), VL_NAN_D, and VL_PRINTF.