aib.c File Reference

AIB - Definition. More...

#include "aib.h"
#include <stdio.h>
#include <stdlib.h>
#include <float.h>
#include <math.h>

## Functions

void vl_aib_normalize_P (double *P, vl_uint nelem)
Normalizes an array of probabilities to sum to 1. More...

vl_uintvl_aib_new_nodelist (vl_uint nentries)
Allocates and creates a list of nodes. More...

double * vl_aib_new_Px (double *Pcx, vl_uint nvalues, vl_uint nlabels)
Allocates and creates the marginal distribution Px. More...

double * vl_aib_new_Pc (double *Pcx, vl_uint nvalues, vl_uint nlabels)
Allocates and creates the marginal distribution Pc. More...

void vl_aib_min_beta (VlAIB *aib, vl_uint *besti, vl_uint *bestj, double *minbeta)
Find the two nodes which have minimum beta. More...

void vl_aib_merge_nodes (VlAIB *aib, vl_uint i, vl_uint j, vl_uint new)
Merges two nodes i,j in the internal datastructure. More...

void vl_aib_update_beta (VlAIB *aib)
Updates aib->beta and aib->bidx according to aib->which. More...

void vl_aib_calculate_information (VlAIB *aib, double *I, double *H)
Calculates the current information and entropy. More...

VlAIBvl_aib_new (double *Pcx, vl_uint nvalues, vl_uint nlabels)
Allocates and initializes the internal data structure. More...

void vl_aib_delete (VlAIB *aib)
Deletes AIB data structure. More...

void vl_aib_process (VlAIB *aib)
Runs AIB on Pcx. More...

## Function Documentation

 void vl_aib_calculate_information ( VlAIB * aib, double * I, double * H )
Parameters
 aib A pointer to the internal data structure I The current mutual information (out). H The current entropy (out).

Calculates the current mutual information and entropy of Pcx and sets I and H to these new values.

 void vl_aib_delete ( VlAIB * aib )
Parameters
 aib data structure to delete.
 void vl_aib_merge_nodes ( VlAIB * aib, vl_uint i, vl_uint j, vl_uint new )
Parameters
 aib A pointer to the internal data structure i The index of one member of the pair to merge j The index of the other member of the pair to merge new The index of the new node which corresponds to the union of (i, j).

Nodes are merged by replacing the entry i with the union of ij, moving the node stored in last position (called lastnode) back to jth position and the entry at the end.

After the nodes have been merged, it updates which nodes should be considered on the next iteration based on which beta values could potentially change. The merged node will always be part of this list.

 void vl_aib_min_beta ( VlAIB * aib, vl_uint * besti, vl_uint * bestj, double * minbeta )
Parameters
 aib A pointer to the internal data structure besti The index of one member of the pair which has mininum beta bestj The index of the other member of the pair which minimizes beta minbeta The minimum beta value corresponding to (i, j)

Searches aib->beta to find the minimum value and fills minbeta and besti and bestj with this information.

 VlAIB* vl_aib_new ( double * Pcx, vl_uint nvalues, vl_uint nlabels )
Parameters
 Pcx A pointer to a 2D array of probabilities nvalues The number of rows in the array nlabels The number of columns in the array

Creates a new VlAIB struct containing pointers to all the data that will be used during the AIB process.

Allocates memory for the following:

• Px (nvalues*sizeof(double))
• Pc (nlabels*sizeof(double))
• nodelist (nvalues*sizeof(vl_uint))
• which (nvalues*sizeof(vl_uint))
• beta (nvalues*sizeof(double))
• bidx (nvalues*sizeof(vl_uint))
• parents ((2*nvalues-1)*sizeof(vl_uint))
• costs (nvalues*sizeof(double))

Since it simply copies to pointer to Pcx, the total additional memory requirement is:

(3*nvalues+nlabels)*sizeof(double) + 4*nvalues*sizeof(vl_uint)

Returns
An allocated and initialized VlAIB pointer
 vl_uint* vl_aib_new_nodelist ( vl_uint nentries )
Parameters
 nentries The size of the list which will be created
Returns
an array containing elements 0...nentries
 double* vl_aib_new_Pc ( double * Pcx, vl_uint nvalues, vl_uint nlabels )
Parameters
 Pcx A two-dimensional array of probabilities nvalues The number of rows in Pcx nlabels The number of columns in Pcx
Returns
an array of size nlabels which contains the marginal distribution over the columns
 double* vl_aib_new_Px ( double * Pcx, vl_uint nvalues, vl_uint nlabels )
Parameters
 Pcx A two-dimensional array of probabilities nvalues The number of rows in Pcx nlabels The number of columns in Pcx
Returns
an array of size nvalues which contains the marginal distribution over the rows.
 void vl_aib_normalize_P ( double * P, vl_uint nelem )
Parameters
 P The array of probabilities nelem The number of elements in the array
Returns
Modifies P to contain values which sum to 1
 void vl_aib_process ( VlAIB * aib )
Parameters
 aib AIB object to process

The function runs Agglomerative Information Bottleneck (AIB) on the joint probability table aib->Pcx which has labels along the columns and feature values along the rows. AIB iteratively merges the two values of the feature x that causes the smallest decrease in mutual information between the random variables x and c.

Merge operations are arranged in a binary tree. The nodes of the tree correspond to the original feature values and any other value obtained as a result of a merge operation. The nodes are indexed in breadth-first order, starting from the leaves. The first index is zero. In this way, the leaves correspond directly to the original feature values. In total there are 2*nvalues-1 nodes.

The results may be accessed through vl_aib_get_parents which returns an array with one element per tree node. Each element is the index the parent node. The root parent is equal to zero. The array has 2*nvalues-1 elements.

Feature values with null probability are ignored by the algorithm and their nodes have parents indexing a non-existent tree node (a value bigger than 2*nvalues-1).

Then the function will also compute the information level after each merge. vl_get_costs will return a vector with the information level after each merge. cost has nvalues entries: The first is the value of the cost functional before any merge, and the others are the cost after the nvalues-1 merges.

 void vl_aib_update_beta ( VlAIB * aib )
Parameters
 aib AIB data structure.

The function calculates beta[i] and bidx[i] for the nodes i listed in aib->which. beta[i] is the minimal variation of mutual information (or other score) caused by merging entry i with another entry and bidx[i] is the index of this best matching entry.

Notice that for each entry i that we need to update, a full scan of all the other entries must be performed.