This page describes the Vector of Locally Aggregated Descriptors (VLAD) image encoding of [11]. See Vector of Locally Aggregated Descriptors (VLAD) encoding for an overview of the C API.

VLAD is a feature encoding and pooling method, similar to Fisher vectors. VLAD encodes a set of local feature descriptors $I=(\bx_1,\dots,\bx_N)$ extracted from an image using a dictionary built with a clustering method such as Gaussian Mixture Models (GMM) or K-means clustering. Let $q_{ik}$ be the strength of the association of data vector $\bx_i$ with cluster $\mu_k$, such that $q_{ik} \geq 0$ and $\sum_{k=1}^K q_{ik} = 1$. The association may be either soft (e.g. obtained as the posterior probabilities of the GMM clusters) or hard (e.g. obtained by vector quantization with K-means).
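As an illustrative sketch (not the library's C API), the hard-assignment case can be written in NumPy: each descriptor is assigned entirely to its nearest cluster mean, giving a one-hot row in the assignment matrix $q$. The function name and arguments here are hypothetical.

```python
import numpy as np

def hard_assignments(X, means):
    """Hard assignments q (N x K): q[i, k] = 1 iff mean k is the
    nearest cluster centre to descriptor x_i, so each row sums to 1."""
    # Squared distances between every descriptor and every mean.
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    q = np.zeros(d2.shape)
    q[np.arange(len(X)), d2.argmin(axis=1)] = 1.0
    return q
```

A soft assignment would instead fill each row with GMM posterior probabilities, which also sum to one.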

Here $\mu_k$ are the cluster means, vectors of the same dimension as the data $\bx_i$. VLAD encodes the features $\bx_i$ by considering the residuals

$\bv_k = \sum_{i=1}^{N} q_{ik} (\bx_{i} - \mu_k).$

The residuals are stacked together to obtain the vector

$\hat\Phi(I) = \begin{bmatrix} \vdots \\ \bv_k \\ \vdots \end{bmatrix}$
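The two formulas above can be sketched directly in NumPy: accumulate the assignment-weighted residuals per cluster, then stack the $K$ blocks into a single vector of length $K \times D$. This is a minimal illustration of the math, not the library's C implementation; the function name is hypothetical.

```python
import numpy as np

def vlad_encode(X, means, q):
    """Unnormalized VLAD: v_k = sum_i q_ik (x_i - mu_k),
    stacked into a single vector of length K * D."""
    K, D = means.shape
    v = np.zeros((K, D))
    for k in range(K):
        # Assignment-weighted residuals of all descriptors w.r.t. mean k.
        v[k] = (q[:, k, None] * (X - means[k])).sum(axis=0)
    return v.reshape(-1)
```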

Before the VLAD encoding is used, it is usually normalized by applying one or more of the following transformations:

• **Component-wise mass normalization.** Each vector $\bv_k$ is divided by the total mass of features associated to it, $\sum_{i=1}^N q_{ik}$.
• **Square-rooting.** The function $\sign(z)\sqrt{|z|}$ is applied to all scalar components of the VLAD descriptor.
• **Component-wise $l^2$ normalization.** The vectors $\bv_k$ are divided by their norm $\|\bv_k\|_2$.
• **Global $l^2$ normalization.** The VLAD descriptor $\hat\Phi(I)$ is divided by its norm $\|\hat\Phi(I)\|_2$.
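The four normalizations can be sketched as follows, in the order listed above. This is an illustrative NumPy version with hypothetical names, not the library's C API; small epsilons guard against division by zero, which is an implementation choice of this sketch.

```python
import numpy as np

def normalize_vlad(v, q, K, mass=True, sqrt=True,
                   component_l2=True, global_l2=True):
    """Apply the usual VLAD normalizations to a stacked vector v
    of length K * D, given the N x K assignment matrix q."""
    V = v.reshape(K, -1).astype(float)
    if mass:
        # Divide each v_k by the total mass sum_i q_ik of its cluster.
        V = V / np.maximum(q.sum(axis=0), 1e-12)[:, None]
    if sqrt:
        # Signed square root of every scalar component.
        V = np.sign(V) * np.sqrt(np.abs(V))
    if component_l2:
        # Divide each v_k by its own l2 norm.
        V = V / np.maximum(np.linalg.norm(V, axis=1), 1e-12)[:, None]
    out = V.reshape(-1)
    if global_l2:
        # Divide the whole descriptor by its l2 norm.
        out = out / max(np.linalg.norm(out), 1e-12)
    return out
```

With global $l^2$ normalization enabled, the resulting descriptor has unit norm, which makes inner products between VLAD vectors directly comparable across images.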