Documentation - C API

pegasos.h File Reference

PEGASOS SVM.

#include "generic.h"

Functions

void vl_pegasos_train_binary_svm_d (double *model, double const *data, vl_size dimension, vl_size numSamples, vl_int8 const *labels, double regularizer, double biasMultiplier, vl_uindex startingIteration, vl_size numIterations, VlRand *randomGenerator, vl_uint32 const *permutation, vl_size permutationSize, double const *preconditioner)
void vl_pegasos_train_binary_svm_f (float *model, float const *data, vl_size dimension, vl_size numSamples, vl_int8 const *labels, double regularizer, double biasMultiplier, vl_uindex startingIteration, vl_size numIterations, VlRand *randomGenerator, vl_uint32 const *permutation, vl_size permutationSize, float const *preconditioner)

Detailed Description

pegasos.h provides a basic implementation of the PEGASOS [1] linear SVM solver.

Overview

PEGASOS solves the linear SVM learning problem

\[ \min_{w} \frac{\lambda}{2} \|w\|^2 + \frac{1}{m} \sum_{i=1}^m \ell(w; (x_i,y_i)) \]

where $ x_i $ are data vectors in $ \mathbb{R}^d $, $ y_i \in \{-1,1\} $ are binary labels, $ m $ is the number of training pairs, $ \lambda > 0 $ is the regularization parameter, and

\[ \ell(w;(x_i,y_i)) = \max\{0, 1 - y_i \langle w,x_i\rangle\} \]

is the hinge loss. The result of the optimization is a model $ w \in \mathbb{R}^d $ that yields the decision function

\[ F(x) = \operatorname{sign} \langle w, x\rangle. \]

It is well known that the hinge loss is a convex upper bound of the 0-1 loss of the decision function:

\[ \ell(w;(x,y)) \geq \frac{1}{2}(1 - y F(x)). \]

PEGASOS is accessed by calling vl_pegasos_train_binary_svm_d or vl_pegasos_train_binary_svm_f, operating respectively on double or float data.
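For instance, a minimal training call might look like the following sketch (the toy data and parameter values are purely illustrative; the layout assumes each vector $ x_i $ occupies dimension consecutive entries of data):

    #include <vl/pegasos.h>

    int main ()
    {
      /* four 2-dimensional training vectors, stored one after another */
      double data [] = { 0.1, 0.3,    /* x_1 */
                         0.2, 0.9,    /* x_2 */
                         0.8, 0.1,    /* x_3 */
                         0.9, 0.7 } ; /* x_4 */
      vl_int8 labels [] = { -1, -1, +1, +1 } ;
      double model [2] = { 0.0, 0.0 } ; /* dimension entries, no bias */

      vl_pegasos_train_binary_svm_d (model, data,
                                     2,    /* dimension */
                                     4,    /* numSamples */
                                     labels,
                                     0.01, /* regularizer lambda */
                                     0.0,  /* biasMultiplier: 0 = no bias */
                                     1,    /* startingIteration */
                                     1000, /* numIterations */
                                     NULL, /* randomGenerator: default */
                                     NULL, /* permutation: none */
                                     0,    /* permutationSize */
                                     NULL  /* preconditioner: none */) ;
      return 0 ;
    }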

Algorithm

PEGASOS is a stochastic subgradient optimizer. At the t-th iteration the algorithm:

  • Samples uniformly at random a subset $ A_t $ of $ k $ training pairs $ (x,y) $ from the $ m $ pairs provided for training (this subset is called a mini batch).
  • Computes a subgradient $ \nabla_t $ of the function $ E_t(w) = \frac{\lambda}{2}\|w\|^2 + \frac{1}{k} \sum_{(x,y) \in A_t} \ell(w;(x,y)) $ (this is the SVM objective function restricted to the mini batch).
  • Computes an intermediate weight vector $ w_{t+1/2} $ by taking a step $ w_{t+1/2} = w_t - \alpha_t \nabla_t $ with learning rate $ \alpha_t = 1/(\lambda t) $ along the subgradient. Note that the learning rate is inversely proportional to the iteration number.
  • Back projects the weight vector $ w_{t+1/2} $ onto the ball of radius $ 1/\sqrt{\lambda} $ to obtain the next model estimate $ w_{t+1} $:

    \[ w_{t+1} = \min\{1, 1/(\sqrt{\lambda}\|w_{t+1/2}\|)\} w_{t+1/2}. \]

    This ball is guaranteed to contain the optimal weight vector $ w^* $ [1].

The VLFeat implementation fixes the size of the mini batches $ k $ to one; with this choice the per-iteration update reduces to the sketch below.
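A minimal sketch of this update in plain C follows (a hypothetical helper illustrating the algorithm, not the actual library internals):

    #include <math.h>
    #include <stddef.h>

    /* One PEGASOS iteration with mini batch size one: sketch only.
       w has `dimension` entries, (x, y) is the sampled training pair,
       t >= 1 is the iteration number. */
    static void
    pegasos_step (double * w, double const * x, int y,
                  size_t dimension, size_t t, double lambda)
    {
      double alpha = 1.0 / (lambda * (double) t) ; /* learning rate 1/(lambda t) */
      double inner = 0.0, norm = 0.0, radius ;
      size_t i ;

      for (i = 0 ; i < dimension ; ++i) inner += w [i] * x [i] ;

      /* subgradient step: the regularizer contributes lambda w; the hinge
         loss contributes -y x only if the margin constraint is violated */
      for (i = 0 ; i < dimension ; ++i) {
        double grad = lambda * w [i] ;
        if (y * inner < 1.0) grad -= y * x [i] ;
        w [i] -= alpha * grad ;
      }

      /* back projection onto the ball of radius 1/sqrt(lambda) */
      for (i = 0 ; i < dimension ; ++i) norm += w [i] * w [i] ;
      norm = sqrt (norm) ;
      radius = 1.0 / sqrt (lambda) ;
      if (norm > radius) {
        for (i = 0 ; i < dimension ; ++i) w [i] *= radius / norm ;
      }
    }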

Bias

The PEGASOS SVM formulation does not incorporate a bias term. To learn an SVM with bias, each data vector $ x $ can be extended by a constant component $ B $ (called biasMultiplier in the code). In this case, the model $ w $ has dimension $ d + 1 $ and the SVM discriminant function is given by $ F(x) = \operatorname{sign} (\langle w_{1:d}, x\rangle + w_{d+1} B) $. If the bias multiplier $ B $ is large enough, the weight $ w_{d+1} $ remains small and contributes little to the SVM regularization term $ \| w \|^2 $, better approximating the case of an SVM with bias. Unfortunately, setting the bias multiplier $ B $ to a large value makes the optimization harder.
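Concretely, a model trained with a nonzero bias multiplier can be evaluated as in this sketch (the helper name is hypothetical):

    /* Evaluate F(x) = sign(<w_{1:d}, x> + w_{d+1} B) for a model with
       d + 1 entries, the last being the bias weight. Sketch only. */
    static int
    svm_predict (double const * model, double const * x,
                 vl_size dimension, double biasMultiplier)
    {
      double score = model [dimension] * biasMultiplier ;
      vl_uindex i ;
      for (i = 0 ; i < dimension ; ++i) score += model [i] * x [i] ;
      return (score >= 0) ? +1 : -1 ;
    }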

Restarting

The VLFeat PEGASOS implementation can be restarted after any given number of iterations. This is useful to compute intermediate statistics or, for large datasets, to load new data from disk. The state of the algorithm, which is required for restarting, is limited to the current estimate $ w_t $ of the SVM weight vector and the iteration number $ t $.
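For instance, training can be split into blocks, inspecting the model between blocks (a sketch; the helper name and block size are illustrative):

    /* Sketch: run PEGASOS in ten blocks of 1000 iterations, restarting
       each time from the current model and iteration number. */
    static void
    train_in_blocks (double * model, double const * data,
                     vl_size dimension, vl_size numSamples,
                     vl_int8 const * labels, double lambda)
    {
      vl_uindex iteration = 1 ;
      int b ;
      for (b = 0 ; b < 10 ; ++b) {
        vl_pegasos_train_binary_svm_d (model, data, dimension, numSamples,
                                       labels, lambda, 0.0,
                                       iteration, 1000,
                                       NULL, NULL, 0, NULL) ;
        iteration += 1000 ;
        /* compute intermediate statistics on model here, or load the
           next chunk of data for very large datasets */
      }
    }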

Permutation

VLFeat PEGASOS can use a user-defined permutation to decide the order in which data points are visited (instead of using random sampling). By specifying a permutation the algorithm is guaranteed to visit each data point exactly once in each loop. The permutation need not be bijective: this can be used to visit certain data samples more or less often than others, implicitly reweighting their relative importance in the SVM objective function, for instance to balance the data.
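For instance, with four training samples, a non-bijective permutation can rebalance the data by visiting one sample twice per cycle (a sketch; model, data, labels and the remaining training variables are assumed defined as in the earlier examples):

    /* Sketch: visit sample 0 twice per cycle and the others once,
       implicitly doubling its weight in the objective. */
    vl_uint32 permutation [] = { 0, 1, 2, 3, 0 } ;

    vl_pegasos_train_binary_svm_d (model, data, dimension, numSamples,
                                   labels, lambda, 0.0,
                                   1, 1000, /* numIterations */
                                   NULL, /* randomGenerator must be NULL */
                                   permutation, 5, /* permutationSize */
                                   NULL) ;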

Non-linear kernels

PEGASOS can be extended to non-linear kernels, but the algorithm is not particularly efficient in this setting [1]. When possible, it may be preferable to work with explicit feature maps.

Let $ k(x,y) $ be a positive definite kernel. A feature map is a function $ \Psi(x) $ such that $ k(x,y) = \langle \Psi(x), \Psi(y) \rangle $. Using this representation, the non-linear SVM learning objective function can be written as

\[ \min_{w} \frac{\lambda}{2} \|w\|^2 + \frac{1}{m} \sum_{i=1}^m \ell(w; (\Psi(x_i),y_i)). \]

Thus the only difference with the linear case is that the feature $ \Psi(x) $ is used in place of the data $ x $.

$ \Psi(x) $ can be learned off-line, for instance by using the incomplete Cholesky decomposition $ V^\top V $ of the Gram matrix $ K = [k(x_i,x_j)] $ (in this case $ \Psi(x_i) $ is the $ i $-th column of $ V $). Alternatively, for additive kernels (e.g. intersection, Chi2) the explicit feature map computed by homkermap.h can be used.
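In practice this amounts to mapping the data through $ \Psi $ before training, as in this sketch (psi and mappedDimension are hypothetical placeholders for whatever explicit feature map is used; the other variables are as in the earlier examples):

    /* Sketch: expand each data vector through a feature map psi before
       training; psi and mappedDimension are placeholders. The model
       must have mappedDimension entries here. */
    double * mappedData = vl_malloc (sizeof(double) * mappedDimension * numSamples) ;
    vl_uindex i ;
    for (i = 0 ; i < numSamples ; ++i) {
      psi (mappedData + i * mappedDimension, data + i * dimension) ;
    }
    vl_pegasos_train_binary_svm_d (model, mappedData, mappedDimension,
                                   numSamples, labels, lambda, 0.0,
                                   1, 1000, NULL, NULL, 0, NULL) ;
    vl_free (mappedData) ;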

References

[1] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In Proc. ICML, 2007.

Author:
Andrea Vedaldi

Function Documentation

void vl_pegasos_train_binary_svm_d ( double *  model,
double const *  data,
vl_size  dimension,
vl_size  numSamples,
vl_int8 const *  labels,
double  regularizer,
double  biasMultiplier,
vl_uindex  startingIteration,
vl_size  numIterations,
VlRand *  randomGenerator,
vl_uint32 const *  permutation,
vl_size  permutationSize,
double const *  preconditioner 
)
Parameters:
model - (out) the learned model.
data - training vectors.
dimension - data dimension.
numSamples - number of training data vectors.
labels - labels of the training vectors.
regularizer - value of the regularizer coefficient $ \lambda $.
biasMultiplier - value of the bias multiplier $ B $.
startingIteration - number of the first iteration.
numIterations - number of iterations to perform.
randomGenerator - random number generator.
permutation - order in which the data is accessed.
permutationSize - length of permutation.
preconditioner - diagonal preconditioner.

The function runs PEGASOS on the specified data. The vector model must have dimension equal to dimension if biasMultiplier is zero, or to dimension + 1 if biasMultiplier is larger than zero.

The function runs PEGASOS for iterations t in the interval [startingIteration, startingIteration + numIterations - 1]. Together with the fact that the initial model can be set arbitrarily, this enables restarting PEGASOS from any point.

PEGASOS selects the next point for computing the gradient at random. If randomGenerator is NULL, the default random generator (as returned by vl_get_rand()) is used.

Alternatively, if permutation is not NULL, then points are sampled in the order specified by this vector of indexes (which is cycled through). It is an error to set both randomGenerator and permutation to non-NULL values.

preconditioner specifies a diagonal preconditioner for the minimization problem (it is often useful to slow down the steps for the bias term, if the latter is used). Set preconditioner to NULL to avoid using a preconditioner. The preconditioner should have the same dimension as the model, plus one if an SVM with bias is learned.

See the Overview section above for details.
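For example, a preconditioner can slow down the update of the bias weight relative to the other model components (a sketch; the value 0.1 and the two-dimensional data are illustrative, and the preconditioner is assumed to have one entry per model component):

    /* Sketch: learn a 2-dimensional SVM with bias (B = 1) while slowing
       the bias weight updates by a factor of ten. */
    double model [3] = { 0.0, 0.0, 0.0 } ;          /* dimension + 1 entries */
    double preconditioner [3] = { 1.0, 1.0, 0.1 } ; /* last entry: bias */

    vl_pegasos_train_binary_svm_d (model, data, 2, numSamples,
                                   labels, lambda,
                                   1.0, /* biasMultiplier */
                                   1, 1000, /* numIterations */
                                   NULL, NULL, 0,
                                   preconditioner) ;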

void vl_pegasos_train_binary_svm_f ( float *  model,
float const *  data,
vl_size  dimension,
vl_size  numSamples,
vl_int8 const *  labels,
double  regularizer,
double  biasMultiplier,
vl_uindex  startingIteration,
vl_size  numIterations,
VlRand *  randomGenerator,
vl_uint32 const *  permutation,
vl_size  permutationSize,
float const *  preconditioner 
)