[W, B] = VL_SVMTRAIN(X, Y, LAMBDA) trains a linear Support Vector Machine (SVM) from the data vectors X and the labels Y. X is a D by N matrix, with one column per example and D feature dimensions (SINGLE or DOUBLE). Y is a DOUBLE vector with N elements containing a binary (-1 or +1) label for each training point. To a first order approximation, the function computes a weight vector W and offset B such that the score W'*X(:,i)+B has the same sign as Y(i) for all i.

VL_SVMTRAIN(DATASET, LABELS, LAMBDA) takes as input a DATASET structure, which allows more sophisticated input formats to be supported (see VL_SVMDATASET()).

[W, B, INFO] = VL_SVMTRAIN(...) additionally returns a structure INFO with the following fields:

- iteration
Number of iterations performed.

- epoch
Number of iterations divided by the number of training data points, i.e. the number of passes over the data.

- elapsedTime
Time elapsed since the start of training.

- objective
SVM objective value.

- regularizer
Regularizer value.

- loss
Loss value.

- scoreVariation [SGD only]
Mean square root of the difference between the last two values of the SVM scores for each point.

- dualObjective [SDCA only]
Dual objective value.

- dualLoss [SDCA only]
Dual loss value.

- dualityGap [SDCA only]
Difference between the objective and the dual objective.

[W, B, INFO, SCORES] = VL_SVMTRAIN(X, Y, LAMBDA) additionally returns a row vector SCORES containing the SVM score of each training point. This can be used in combination with the options SOLVER, MODEL, and BIAS to evaluate an existing SVM on new data points. In that case INFO contains the corresponding SVM loss, regularizer, and objective function values. If this information is not of interest, it is possible to pass a null vector Y instead of the actual labels, as well as a null regularizer LAMBDA.

VL_SVMTRAIN() accepts the following options:

- Verbose
Specify one or multiple times to increase the verbosity level. Specified once, it produces messages at the beginning and end of learning. A verbosity level of at least 2 prints information at every diagnostic step.

- Epsilon 1e-3
Tolerance for the stopping criterion.

- MaxNumIterations 10/LAMBDA
Maximum number of iterations.

- BiasMultiplier 1
Value of the constant B0 used as bias term (see below).

- BiasLearningRate 0.5
Learning rate for the bias (SGD solver only).

- DiagnosticFunction []
Diagnostic function callback. The callback takes the INFO structure as only argument. To trace energies and plot graphs, the callback can update a global variable or, preferably, be defined as a nested function and update a local variable in the parent function.

- DiagnosticFrequency Number of data points
Number of iterations after which the diagnostic step is run. This step checks for convergence and is run rarely, typically once per epoch (pass over the data). It also calls the DiagnosticFunction, if one is specified.

- Loss HINGE
Loss function. One of HINGE, HINGE2, L1, L2, LOGISTIC.

- Solver SDCA
One of SGD (stochastic gradient descent [1]), SDCA (stochastic dual coordinate ascent [2,3]), or NONE (no training). The last option can be used in combination with the options MODEL and BIAS to evaluate an existing SVM.

- Model null vector
Specifies the initial value for the weight vector W (SGD only).

- Bias 0
Specifies the initial value of the bias term (SGD only).

- Weights []
Specifies a weight vector to assign a different non-negative weight to each data point. An application is to rebalance unbalanced datasets.
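As an illustration of the Weights option, the following sketch builds per-point weights that give the positive and the negative class the same total weight. The toy labels are hypothetical, and rescaling the weights to an average of 1 is an assumption made here for readability, not a documented requirement of VL_SVMTRAIN().

```matlab
% Hypothetical unbalanced labels: 2 positive and 6 negative points.
y = [1 1 -1 -1 -1 -1 -1 -1] ;

numPos = sum(y > 0) ;
numNeg = sum(y < 0) ;

% Each class receives the same total weight, split evenly among its points.
weights = zeros(size(y)) ;
weights(y > 0) = 1 / numPos ;
weights(y < 0) = 1 / numNeg ;

% Assumption: rescale so that the average weight is 1.
weights = weights * numel(y) / sum(weights) ;

% The vector could then be passed to the solver (x and lambda assumed given):
% [w, b] = vl_svmtrain(x, y, lambda, 'Weights', weights) ;
```

With these labels each positive point gets weight 2 and each negative point weight 2/3, so both classes contribute a total weight of 4.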

FORMULATION

VL_SVMTRAIN() minimizes the objective function of the form:

LAMBDA/2 |W|^2 + 1/N SUM_i LOSS(W' X(:,i), Y(i))

where LOSS(W' X(:,i), Y(i)) is the loss (hinge by default) for the i-th data point. The bias is incorporated by extending each data point X(:,i) with a feature of constant value B0, weighted by an additional coefficient WB, such that the objective becomes

LAMBDA/2 (|W|^2 + WB^2) + 1/N SUM_i LOSS(W' X(:,i) + WB B0, Y(i))

Note that WB is regularized together with W, which causes the learned bias B = WB B0 to shrink towards zero.
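The objective with the bias term can be evaluated by hand as a check on the formula. The sketch below does so for the hinge loss, written as max(0, 1 - y*score); the toy data and the values of W and WB are hypothetical, and the variable names regularizer, loss and objective are chosen only to mirror the terms of the formula.

```matlab
% Hypothetical toy problem: D = 2 features, N = 4 points.
X = [1 2 -1 -2 ; 1 0 -1 0] ;
Y = [1 1 -1 -1] ;
lambda = 0.1 ;
B0 = 1 ;              % constant feature value (BiasMultiplier)

% Hypothetical model: weight vector W and bias weight WB.
W  = [0.5 ; 0.5] ;
WB = 0.2 ;

scores = W' * X + WB * B0 ;                    % 1 x N row vector of scores
loss = mean(max(0, 1 - Y .* scores)) ;         % 1/N SUM_i hinge loss
regularizer = lambda / 2 * (W' * W + WB^2) ;   % LAMBDA/2 (|W|^2 + WB^2)
objective = regularizer + loss ;

B = WB * B0 ;                                  % bias in the original space
```

For these values the loss is 0.1, the regularizer is 0.027, and the objective is 0.127.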

- Example
Learn a linear SVM from data X and labels Y using 0.1 as regularization coefficient:

[w, b] = vl_svmtrain(x, y, 0.1) ;

The SVM can be evaluated on new data XTEST with:

scores = w'*xtest + b ;

Alternatively, VL_SVMTRAIN() can be used for evaluation too, passing the labels YTEST of the new data (or a null vector if the loss is not of interest):

[~,~,~, scores] = vl_svmtrain(xtest, ytest, 0, 'model', w, 'bias', b, 'solver', 'none') ;

The latter form is particularly useful when X is a DATASET structure.
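Once the scores are available, predicted labels and accuracy follow from their signs. The sketch below uses a hypothetical model (w, b) and hypothetical test data with labels ytest. Note that sign() returns 0 for a score of exactly zero, which matches neither label.

```matlab
% Hypothetical trained model and test set (D = 2, N = 2).
w = [1 ; -1] ;
b = 0.5 ;
xtest = [2 0 ; 0 2] ;
ytest = [1 -1] ;

scores = w' * xtest + b ;            % one score per column of xtest
predicted = sign(scores) ;           % +1 or -1 (0 if a score is exactly 0)
accuracy = mean(predicted == ytest) ;
```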

See also: SVM fundamentals, VL_SVMDATASET(), VL_HELP().