Code - SVM^struct MATLAB

svm-struct-matlab is a MATLAB wrapper of T. Joachims' SVMstruct.It simplifies coding your own structural SVM instances by means of simple MATLAB function callbacks. If you use this software in research, please cite it according to T. Joachims' guidelines. Please consider citing also:

A. Vedaldi, A MATLAB wrapper of SVMstruct,
http://www.vlfeat.org/~vedaldi/code/ svm-struct-matlab.html, 2008. BibTeX
@misc{vedaldi11svm-struct-matlab,
  Author = {A. Vedaldi},
  Title = {A {MATLAB} wrapper of $\mathrm{SVM}^\mathrm{struct}$},
  Year  = {2011},
  Howpublished = {\url{http://www.vlfeat.org/~vedaldi/code/svm-struct-matlab.html}}
}

This MATLAB wrapper is distributed under the permissive MIT license. However, this license does not cover SVMstruct itself, which is instead bundled under its original license.

Changes

1.3
(08/31/2012) Adds support for the endIterationFn callback.
1.2
(08/31/2012) Adds support for Xcode 4.0 and Mac OS X 10.7 and greater.
1.1
(11/12/2011) Adds support for Windows (thansk to Iasonas Kokkinos!).
1.0
(10/5/2011) Initial release.

Download and install

The archive contains the SVMstruct wrapper along with the original SVMstruct code of T. Joachims. To compile the code, you will need MATLAB and a C compiler (typical Xcode under Mac, and GCC under Linux, and Visual C under Windows). On Mac and Linux, assuming that MATLAB mex command is available on the command line search path, the following command should compile the MEX file implementing the wrapper:

$ make

On Windows, start MATLAB instead and run the script:

>> makefile_windows

If all goes well, a file svm_struct_learn.mex* should be produced. To use the wrapper, you will need only this file and svm_struct_learn.m for the on-line help. Put it in a location included in MATLAB search path. To test the installation, open MATLAB and run the bundled demo program test_svm_struct_matlab.m:

> test_svm_struct_matlab

This command should learn a linear SVM separating two clouds of 2D dots. To use the package you will need a basic understanding of structural SVMs. To get started see the example and the references.

Structured SVM tutorial

A tutorial introduction to structured SVM can be found here. It contains additional example on using SVMstruct MATLAB beyond the elementary examples below.

Example: learning a linear SVM

The complete source code of this example is the file test_svm_struct_learn.m.

The function svm_struct_learn is called as follows:

parm.patterns = patterns ;
parm.labels = labels ;
parm.lossFn = @lossCB ;
parm.constraintFn  = @constraintCB ;
parm.featureFn = @featureCB ;
parm.dimension = 2 ;
args = ' -c 1.0 -o 1 -v 1 ' ;
model = svm_struct_learn(args, param) ;

where

We now show how to use svm_struct_learn to learn a linear SVM \(\mathbf{w} \in \mathbb{R}^2\) separating a set of 2D points with binary labels \((\mathbf{x}_1,y_n),\dots,(\mathbf{x}_n,y_n) \in \mathbb{R}^2 \times \{-1,+1\}\). The goal is to learn a predictor (input-output map) \(\mathbf{x}\mapsto y\) that fits the training data (while being sufficiently smooth). The structural SVM formulation of the predictor is $$ \hat y(\mathbf{x}) = \operatorname{argmax}_{y\in\{-1,+1\}} F(\mathbf{x},y;\mathbf{w}), \quad F(\mathbf{x},y;\mathbf{w}) =\langle \mathbf{w}, \Psi(\mathbf{x},y)\rangle $$ where \(\mathbf{w}\) is the parameter vector to be learned, \(\Psi(\mathbf{x},y)\in\mathbf{R}^2\) is the joint feature map, and \(F(\mathbf{x},y;\mathbf{w})\) is an auxiliary function usually interpreted as a compatibility score between the input \(\mathbf{x}\) and the output \(y\).

In this example we define simply \(\Psi(\mathbf{x},y)=\mathbf{x}y/2\), which ultimately will make the structures SVM formulation equivalent to a standard binary SVM. The feature map is implemented by the code

function w = featureCB(param, x, y)
  w = sparse(y*x/2) ;
end

Then we define a loss function \(\Delta(y,\hat y)\) measuring how well the prediction \(\hat y\) matches the ground truth \(y\). We use the 01-loss here, which, for our binary labels, writes \(\Delta(y,\hat y) = (1 - y\hat y) / 2\). The loss is implemented by

function delta = lossCB(param, y, ybar)
  delta = double(y ~= ybar) ;
end

Finally, we need to define the constraint generation function, which captures the structure of the problem. Generating a constraint for an input-output pair \((\mathbf{x},y)\) means identifying what is the most incorrect output \(\hat y\) that the current model still deem to be compatible with the input \(\mathbf{x}\). Mathematically, this amounts to solving $$ \max_{\hat y \in\{-1,+1\}} \Delta(y,\hat y) + \langle \Psi(\mathbf{x},\hat y) , \mathbf{w} \rangle - \langle \Psi(\mathbf{x},y) , \mathbf{w} \rangle $$ for the so called margin rescaling formulation and $$ \max_{\hat y \in\{-1,+1\}} \Delta(y,\hat y)[1 + \langle \Psi(\mathbf{x},\hat y) , \mathbf{w} \rangle - \langle \Psi(\mathbf{x},y) , \mathbf{w} \rangle]. $$ for the slack rescaling formulation. In this simple example the two formulations coincide and are implemented by

function yhat = constraintCB(param, model, x, y)
  if dot(y*x, model.w) > 1, yhat = y ; else yhat = - y ; end
end

Note that the function accesses the current model weight vector \(\mathbf{w}\) as model.w.

The SVM-struct parameters -c 1.0 sets the C constant of the SVM to 1.0, -o 1 sets the formulation to slack rescaling, and -v 1 increases the verbosity level.

Example: using a kernel

The complete source code of this example is the file test_svm_struct_learn_ker.m.

The code supports using a kernel rather than an explicit feature map. This is not particularly recommended with structural SVMs due to speed considerations (there are usually too many support vectors to make this efficient), and is particularly slow with this MATLAB wrapper due to the necessity of calling a MATLAB function for each kernel evaluation. The support is included for completeness.

The svm_struct_learn function is called as

parm.patterns = patterns ;
parm.labels = labels ;
parm.lossFn = @lossCB ;
parm.constraintFn  = @constraintCB ;
parm.kernelFn = @kernelCB ;
args = ' -c 1.0 -o 1 -v 1 ' ;
model = svm_struct_learn(args, param) ;

Instead of the feature callback, there is now a kernel callback which computes the joint kernel \(K(\mathbf{x},y,\mathbf{x}',y')\). For the binary linear SVM equivalent of the previous example \(K(\mathbf{x},y,\mathbf{x}',y')=yy'\langle\mathbf{x},\mathbf{x'}\rangle/4\) this is defined as

function k = kernelCB(param, x,y, xp, yp)
  k = x' * xp * y * yp / 4 ;
end

The loss function is the same as before, whereas the constraint generation function needs to be modified to use the kernel as

function ybar = constraintCB(param, model, x, y)
  if size(model.svPatterns, 2) == 0
    w = zeros(size(x)) ;
  else
    w = cat(2, model.svPatterns{:}) * ...
      (model.alpha .* cat(1, model.svLabels{:})) / 2 ;
  end
  if dot(y*x, w) > 1, ybar = y ; else ybar = -y ; end
end

Note that with a kernel the model weight vector \(\mathbf{w}\) is defined only implicitly. In particular, the input model contains the dual variables model.alpha and the support vectors in the form of a cell array of patterns model.svPatterns and corresponding labels model.svLabels. In this simple example the feature map is trivial and the information in model is used to reconstruct the weight vector \(\mathbf{w}\) .

References

  1. T. Joachims, Learning to Align Sequences: A Maximum Margin Approach. Technical Report, September, 2003.
  2. I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large Margin Methods for Structured and Interdependent Output Variables, Journal of Machine Learning Research (JMLR), Vol. 6(Sep):1453-1484, 2005.
  3. T. Joachims, Making Large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning, B. Schölkopf and C. Burges and A. Smola (ed.), MIT Press, 1999.
  4. T. Joachims, Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms. Dissertation, Kluwer, 2002.
  5. T. Joachims, T. Finley, Chun-Nam Yu, Cutting-Plane Training of Structural SVMs, Machine Learning Journal, to appear.