Stochastic Dual Coordinate Ascent

This page describes the Stochastic Dual Coordinate Ascent (SDCA) linear SVM solver. Please see Support Vector Machines (SVM) for an overview of VLFeat SVM support.

SDCA maximizes the dual SVM objective (see Dual problem for a derivation of this expression):

$D(\balpha) = - \frac{1}{2\lambda n^2} \balpha^\top X^\top X \balpha + \frac{1}{n} \sum_{i=1}^n - \ell_i^*(-\alpha_i)$

where $X$ is the data matrix. Recall that the primal parameter corresponding to a given setting of the dual variables is:

$\bw(\balpha) = \frac{1}{\lambda n} \sum_{i=1}^n \bx_i \alpha_i = \frac{1}{\lambda n} X\balpha$

In its most basic form, the SDCA algorithm can be summarized as follows:

• Let $\balpha_0 = 0$.
• Until the duality gap $P(\bw(\balpha_t)) - D(\balpha_t) < \epsilon$
• Pick a dual variable $q$ uniformly at random in $1, \dots, n$.
• Maximize the dual with respect to this variable: $\Delta\alpha_q = \max_{\Delta\alpha_q} D(\balpha_t + \Delta\alpha_q \be_q )$
• Update $\balpha_{t+1} = \balpha_{t} + \be_q \Delta\alpha_q$.

In VLFeat, we partially use the nomenclature from  and  .

# Dual coordinate maximization

The updated dual objective can be expanded as:

$D(\balpha_t + \be_q \Delta\alpha_q) = \text{const.} - \frac{1}{2\lambda n^2} \bx_q^\top \bx_q (\Delta\alpha_q)^2 - \frac{1}{n} \bx_q^\top \frac{X\alpha_t}{\lambda n} \Delta\alpha_q - \frac{1}{n} \ell^*_q(- \alpha_q - \Delta\alpha_q)$

This can also be written as

\begin{align*} D(\balpha_t + \be_q \Delta\alpha_q) &\propto - \frac{A}{2} (\Delta\alpha_q)^2 - B \Delta\alpha_q - \ell^*_q(- \alpha_q - \Delta\alpha_q), \\ A &= \frac{1}{\lambda n} \bx_q^\top \bx_q = \frac{1}{\lambda n} \| \bx_q \|^2, \\ B &= \bx_q^\top \frac{X\balpha_t}{\lambda n} = \bx_q^\top \bw_t. \end{align*}

Maximizing this quantity in the scalar variable $\Delta\balpha$ is usually not difficult. It is convenient to store and incrementally update the model $\bw_t$ after the optimal step $\Delta\balpha$ has been determined:

$\bw_t = \frac{X \balpha_t}{\lambda n}, \quad \bw_{t+1} = \bw_t + \frac{1}{\lambda n }\bx_q \be_q \Delta\alpha_q.$

For example, consider the hinge loss as given in Advanced SVM topics :

$\ell_q^*(u) = \begin{cases} y_q u, & -1 \leq y_q u \leq 0, \\ +\infty, & \text{otherwise}. \end{cases}$

The maximizer $\Delta\alpha_q$ of the update objective must be in the range where the conjugate loss is not infinite. Ignoring such bounds, the update can be obtained by setting the derivative of the objective to zero, obtaining

$\tilde {\Delta \alpha_q}= \frac{y_q - B}{A}.$

Note that $B$ is simply current score associated by the SVM to the sample $\bx_q$. Incorporating the constraint $-1 \leq - y_q (\alpha_q + \Delta \alpha_q) \leq 0$, i.e. $0 \leq y_q (\alpha_q + \Delta \alpha_q) \leq 1$, one obtains the update

$\Delta\alpha_q = y_q \max\{0, \min\{1, y_q (\tilde {\Delta\alpha_q } + \alpha_q)\}\} - \alpha_q.$

# Implementation details

Rather than visiting points completely at random, VLFeat SDCA follows the best practice of visiting all the points at every epoch (pass through the data), changing the order of the visit randomly by picking every time a new random permutation.