Quick shift image segmentation

quickshift.h implements an image segmentation algorithm based on the quick shift clustering algorithm [29] .

# Overview

Quick shift [29] is a fast mode seeking algorithm, similar to mean shift. The algorithm segments an RGB image (or any image with more than one channel) by identifying clusters of pixels in the joint spatial and color dimensions. Segments are local (superpixels) and can be used as a basis for further processing.

Given an image, the algorithm calculates a forest of pixels whose branches are labeled with a distance value (vl_quickshift_get_parents, vl_quickshift_get_dists). This specifies a hierarchical segmentation of the image, with segments corresponding to subtrees. Useful superpixels can be identified by cutting the branches whose distance label is above a given threshold (the threshold can be either fixed by hand, or determined by cross validation).

Parameter influencing the algorithm are:

• Kernel size. The pixel density and its modes are estimated by using a Parzen window estimator with a Gaussian kernel of the specified size (vl_quickshift_set_kernel_size). The larger the size, the larger the neighborhoods of pixels considered.
• Maximum distance. This (vl_quickshift_set_max_dist) is the maximum distance between two pixels that the algorithm considers when building the forest. In principle, it can be infinity (so that a tree is returned), but in practice it is much faster to consider only relatively small distances (the maximum distance can be set to a small multiple of the kernel size).

# Technical details

For each pixel (x,y), quick shift regards $(x,y,I(x,y))$ as a sample from a d + 2 dimensional vector space. It then calculates the Parzen density estimate (with a Gaussian kernel of standard deviation $\sigma$)

$E(x,y) = P(x,y,I(x,y)) = \sum_{x'y'} \frac{1}{(2\pi\sigma)^{d+2}} \exp \left( -\frac{1}{2\sigma^2} \left[ \begin{array}{c} x - x' \\ y - y' \\ I(x,y) - I(x',y') \\ \end{array} \right] \right)$

Then quick shift construct a tree connecting each image pixel to its nearest neighbor which has greater density value. Formally, write $(x',y') >_P (x,y)$ if, and only if,

$P(x',y',I(x',y')) > P(x,y,I(x,y))}.$

Each pixel (x, y) is connected to the closest higher density pixel parent(x, y) that achieves the minimum distance in

$\mathrm{dist}(x,y) = \mathrm{min}_{(x',y') > P(x,y)} \left( (x - x')^2 + (y - y')^2 + \| I(x,y) - I(x',y') \|_2^2 \right).$