Maximally Stable Extremal Regions (MSER) is a feature detector. Like the SIFT detector, the MSER algorithm extracts from an image I a number of co-variant regions, called MSERs. An MSER is a stable connected component of some level set of the image I. Optionally, elliptical frames are attached to the MSERs by fitting ellipses to the regions.

Extracting MSERs

Figure: A test image.
Figure: Extracted MSERs (left) and fitted ellipses (right).

Each MSER can be identified uniquely by (at least) one of its pixels x, as the connected component of the level set at level I(x) that contains x. Such a pixel is called the seed of the region.
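To make the seed definition concrete, here is a minimal pure-MATLAB sketch that recovers a region from its seed by flood-filling the level set {y : J(y) <= J(x)} starting from x. It is not how vl_erfill is implemented: the toy image J, the use of 4-connectivity and the dark-on-bright polarity (lower level sets) are assumptions made for illustration.

```matlab
% Toy 5x5 grayscale image (hypothetical data).
J = uint8([9 9 9 9 9;
           9 2 3 9 9;
           9 4 9 9 1;
           9 9 9 9 1;
           9 9 9 9 9]);
seed = sub2ind(size(J), 3, 2);   % seed pixel x at row 3, column 2 (value 4)

% Level set at level J(x): all pixels not brighter than the seed
% (assumed dark-on-bright polarity).
inSet  = (J <= J(seed));
region = false(size(J));
region(seed) = true;
queue  = seed;
[H, W] = size(J);

% Breadth-first flood fill over 4-connected neighbours (assumption).
while ~isempty(queue)
  p = queue(1);
  queue(1) = [];
  [i, j] = ind2sub([H W], p);
  nb = [i-1 j; i+1 j; i j-1; i j+1];
  for k = 1:4
    ii = nb(k, 1);
    jj = nb(k, 2);
    if ii >= 1 && ii <= H && jj >= 1 && jj <= W
      q = sub2ind([H W], ii, jj);
      if inSet(q) && ~region(q)
        region(q) = true;
        queue(end+1) = q;
      end
    end
  end
end

s = find(region);   % linear indices of the region pixels: [7; 8; 12]
```

The level set at level 4 also contains the two pixels of value 1 in the last column, but they are not connected to the seed, so they do not belong to the region.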

To demonstrate the usage of the MATLAB command vl_mser (there is a similarly named command-line utility, mser, as well), we open MATLAB and load a test image:

pfx = fullfile(vl_root,'data','spots.jpg') ;
I = imread(pfx) ;
image(I) ; 

We then convert the image to a format suitable for vl_mser, namely a grayscale image of class uint8:

I = uint8(rgb2gray(I)) ;

We compute the region seeds and the elliptical frames by

[r,f] = vl_mser(I,'MinDiversity',0.7,...
                'MaxVariation',0.2,...
                'Delta',10) ;

We plot the region frames by

clf ; imagesc(I) ; 
hold on ;
f = vl_ertr(f) ;
vl_plotframe(f) ;

The call to vl_ertr transposes the elliptical frames. It is needed because vl_mser takes the row index as the first coordinate, whereas the usual image convention takes the first coordinate to be x, the column index.
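Transposing a frame amounts to swapping the two image axes. The sketch below assumes that each frame is a column [x1; x2; S11; S12; S22] (the centre followed by the entries of the symmetric 2x2 ellipse matrix S); it only illustrates the effect of the swap and is not the source of vl_ertr.

```matlab
% Two hypothetical elliptical frames, one per column, assumed to be laid
% out as [x1; x2; S11; S12; S22].
f = [10  20;
     30  40;
      4   9;
      1  -2;
      5   3];

% Swapping the axes exchanges the two centre coordinates and the two
% diagonal entries of S; the off-diagonal entry S12 is unchanged.
ft = f([2 1 5 4 3], :);

% Check against the matrix form: with P = [0 1; 1 0], S becomes P*S*P'.
P  = [0 1; 1 0];
S  = [f(3,1) f(4,1); f(4,1) f(5,1)];
St = [ft(3,1) ft(4,1); ft(4,1) ft(5,1)];
assert(isequal(P * S * P', St))
```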

Plotting the MSERs themselves is a bit more involved, as they have arbitrary shapes. To this end, we use two functions: vl_erfill, which, given an image and a region seed, returns the linear indices of the pixels belonging to that region, and the MATLAB built-in contour, which draws the contour lines of a function. We start with

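M = zeros(size(I)) ;
for x=r'                    % loop over the region seeds
 s = vl_erfill(I,x) ;       % linear indices of the pixels of the region seeded at x
 M(s) = M(s) + 1;           % one more region covers these pixels
end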

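which computes a matrix M whose entries count, for each pixel, the number of extremal regions containing it. Next, we use M and contour to display the region boundaries; drawing contours at the levels 0.5, 1.5, 2.5, ... traces the borders between pixels covered by different numbers of regions: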

clf ; imagesc(I) ;
hold on ;
[c,h] = contour(M,(0:max(M(:)))+.5) ;

MSER parameters

In the original formulation, MSERs are controlled by a single parameter Δ. Our implementation uses a few more parameters to further refine the selection of useful extremal regions.

Understanding the parameters requires knowing how the "stability" of an extremal region is defined. The stability of an extremal region R is the inverse of the relative area variation of R when the intensity level is increased by Δ. Formally, the area variation is defined as

$$ v(R) = \frac{|R(+\Delta) \setminus R|}{|R|} = \frac{|R(+\Delta)| - |R|}{|R|}, $$

where |R| denotes the area of the extremal region R, R(+Δ) is the extremal region Δ levels up that contains R, and R(+Δ) \ R is the set difference of the two regions (the second form holds because R is contained in R(+Δ)). If the area variation is small, the region is deemed stable.
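As a worked instance of the formula, the following sketch computes the variation of a hypothetical region R nested inside R(+Δ), both given as logical masks; the masks and their sizes are made up for illustration.

```matlab
% Hypothetical nested extremal regions as 6x6 logical masks:
% R at some level l, and Rp standing for R(+Delta) at level l + Delta.
R  = false(6);  R(2:3, 2:3) = true;    % |R| = 4 pixels
Rp = false(6);  Rp(2:4, 2:4) = true;   % |R(+Delta)| = 9 pixels
assert(all(Rp(R)))                      % R must be contained in R(+Delta)

% Area variation |R(+Delta) \ R| / |R| and its inverse, the stability.
variation = nnz(Rp & ~R) / nnz(R);     % 5 / 4 = 1.25
stability = 1 / variation;             % 0.8
```

Because R is contained in R(+Δ), nnz(Rp & ~R) equals nnz(Rp) - nnz(R), so the variation depends only on the two areas.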

Based on the stability score, the algorithm keeps the extremal regions that are maximally stable, meaning that their variation is minimal compared to that of the extremal regions one intensity level below and one intensity level up. (Due to the discrete nature of the image, the region below or above may coincide with the actual region, in which case the region is still deemed maximal.) Even if an extremal region is maximally stable, it might be rejected if it is too big (see the parameter MaxArea), too small (see the parameter MinArea), too unstable (see the parameter MaxVariation), or too similar to its parent MSER (see the parameter MinDiversity).

The parameters MaxArea and MinArea bound the area of the accepted regions (in vl_mser they are expressed relative to the image area). The parameter MaxVariation removes regions that are too unstable, even if they are locally maximally stable. The interaction of the various parameters is illustrated next.

Figure: Meaning of the parameter Δ (panel: the intensity profile). The bumps have heights equal to 32, 64, 96, 128 and 160 intensity levels. The variation score of a bump is either 0, if Δ is smaller than the bump height, or very large otherwise (because the next extremal region is as large as the whole image). MaxVariation is set to 0.25.
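The behaviour described in the caption can be reproduced on a one-dimensional toy signal. This sketch assumes a single dark bump on a constant background (lower level sets, as in the other sketches); the signal length, bump width, background level and height are hypothetical.

```matlab
N  = 100;                        % signal length in pixels (hypothetical)
bg = 200;                        % background intensity
h  = 64;                         % bump height in intensity levels
p  = bg * ones(1, N);
p(46:55) = bg - h;               % 10-pixel dark bump

l = bg - h;                      % level at which the bump is an extremal region
% With a single bump, every nonempty lower level set of p is one interval,
% so its size is the area of the extremal region at that level.
area = @(level) nnz(p <= level);
variation = @(delta) (area(l + delta) - area(l)) / area(l);

v32 = variation(32);   % Delta < h: region unchanged, variation 0
v64 = variation(64);   % Delta = h: region becomes the whole signal, (100 - 10) / 10 = 9
```

With MaxVariation = 0.25, as in the figure, the bump passes the variation test for Δ = 32 and fails it for Δ = 64.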

While the concept of local stability is simple to grasp, there is a complication that we have not addressed yet, namely how to define "locality" when comparing extremal regions. For instance, one could take as the neighbor of an extremal region R its parent extremal region R(+1), one intensity level up, and compare the region R with R(+1).
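Along a single chain of nested regions (one branch of the tree of extremal regions), comparing each region with its neighbours one level below and one level up reduces to finding local minima of the variation. The sketch below does this for a made-up area profile and then applies the MaxVariation and area tests. The profile, the image area, the relative interpretation of MinArea/MaxArea and the choice to count ties as minima are assumptions; MinDiversity is not implemented.

```matlab
% a(k): area in pixels of the extremal region containing a fixed seed at
% the k-th intensity level (hypothetical profile along one chain of
% nested regions).
a = [10 10 10 10 12 40 40 40 41 41 200 200 200];
delta        = 2;     % Delta, in intensity levels
maxVariation = 0.2;   % MaxVariation
minArea      = 0.05;  % MinArea and MaxArea, assumed relative to the
maxArea      = 0.75;  % image area
imageArea    = 400;   % hypothetical number of image pixels

% Variation (|R(+Delta)| - |R|) / |R| for every level where R(+Delta) exists.
n = numel(a) - delta;
v = (a(1+delta:end) - a(1:n)) ./ a(1:n);

% Maximally stable: variation not larger than at the level below and the
% level above (ties count as minima, a simplifying assumption).
k = 2:n-1;
stable = k(v(k) <= v(k-1) & v(k) <= v(k+1));   % [2 6 8]

% Rejection tests: too unstable, too small, too big.
rel   = a(stable) / imageArea;
keep  = v(stable) <= maxVariation & rel >= minArea & rel <= maxArea;
msers = stable(keep);                           % [6 8]
```

Levels 6 and 8 both correspond to a 40-pixel region, that is, to the same pixels; pruning such near-duplicates is the role of MinDiversity.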

Conventions

As mentioned above, vl_mser uses the matrix indices as image coordinates. Compared to the usual MATLAB convention for images, this means that the x and y axes are swapped (this was done to keep the convention consistent with images of three or more dimensions). Thus the frames computed by the program may need to be "transposed", as in:

[r,f] = vl_mser(I) ;
f = vl_ertr(f) ;

On the other hand, the region seeds r are already linear indices into I in MATLAB's standard pixel-index format, so they can be used directly to index the image.

Instead of transposing the frames, one can start by transposing the image. In this case, the frames f follow the standard image convention, but the region seeds are linear indices into the transposed image I' and may need to be converted, as in:

[r,f] = vl_mser(I') ;
[i,j] = ind2sub(size(I'),r) ;   % row and column of each seed in I'
r  = sub2ind(size(I),j,i) ;     % swap them to get linear indices into I
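The ind2sub/sub2ind conversion can be checked without vl_mser on a small non-square matrix standing in for the image; the matrix A and the seed indices rt are hypothetical.

```matlab
A  = reshape(1:12, 3, 4);   % hypothetical 3x4 "image"; each value names a pixel
At = A';                    % its 4x3 transpose
rt = [2; 7; 12];            % hypothetical seeds given as linear indices into At

[i, j] = ind2sub(size(At), rt);   % row and column of each seed in At
r = sub2ind(size(A), j, i);       % same pixels as linear indices into A: [4; 8; 12]

assert(isequal(A(r), At(rt)))     % both index the same pixel values
```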

The command-line utility mser uses the normal image convention (the image is rasterized with the x coordinate varying fastest). Therefore its frames are in the standard format, and its region seeds index the pixels in the same order as the seeds returned by vl_mser(I').

To convert between the command-line utility convention and the MATLAB convention, one also needs to recall that MATLAB coordinates start from (1,1), whereas the command-line utility uses the more common zero-based convention, with origin (0,0). For instance, let the files image.frame and image.seed contain the feature frames and seeds in ASCII format, as generated by the command-line utility. Then

r_ = load('image.seed')' + 1 ;
f_ = load('image.frame')' ; 
f_(1:2,:) = f_(1:2,:) + 1 ;
[r,f] = vl_mser(I') ; % notice the transpose

produces identical (up to numerical noise) region seeds r and r_ and frames f and f_.
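The role of the transpose in vl_mser(I') can be checked with index arithmetic alone. This sketch assumes that the command-line utility rasterizes the image with x varying fastest, so that a zero-based pixel (x, y) has index x + W*y; the image size and pixel coordinates are hypothetical.

```matlab
H = 3; W = 4;                    % hypothetical image height and width
A  = reshape(1:H*W, H, W);       % each value names a pixel
x  = [0 2 3];                    % zero-based column (x) coordinates
y  = [0 1 2];                    % zero-based row (y) coordinates

% Assumed command-line raster order: x varies fastest, indices start at 0.
p0 = x + W * y;

% Adding 1 turns these into MATLAB linear indices into the transpose A'.
At   = A';
vals = At(p0 + 1);               % [1 8 12]
ref  = A(sub2ind([H W], y + 1, x + 1));
assert(isequal(vals, ref))
```

Under this assumption, adding 1 to the seeds read from image.seed yields linear indices into I', which is why they match the seeds of vl_mser(I').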