CNN wrappers

At its core, MatConvNet consists of a number of MATLAB functions implementing CNN building blocks. These are usually combined into complete CNNs by using one of the two CNN wrappers. The first wrapper is SimpleNN, most of which is implemented by the MATLAB function vl_simplenn. SimpleNN is suitable for networks that have a linear topology, i.e. a chain of computational blocks. The second wrapper is DagNN, which is implemented as the MATLAB class dagnn.DagNN.

SimpleNN wrapper

The SimpleNN wrapper is implemented by the function vl_simplenn and a few others. This is a lightweight wrapper, suitable for CNNs consisting of a simple chain of blocks.

To start with SimpleNN, create a net structure, populating the cell array net.layers with a list of layers. For example:

net.layers{1} = struct(...
    'name', 'conv1', ...
    'type', 'conv', ...
    'weights', {{randn(10,10,3,2,'single'), randn(2,1,'single')}}, ...
    'pad', 0, ...
    'stride', 1) ;
net.layers{2} = struct(...
    'name', 'relu1', ...
    'type', 'relu') ;

The convolutional and ReLU layers will now be executed in sequence. The convolution has a bank of two 10x10x3 filters. The network can be evaluated as follows:

data = randn(300, 500, 3, 5, 'single') ;
res = vl_simplenn(net, data) ;

The structure res contains the result of the computation, with one entry for each variable in the architecture:

>> res
res =
1x3 struct array with fields:

    x
    dzdx
    dzdw
    aux
    time
    backwardTime

Here x is the variable value, dzdx the derivative of the CNN with respect to x, dzdw the derivative of the CNN with respect to each of the block parameters, aux space for custom information (e.g. the mask in dropout layers), and time and backwardTime the time spent in the forward and backward pass.

For example, res(1).x is the input of the CNN and res(3).x its output.
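
For instance, given the 300x500x3x5 input above, the convolution (two 10x10x3 filters, zero padding, unit stride) reduces the spatial dimensions from 300x500 to 291x491 and produces two feature channels, while the ReLU preserves the dimensions of its input. The sizes can be checked as follows:

size(res(1).x)  % 300 500 3 5: the network input
size(res(2).x)  % 291 491 2 5: the output of conv1
size(res(3).x)  % 291 491 2 5: the output of relu1, i.e. the network output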

The derivative of the CNN can be computed as follows:

res = vl_simplenn(net, data, dzdy)

This performs both a forward and a backward pass. dzdy is a projection applied to the output value of the CNN (see the PDF manual to clarify this point). During training, CNNs are often terminated by a block that computes a single scalar loss value (i.e. res(end).x is a scalar). In this case, one often picks dzdy = 1.
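
For the example network above, whose output is not a scalar, dzdy must have the same size as the output. The following is a minimal sketch, assuming that the derivatives of the conv1 parameters are stored in res(1).dzdw as a cell array matching net.layers{1}.weights:

dzdy = randn(size(res(end).x), 'single') ;  % projection applied to the output
res = vl_simplenn(net, data, dzdy) ;        % forward and backward pass
dzdinput   = res(1).dzdx ;                  % derivative w.r.t. the network input
dzdfilters = res(1).dzdw{1} ;               % derivative w.r.t. the conv1 filters
dzdbiases  = res(1).dzdw{2} ;               % derivative w.r.t. the conv1 biases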

DagNN wrapper

The DagNN wrapper is implemented by the class dagnn.DagNN.

Creating a DagNN

A DagNN is an object of class dagnn.DagNN. A DAG can be created directly as follows:

run <MATCONVNETROOT>/matlab/vl_setupnn.m ; % activate MatConvNet if needed
net = dagnn.DagNN() ;

The object is a MATLAB handle, meaning that it is passed by reference rather than by value:

net2 = net ; % both net and net2 refer to the same object

This significantly simplifies the syntax of most operations.

DagNN has a bipartite directed acyclic graph structure, where layers are connected to variables and vice-versa. Each layer receives zero or more variables and zero or more parameters as input and produces zero or more variables as output. Layers are added using the addLayer() method of the DagNN object. For example, the following command adds a layer with an input x1, an output x2, and two parameters filters and biases.

convBlock = dagnn.Conv('size', [3 3 256 16], 'hasBias', true) ;
net.addLayer('conv1', convBlock, {'x1'}, {'x2'}, {'filters', 'biases'}) ;

Next, we add a ReLU layer on top of the convolutional one:

reluBlock = dagnn.ReLU() ;
net.addLayer('relu1', reluBlock, {'x2'}, {'x3'}, {}) ;

Note that the ReLU block does not take any parameters. In general, blocks may have an arbitrary number of inputs and outputs, depending on the block type, as illustrated below.
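
For instance, a block with more than one input is a loss. Assuming the dagnn.Loss block bundled with MatConvNet, a softmax log-loss layer taking a prediction and a label as inputs and producing a scalar objective could be appended as follows (illustration only; the remainder of this section assumes only the two layers defined above):

lossBlock = dagnn.Loss('loss', 'softmaxlog') ;
net.addLayer('loss1', lossBlock, {'x3', 'label'}, {'objective'}, {}) ;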

At this point, net contains the two blocks as well as the three variables and the two parameters. These are stored as entries in net.layers, net.vars and net.params respectively. For example

>> net.layers(1)
ans =
             name: 'conv1'
           inputs: {'x1'}
          outputs: {'x2'}
           params: {'filters'  'biases'}
     inputIndexes: 1
    outputIndexes: 2
     paramIndexes: [1 2]
            block: [1x1 dagnn.Conv]

contains the first layer. It includes the names of the input and output variables and of the parameters (e.g. net.layers(1).inputs). It also contains the variable and parameter indexes for faster access (these are managed automatically by the DAG methods). Layers are identified by name (net.layers(1).name), and functions such as removeLayer() refer to them by these identifiers (it is an error to assign the same name to two layers). Finally, the block object implementing the layer, including its hyper-parameters such as the filter size, padding and stride, is stored in net.layers(1).block, which in this case is of type dagnn.Conv. The latter is, in turn, a wrapper of the vl_nnconv command.
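
Layers can also be looked up by name. A short sketch, assuming the getLayerIndex() method (analogous to getVarIndex() used later in this section):

l = net.getLayerIndex('conv1') ;
net.layers(l).block   % the dagnn.Conv object wrapping vl_nnconv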

The other important data members store variables and parameters. For example:

>> net.vars(1)
ans =
         name: 'x1'
        value: []
          der: []
        fanin: 0
       fanout: 1
     precious: 0

>> net.params(1)
ans =
         name: 'filters'
        value: []
          der: []
       fanout: 1
 learningRate: 1
  weightDecay: 1

Note that each variable and parameter has a value and a der (derivative) member. The fanin and fanout members indicate how many network layers have that particular variable/parameter as an output and as an input, respectively. Parameters do not have a fanin, as they are not the result of a network computation. Variables can have a fanin equal to zero, denoting a network input, or one. Network outputs can be identified as variables with zero fanout.

Variables and parameters can feed into one or more layers, which results in a fanout equal to or greater than one. For variables, a fanout greater than one denotes a branching point in the DAG. For parameters, a fanout greater than one allows sharing parameters between layers.
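
For instance, parameter sharing can be obtained simply by reusing the same parameter names in another layer. The following sketch (not part of the running example) adds a siamese-style branch that applies the same filters and biases to a second input variable, raising the fanout of both parameters to two:

convBlockB = dagnn.Conv('size', [3 3 256 16], 'hasBias', true) ;
net.addLayer('conv1b', convBlockB, {'x1b'}, {'x2b'}, {'filters', 'biases'}) ;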

Loading and saving DagNN objects

While it is possible to save a DagNN object directly using the MATLAB save command, this is not recommended. Instead, for compatibility and portability, as well as to save significant disk space, it is preferable to convert the object into a structure and save that instead:

netStruct = net.saveobj() ;
save('myfile.mat', '-struct', 'netStruct') ;
clear netStruct ;

The operation can be inverted using the loadobj() static method of dagnn.DagNN:

netStruct = load('myfile.mat') ;
net = dagnn.DagNN.loadobj(netStruct) ;
clear netStruct ;

Note that, in this manner, the transient state of the object is lost. This includes the values of the variables, but not the values of the parameters or the network structure.

Using the DagNN to evaluate the network and its derivative

So far, both parameters and variables in the DagNN object have empty values and derivatives. The command:

net.initParams() ;

can be used to initialise the model parameters to random values. For example net.params(1).value should now contain a 3x3x256x16 array, matching the filter dimensions specified in the example above.

The eval() method can now be used to evaluate the network. For example:

input = randn(10,15,256,1,'single') ;
net.eval({'x1', input}) ;

evaluates the network on a random input array. Since in general the network can have several inputs, one must specify each input as a pair 'variableName',variableValue in a cell array.
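
For instance, for a hypothetical network with two input variables named images and labels, the pairs are simply concatenated in the same cell array:

net.eval({'images', images, 'labels', labels}) ;  % hypothetical two-input network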

After eval() completes, the value fields of the leaf variables in the network contain the network outputs. In this example, the single output can be recovered as follows:

i = net.getVarIndex('x3') ;
output = net.vars(i).value ;

The getVarIndex() method is used to obtain the variable index given its name (variables do not move around unless the network structure is changed or reloaded from disk, so that indexes can be cached).
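
A minimal sketch of this caching pattern, for repeated evaluations of the network defined above:

outIdx = net.getVarIndex('x3') ;  % look the index up once
for t = 1:10
  net.eval({'x1', randn(10,15,256,1,'single')}) ;
  output = net.vars(outIdx).value ;  % reuse the cached index
end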

The (projected) derivative of the CNN with respect to variables and parameters is obtained by passing, along with the input, a projection vector for each of the output variables:

dzdy = randn(size(output), 'single') ; % projection vector
net.eval({'x1',input},{'x3',dzdy}) ;

The derivatives are now stored in the corresponding params and vars structure entries. For example, the derivative with respect to the filters parameter can be accessed as follows:

p = net.getParamIndex('filters') ;
dzdfilters = net.params(p).der ;

Remark: empty values of variables. Note that most intermediate variable values and derivatives are aggressively discarded during the computation in order to conserve memory. Set net.conserveMemory to false to prevent this from happening, or make individual variables precious (net.vars(i).precious = true).
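
For example, to retain the value of the intermediate variable x2 after evaluation, one can mark it as precious or, more expensively, disable memory conservation altogether:

net.vars(net.getVarIndex('x2')).precious = true ;  % keep this variable only
% or: net.conserveMemory = false ;                 % keep all variables
net.eval({'x1', input}) ;
x2value = net.vars(net.getVarIndex('x2')).value ;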