[Y,MASK] = VL_NNDROPOUT(X) applies dropout to the data X. MASK is the randomly sampled dropout mask. Both Y and MASK have the same size as X.

VL_NNDROPOUT(X, 'rate', R) sets the dropout rate to R. The rate is the probability that an element of X is zeroed; the surviving elements are rescaled so that Y equals X in expectation.
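The effect of the rate parameter can be sketched with a small NumPy example of "inverted" dropout, where the scaling is folded into the mask. This is an illustration of the math, not the MatConvNet code; the function and variable names are hypothetical.

```python
import numpy as np

def dropout_forward(x, rate=0.5, rng=None):
    """Zero each element with probability `rate` and scale the
    survivors by 1/(1-rate), so that the output matches the input
    in expectation (inverted dropout)."""
    rng = np.random.default_rng() if rng is None else rng
    scale = 1.0 / (1.0 - rate)
    mask = scale * (rng.random(x.shape) >= rate)
    return x * mask, mask

x = np.ones((4, 4))
y, mask = dropout_forward(x, rate=0.5, rng=np.random.default_rng(0))
# With rate 0.5, each entry of y is either 0 or 2, and y == x * mask.
```

Note that MASK here carries the 1/(1-rate) factor, so the mask and output have the same size as X, as stated above.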

[DZDX] = VL_NNDROPOUT(X, DZDY, 'mask', MASK) computes the derivative of the block projected onto DZDY. Note that MASK must be specified in order to compute the derivative consistently with the mask sampled in the forward pass. DZDX and DZDY have the same dimensions as X and Y respectively.
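Since the block is elementwise multiplication by the mask, the backward pass is simply the same multiplication applied to DZDY. A NumPy sketch of this (illustrative names, assuming an inverted-dropout mask as in the forward pass):

```python
import numpy as np

def dropout_backward(dzdy, mask):
    """Backward pass of dropout: y = mask * x elementwise, so
    dz/dx = mask * dz/dy, using the SAME mask sampled forward."""
    return dzdy * mask

rng = np.random.default_rng(0)
# Hypothetical inverted-dropout mask for rate 0.5: entries are 0 or 2.
mask = 2.0 * (rng.random((3, 3)) >= 0.5)
dzdy = np.ones((3, 3))
dzdx = dropout_backward(dzdy, mask)
# Gradients flow only through the units the mask kept.
```

This is why the mask must be passed back in: resampling a fresh mask in the backward pass would make the gradient inconsistent with the forward computation.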

Note that in the original paper on dropout, the network weights of the dropout layers are scaled down at test time to compensate for having all the neurons active. In this implementation the dropout function itself performs this compensation during training, so no alteration is required at test time.