[F,D] = VL_DHOG(I) calculates the Dense Histogram of Gradients (DHOG) descriptors for image I. I must be a grayscale image of class SINGLE.
A VL_DHOG descriptor is equivalent to a VL_SIFT descriptor (see VL_SIFT() and the VLFeat API documentation). This function quickly calculates a large number of such descriptors, covering the image densely with features of the same size and orientation.
The function returns the frames F and the descriptors D. Since all frames have identical size and orientation, F has only two rows (the X and Y coordinates of the frame centers). The orientation is fixed to zero. The scale is related to the SIZE of the spatial bins, which by default is equal to 3 pixels (see below). If NS is the number of bins in each spatial direction (by default 4), then a VL_DHOG keypoint covers a square patch whose side is NS times SIZE pixels (12 pixels with the default values).
- Remark
The size of a VL_SIFT bin is equal to the magnification factor MAGNIF (usually 3) times the scale of the VL_SIFT keypoint. Hence the scale that should be fed to VL_SIFTDESCRIPTOR() in order to match the output of VL_DHOG() is equal to SIZE / MAGNIF.
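As a worked illustration of the relations above, the following sketch computes the patch side and the matching VL_SIFT scale for the default values. The variable names are chosen here for illustration only and are not part of the VLFeat API.

```matlab
% Illustrative arithmetic only; these variable names are assumptions,
% not VLFeat identifiers.
binSize = 3;   % SIZE: width of one spatial bin in pixels (default)
numBins = 4;   % NS: number of spatial bins per direction (default)
magnif  = 3;   % MAGNIF: usual VL_SIFT magnification factor

% Side of the square patch covered by one descriptor, in pixels.
patchSide = numBins * binSize;

% Scale of a VL_SIFT keypoint whose bins have the same size.
scale = binSize / magnif;
```

With these defaults the patch side is 12 pixels and the matching scale is 1.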
VL_DHOG() accepts the following options:
- Step STEP [1]
Extract a descriptor every STEP pixels.
- Size SIZE [3]
A spatial bin covers SIZE pixels.
- Norm
Append to the frames the normalization factor applied to each descriptor. In this case, F has three rows and the factor is stored in the third row.
- Fast
Use a flat rather than Gaussian window. Much faster.
- Verbose
Be verbose.
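A minimal usage sketch of the options above might look as follows. It assumes that VLFeat is installed and on the MATLAB path, that the function is callable as vl_dhog, and that Norm is passed as a flag without a value, as its listing above suggests. The random image is only a stand-in for real data.

```matlab
% Sketch only: assumes VLFeat is on the MATLAB path and that the
% function is invoked as vl_dhog (lowercase file name).
I = single(rand(64, 64));   % synthetic grayscale image of class SINGLE

% One descriptor every 4 pixels, with spatial bins 3 pixels wide.
[F, D] = vl_dhog(I, 'Step', 4, 'Size', 3);

% Same sampling, with the normalization factor appended to the frames
% (assumes 'Norm' is a flag that takes no value).
[Fn, Dn] = vl_dhog(I, 'Step', 4, 'Size', 3, 'Norm');

numFrames = size(F, 2);     % one column of F per extracted frame
```

Here F is expected to have two rows and Fn three, as described above.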