
Dealing with covariant interest point detectors requires working extensively with derivatives, convolutions, and transformations of images. The notation and the fundamental properties of interest are discussed next.

# Derivative operations: gradients

For the derivatives, we borrow the notation of [12]. Let \(f: \mathbb{R}^m \rightarrow \mathbb{R}^n, \bx \mapsto f(\bx)\) be a vector function. The derivative of the function with respect to \(\bx\) is given by its *Jacobian matrix*, denoted by the symbol:

\[ \frac{\partial f}{\partial \bx^\top} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \dots \\ \vdots & \vdots & \ddots \\ \end{bmatrix}. \]
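
As a concrete sanity check, the Jacobian can be approximated by central finite differences. The following sketch (NumPy assumed; the function `f` is a made-up example, not from the text) compares the numerical Jacobian against the analytic one:

```python
import numpy as np

# Hypothetical test function f: R^2 -> R^2 (not from the text).
def f(x):
    return np.array([x[0] * x[1], x[0] + np.sin(x[1])])

def jacobian(f, x, h=1e-6):
    """Central-difference approximation of J[i, j] = df_i/dx_j."""
    n, m = len(f(x)), len(x)
    J = np.zeros((n, m))
    for j in range(m):
        e = np.zeros(m)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

x = np.array([1.0, 2.0])
# Analytic Jacobian of f above: [[x2, x1], [1, cos(x2)]]
J_true = np.array([[2.0, 1.0], [1.0, np.cos(2.0)]])
assert np.allclose(jacobian(f, x), J_true, atol=1e-5)
```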

When the function \( f \) is scalar ( \(n=1\)), the Jacobian is a row vector, namely the transpose of the gradient of the function. More precisely, the **gradient** \(\nabla f \) of \( f \) denotes the column vector of partial derivatives:

\[ \nabla f = \frac{\partial f}{\partial \bx} = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \end{bmatrix}. \]

The second derivative \(H_f \) of a scalar function \( f \), or **Hessian**, is denoted as

\[ H_f = \frac{\partial^2 f}{\partial \bx \partial \bx^\top} = \frac{\partial \nabla f}{\partial \bx^\top} = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \dots \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2 \partial x_2} & \dots \\ \vdots & \vdots & \ddots \\ \end{bmatrix}. \]

The trace of the Hessian is also known as the **Laplacian** and is denoted as

\[ \Delta f = \operatorname{tr} H_f = \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} + \dots \]
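
These definitions can be checked numerically. The sketch below (NumPy assumed; the scalar test function is a hypothetical example, not from the text) compares a finite-difference gradient against the analytic one and computes the Laplacian as the trace of the Hessian:

```python
import numpy as np

# Hypothetical scalar test function f(x) = x1^2 x2 + x2^3 (not from the text),
# with analytic gradient and Hessian for comparison.
def f(x):
    return x[0]**2 * x[1] + x[1]**3

def grad_f(x):  # column vector of partial derivatives
    return np.array([2 * x[0] * x[1], x[0]**2 + 3 * x[1]**2])

def hess_f(x):  # symmetric matrix of second partial derivatives
    return np.array([[2 * x[1], 2 * x[0]],
                     [2 * x[0], 6 * x[1]]])

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient."""
    g = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([1.0, 2.0])
assert np.allclose(num_grad(f, x), grad_f(x), atol=1e-4)

# The Laplacian is the trace of the Hessian: 2*x2 + 6*x2 = 16 at x = (1, 2).
lap = np.trace(hess_f(x))
assert lap == 16.0
```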

## Derivative and image warps

In the following, we will often be interested in domain warpings \(u: \mathbb{R}^m \rightarrow \mathbb{R}^m, \bx \mapsto \bar\bx = u(\bx)\) and in their effect on the derivatives of a function \(f(\bar\bx)\). The key tool is the chain rule:

\[ \frac{\partial f \circ u}{\partial \bx^\top} = \left(\frac{\partial f}{\partial \bar\bx^\top} \circ u\right) \frac{\partial u}{\partial \bx^\top} \]

In particular, for an affine transformation \(u = (A,T) : \bx \mapsto A\bx + T\), one obtains the transformation rules:

\[ \begin{align*} \frac{\partial f \circ (A,T)}{\partial \bx^\top} &= \left(\frac{\partial f}{\partial \bar\bx^\top} \circ (A,T)\right)A, \\ \nabla (f \circ (A,T)) &= A^\top (\nabla f) \circ (A,T), \\ H_{f \circ(A,T)} &= A^\top (H_f \circ (A,T)) A, \\ \det H_{f \circ(A,T)} &= \det(A)^2\, (\det H_f) \circ (A,T). \end{align*} \]

Note that the Laplacian does not transform as simply for a general \(A\); for a similarity \(A = sR\), with \(R\) a rotation, one has \(\Delta (f \circ(A,T)) = s^2\, (\Delta f) \circ (A,T)\).
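
The gradient rule, for instance, can be verified numerically. A minimal sketch (NumPy assumed; the function `f` and the pair `(A, T)` are arbitrary illustrative choices, not from the text):

```python
import numpy as np

# Numerical check of  grad(f o (A,T)) = A^T (grad f) o (A,T)
# using a hypothetical f and an arbitrary affine warp (A, T).
def f(x):
    return np.sin(x[0]) * x[1]**2

def grad_f(x):
    return np.array([np.cos(x[0]) * x[1]**2, 2 * np.sin(x[0]) * x[1]])

A = np.array([[2.0, 1.0], [0.5, 3.0]])
T = np.array([0.3, -0.7])

def warped(x):  # (f o (A,T))(x) = f(Ax + T)
    return f(A @ x + T)

def num_grad(g, x, h=1e-6):
    """Central-difference gradient of a scalar function g."""
    out = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = h
        out[i] = (g(x + e) - g(x - e)) / (2 * h)
    return out

x = np.array([0.4, 1.2])
lhs = num_grad(warped, x)       # grad(f o (A,T)) at x
rhs = A.T @ grad_f(A @ x + T)   # A^T (grad f)(Ax + T)
assert np.allclose(lhs, rhs, atol=1e-5)
```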

# Integral operations: smoothing

In practice, given an image \(\ell\) expressed in digital format, good derivative approximations can be computed only if the bandwidth of the image is limited and, in particular, compatible with the sampling density. Since it is unreasonable to expect real images to be band-limited, the bandwidth is artificially constrained by suitably smoothing the image prior to computing its derivatives. This is also interpreted as a form of regularization or as a way of focusing on the image content at a particular scale.

Formally, we will focus on Gaussian smoothing kernels. For the 2D case \(\bx\in\real^2\), the Gaussian kernel of covariance \(\Sigma\) is given by

\[ g_{\Sigma}(\bx) = \frac{1}{2\pi \sqrt{\det\Sigma}} \exp\left( - \frac{1}{2} \bx^\top \Sigma^{-1} \bx \right). \]

The symbol \(g_{\sigma^2}\) will be used to denote a Gaussian kernel with isotropic standard deviation \(\sigma\), i.e. \(\Sigma = \sigma^2 I\). Given an image \(\ell\), the symbol \(\ell_\Sigma\) will be used to denote the image smoothed by the Gaussian kernel of parameter \(\Sigma\):

\[ \ell_\Sigma(\bx) = (g_\Sigma * \ell)(\bx) = \int_{\real^m} g_\Sigma(\bx - \by) \ell(\by)\,d\by. \]
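
In code, isotropic Gaussian smoothing is commonly implemented as a separable filter. A minimal sketch, assuming SciPy as a dependency (the text itself prescribes no particular library, and discrete filtering only approximates the continuous convolution):

```python
import numpy as np
from scipy.ndimage import gaussian_filter  # assumed dependency

# Compute ell_{sigma^2 I} = g_{sigma^2 I} * ell on a random test image
# (up to kernel truncation and boundary handling).
rng = np.random.default_rng(0)
ell = rng.random((64, 64))

sigma = 2.0
ell_smooth = gaussian_filter(ell, sigma=sigma, mode='nearest')

# Smoothing suppresses high frequencies, so the sample variance shrinks.
assert ell_smooth.std() < ell.std()
```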

## Smoothing and image warps

One advantage of Gaussian kernels is that they are (up to renormalization) closed under a linear warp; here \(|A|\) denotes the absolute value of the determinant, \(|\det A|\):

\[ |A|\, g_\Sigma \circ A = g_{A^{-1} \Sigma A^{-\top}} \]
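
This closure identity can be verified pointwise. A minimal NumPy sketch, with an arbitrary \(A\), \(\Sigma\), and \(\bx\) chosen purely for illustration:

```python
import numpy as np

# Pointwise check of  |det A| * g_Sigma(Ax) = g_{A^-1 Sigma A^-T}(x)
# for the 2D Gaussian kernel defined above.
def gauss(Sigma, x):
    """Evaluate the 2D Gaussian of covariance Sigma at x."""
    return np.exp(-0.5 * x @ np.linalg.solve(Sigma, x)) \
           / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

A = np.array([[1.5, 0.2], [-0.3, 0.8]])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
x = np.array([0.7, -1.1])

Ainv = np.linalg.inv(A)
lhs = abs(np.linalg.det(A)) * gauss(Sigma, A @ x)
rhs = gauss(Ainv @ Sigma @ Ainv.T, x)
assert np.allclose(lhs, rhs)
```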

Consequently, smoothing a warped image is the same as warping the result of smoothing the original image by a suitably adjusted Gaussian kernel:

\[ g_{\Sigma} * (\ell \circ (A,T)) = (g_{A\Sigma A^\top} * \ell) \circ (A,T). \]
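
A discrete sanity check of this commutation, restricted to a pure scaling \(A = sI\) so that \(A\Sigma A^\top = s^2\sigma^2 I\) (SciPy assumed; interpolation and boundary effects make the match approximate rather than exact):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom  # assumed dependencies

# Check  g_sigma * (ell o A)  ~=  (g_{s*sigma} * ell) o A  for A = s*I.
rng = np.random.default_rng(0)
ell = gaussian_filter(rng.random((128, 128)), 2.0)  # a smooth test image

s, sigma = 0.5, 4.0
# (ell o A)(x) = ell(s x): sampling at half coordinates is a 1/s = 2x zoom.
warp = lambda im: zoom(im, 1.0 / s, order=3)

lhs = gaussian_filter(warp(ell), sigma)         # g_sigma * (ell o A)
rhs = warp(gaussian_filter(ell, s * sigma))     # (g_{s*sigma} * ell) o A

# Compare away from the boundary, where padding artifacts dominate.
c = 32
err = np.abs(lhs[c:-c, c:-c] - rhs[c:-c, c:-c]).max()
print(err)  # small relative to the image range [0, 1]
```

Larger smoothing relative to the sampling density makes the two sides agree more closely, since the discrete filters then better approximate the continuous convolution.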