Fast integer square root algorithm

This section describes the fast integer square root algorithm used by vl_fast_sqrt_ui8, vl_fast_sqrt_ui16, vl_fast_sqrt_ui32, vl_fast_sqrt_ui64.

Given a non-negative integer $x \in \mathbb{Z}_+$, the goal of this algorithm is to quickly compute the integer approximation of the square root of an integer number:

$y = \max_{\bar y\in\mathbb{Z}} \bar y, \qquad \text{such that}\ \bar y^2 \leq x.$

Consider determining the k-th bit of $y$. To this end, decompose $y$ in three parts:

$y = y_{k+1} + q 2^k + r, \qquad \text{where}\ y_{k+1} \geq 2^{k+1}, r < 2^k,$

and $q\in\{0,1\}$ is the bit to be determined. Here $y_{k+1}$ is a part of the result $y$ that has already been determined, while the bit $q$ and the remainder $r$ are still unknown. Recall that the goal is to find the largest $y^2$ such that $y^2 \leq x$. Expanding $y^2$ this condition becomes

$q (2^{2k} + 2 y_{k+1} 2^k) + r(r + 2q 2^k + 2 y_{k+1}) \leq x - y_{k+1}^2.$

We can now determine if $q=1$ or $q=0$ based on the value of the residual $x - y_{k+1}^2$. Specifically, $q=1$ requires that:

$\boxed{ 2^{2k} + 2a2^k \leq x - y_{k+1}^2. }$

On the other hand, if this equation is satisfied, then setting $r=0$ shows that there exists at least one $y$ such that $q=1$ and $y^2 \leq x$. In particular, greedily choosing $q=1$ in $x=y_{k+1} + 2^k q + r$ is optimal because $2^k > r$. This yields the algorithm:

1. Note that if $x$ is stored in $n$ bits and $n$ is even, then the integer square root $y$ does not require more than $m = n / 2$ bit to be stored. Thus the first bit to be determined is $k \leftarrow m - 1 = n/2 - 1$ and $y_{n/2}=0$.
2. The algorithm stores and updates $y_k/2^{k}$ and $x - y_{k}^2$ for convenience.
3. During iteration $k$, $y_k$ is determined. On entering the iteration, the first step is to compute $y_{k+1}/2^k = 2 y_{k+1}/2^{k+1}$.
4. Then the bound $t = (2^{2k} + 2 y_{k+1})2^k = 2^{2k}(1 + 2 y_{k+1}/2^k)$.
5. If $t \geq x - y_{k+1}$, the $k$-th bit of $y_k$ is set to one. This means applying the update $\hat y_{k}/2^k \leftarrow y_{k+1}/2^k + 1$. This also requires computing $x - y_{k}^2 \leftarrow x - y_{k+1}^2 - t$.
6. Decrement $k \leftarrow k -1$ and, if $k\geq 0$, continue from 3.