The most common operations in digital signal processing are addition, subtraction, multiplication, division and square root. From those operations, the square root is the most difficult to be implemented in hardware because of its complexity.
Let N be a number with n bits as can be seen in equation (1) and let X be the square root of N according to equation (2):
Note that N may be expanded as can be seen in equation (3):
From equation (3) is possible to estimate the square root of N one bit at a time according to the following algorithm: starting from the most significant bit an, each estimated bit of the square root is removed from R and added to a variable P, where P is defined in the equation (4).
So, if the m-1 bits a1, a2, ... ,am-1 have already been estimated, each guess of the square root should satisfy the recursion of the equation (5)
When Xn=0 the exact square root of N have been found so far; otherwise, the sum of the ai gives an approximation of the square root of N.
The flowchart of the square root algorithm can be seen in Figure (1).
Learning from an example
As an example to illustrate how the square root algorithm works, let N=9. Since N is represented with 4 bits, the computation of the square root will be taken in 4 iterations.
If N is not a perfect square number, adding bits to its representation increases the precision of the square root result. For example, for N=6, adding 10 bits of precision to N is equivalent to make N = 6 < < 10 = 6144 and the algorithm will return 78 as the result for sqrt(6144) and once 6 was shifted 10 bits to the left, shifting 5 bits to the right makes the value of the square root be 78/32 = 2.4375 that is the approximate value of sqrt(6).
You want to try by yourself? Check out the SystemVerilog implementation of square root module and its testbench:
Also available in GitHub.
References: Methods of computing square roots - Wikipedia