# Square Root Computation

## A Hardware Algorithm

Posted by Nelson Campos on July 28, 2016

The most common operations in digital signal processing are addition, subtraction, multiplication, division and square root. From those operations, the square root is the most difficult to be implemented in hardware because of its complexity.

Let N be a number with n bits as can be seen in equation (1) and let X be the square root of N according to equation (2):

equation (1) equation (2)

Note that N may be expanded as can be seen in equation (3):

equation (3)

From equation (3) is possible to estimate the square root of N one bit at a time according to the following algorithm: starting from the most significant bit an, each estimated bit of the square root is removed from R and added to a variable P, where P is defined in the equation (4).

equation (4)

So, if the m-1 bits a1, a2, ... ,am-1 have already been estimated, each guess of the square root should satisfy the recursion of the equation (5)

equation (5)

When Xn=0 the exact square root of N have been found so far; otherwise, the sum of the ai gives an approximation of the square root of N.

The flowchart of the square root algorithm can be seen in Figure (1).

Figure (1): Square root algorithm

## Learning from an example

As an example to illustrate how the square root algorithm works, let N=9. Since N is represented with 4 bits, the computation of the square root will be taken in 4 iterations.

example

If N is not a perfect square number, adding bits to its representation increases the precision of the square root result. For example, for N=6, adding 10 bits of precision to N is equivalent to make N = 6 < < 10 = 6144 and the algorithm will return 78 as the result for sqrt(6144) and once 6 was shifted 10 bits to the left, shifting 5 bits to the right makes the value of the square root be 78/32 = 2.4375 that is the approximate value of sqrt(6).

You want to try by yourself? Check out the SystemVerilog implementation of square root module and its testbench:

```module sqrt #(parameter NBITS = 8, PRECISION = 10)
(input logic [NBITS-1:0] A,
input logic clock, reset, oReady, iValid,
output logic [NBITS-1:0]result);

enum logic [1:0] {INIT, SET, CALC, SEND} state;
logic [NBITS+NBITS-1:0] A_aux, Y, P;
logic [4:0] m;

always_ff @(posedge clock)begin
if(reset)begin
A_aux <= '0;
result <= '0;
m <= NBITS-1;
state <= INIT;
oValid <= 0;
end

else begin
case(state)
INIT: begin
oValid <= 0;
A_aux <= A<<<PRECISION;
Y <= '0;
P <= '0;
m <= NBITS-1;
state <= SET;
end

SET: begin
if(m < NBITS)begin
Y <= ((P<<<(m+1)) + (1 <<< (m+m)));
state <= CALC;
end
else begin
oValid <= 1;
state <= SEND;
end
end
CALC: begin
if(Y < A_aux || Y == A_aux)begin
A_aux <= A_aux - Y;
P <= P + (1 <<< m);
result[m] <= 1;
end
else result[m] <= 0;
m <= m-1;
state <= SET;
end

SEND: begin
result <= result >>>(PRECISION-1);
oValid <= 0;
state <= INIT;
end
end
endcase
end
end

endmodule
```
SystemVerilog code for square root algorithm
````include "sqrt.sv"

parameter NBITS = 8, HALF_PRECISION = 5;

module top;

logic clock, reset;
logic [NBITS-1:0] A;
logic [NBITS-1:0]result;
enum logic {S1, S2} state;

initial begin
clock = 0;
reset = 1;
#20 reset = 0;
end

always #5 clock = !clock;

sqrt #(NBITS) SQRT(.*);

always_ff @(posedge clock)begin
if(reset)begin
iValid <= 0;
state <= S1;
end
else case(state)
S1: begin
A = 20'd6;
iValid <= 1;
state <= S2;
end

S2: begin
if(oValid)begin
\$display("sqrt(%1d) = %d/(1<<%1d)", A, result, HALF_PRECISION);
\$finish();
end
end
endcase
end

endmodule
```
SystemVerilog code for square root testbench

Also available in GitHub.