# Fixed Point Recursive FFT

## Implementation of the forward and inverse FFT using hardware recursion

Posted by Nelson Campos on March 03, 2018

The Fast Fourier Transform is one of the most important operations in Digital Signal Processing and has many applications, for example, in the analysis of communication signals or to perform the Discrete Cosine Transform widely used in audio and image data compression algorithms.

Let $$x[n]$$ be a series of complex signals, for $$0\leq n\leq N-1$$ and $$X[k]$$ the Discrete Fourier Transform of $$x[n]$$, for $$0\leq k\leq N-1$$. The frequency domain signals $$X[k]$$ can be obtained according to equation $$\ref{eq:fft1}$$ or in its matrix form as can be seen in equation $$\ref{eq:fft2}$$, where $$W_N^{kn} = e^{\frac{-2i\pi kn}{N}}$$.

$$\begin{equation} X[k] = \sum_{n=0}^{N-1}x[n]W_N^{kn} \end{equation}\tag{1}\label{eq:fft1}$$ $$\begin{equation}\begin{bmatrix} X\\ \\ \vdots\\ \\ X[N-1] \end{bmatrix} = \begin{bmatrix} W_N^{0\times0} &\dots & W_N^{0\times(N-1)}\\ \\ \vdots &\ddots &\vdots\\ \\ W_N^{(N-1)\times0} & \dots & W_N^{(N-1)^2} \end{bmatrix}\times \begin{bmatrix} x\\ \\ \vdots\\ \\ x[N-1] \end{bmatrix} \end{equation}\tag{2}\label{eq:fft2}$$

The computation of $$X[k]$$ has complexity of $$O(n^2)$$. However, the expression of the equation $$\ref{eq:fft1}$$ may be split in two terms according to equation $$\ref{eq:fft3}$$.

$$\begin{equation}X[k] = \sum_{n=0}^{N-1}x[2n]W_N^{2nk}+\sum_{n=0}^{N-1}x[2n+1]W_N^{(2n+1)k} \end{equation}\tag{3}\label{eq:fft3}$$

Note that applying the properties $$W_N^{2kn} = W_{N/2}^{kn}$$ and $$W_N^{k+N/2} = -W_N^{kn}$$ into equation $$\ref{eq:fft3}$$, equation $$\ref{eq:fft4}$$ is obtained, where $$Xe[k] = \sum\limits_{n=0}^{(N/2)-1}x[2n]W_{N/2}^{nk}$$ and $$Xo[k] = \sum\limits_{n=0}^{(N/2)-1}x[2n+1]W_{N/2}^{nk}$$.

$$\left\{\begin{matrix} X[k] = Xe[k]+W_N^k Xo[k]\\ \\ X[k+N/2] = Xe[k]-W_N^k Xo[k] \end{matrix}\right.\tag{4}\label{eq:fft4}$$

According to equation $$\ref{eq:fft4}$$, if N is a power of 2, the computation of $$X[k]$$ has complexity of $$O(N\times log_2(N))$$.

To retrieve the signals $$x[n]$$ the inverse transform is applied according to the equation $$\ref{eq:fft5}$$.

$$\begin{equation} x[n] = \frac{1}{N}\sum_{k=0}^{N-1}X[k]W_N^{-kn} \end{equation}\tag{5}\label{eq:fft5}$$ Recursive algorithm

A very simple algorithm to compute the FFT can be defined taking advantage of the recursive nature of the FFT, as can be seen in Figure 1. Figure 1: The FFT recursive algorithm

• if the size $$N$$ of the FFT is even then call two FFT of order $$N/2$$, one to compute the Fourier Transform of the signals with even index ($$x[2n]$$) and other to compute the signals with odd index ($$x[2n+1]$$). The Fourier Transform will be scaled with the twiddle factor $$W_N^k$$. Figure 2: N-point FFT split in two (N/2)-point FFT
• if $$N$$ is odd then the FFT will be slowly calculated using the Fourier matrix of equation $$\ref{eq:fft2}$$.

If the length N of the FFT is of the form $$N = m\times 2^p$$ the the complexity of the algorithm of Figure 1 will be $$O(m^2)\times O(p\times log_2(p))$$.

A geometric interpretation of the complexity of the FFT can be seen as follows:

The inverse transform may be obtained using the forward transform according to Figure 4. Figure 4: The inverse FFT

The block to divide by $$N$$ can be computed in hardware according to the algorithm of Figure 5. Figure 5: The divbyN block Software implementation

The forward and inverse FFT algorithm was implemented using Octave (compatible with MATLAB) and the codes can be seen below:

function [Yr, Yi] = butterfly(Xer, Xei, Xor, Xoi, N)

for k=0:N/2-1
wr(k+1) = cos(-2*pi*k/N);
wi(k+1) = sin(-2*pi*k/N);
end

for k=1:N/2
G1(k) = wr(k)*Xor(k)-wi(k)*Xoi(k);
G2(k) = wr(k)*Xoi(k)+wi(k)*Xor(k);
end

for k=1:N/2
Yr(k) = Xer(k)+G1(k);
Yr(k+N/2) = Xer(k)-G1(k);
Yi(k) = Xei(k)+G2(k);
Yi(k+N/2) = Xei(k)-G2(k);
end
end

The butterfly computation (butterfly.m)
function [Yr, Yi] = computeMatrix(xr, xi)
N = length(xr);

for k=0:N-1
for j=0:N-1
Fr(k+1,j+1) = cos(-2*pi*k*j/N);
Fi(k+1,j+1) = sin(-2*pi*k*j/N);
end
end
Yr = zeros(N,1);
Yi = zeros(N,1);
for k=0:N-1
for j=0:N-1
Yr(k+1) = Yr(k+1)+Fr(k+1,j+1)*xr(j+1)-Fi(k+1,j+1)*xi(j+1);
Yi(k+1) = Yi(k+1)+Fr(k+1,j+1)*xi(j+1)+Fi(k+1,j+1)*xr(j+1);
end
end
end

The Fourier matrix computation (computeMatrix.m)
function [Yr, Yi] = FFT(xr, xi)
N = length(xr);
if rem(N,2) == 0
for k=0:N/2-1
xe_r(k+(1))=xr(2*k+(1));
xo_r(k+(1))=xr(2*k+1+(1));

xe_i(k+(1))=xi(2*k+(1));
xo_i(k+(1))=xi(2*k+1+(1));
end

[Xer, Xei] = FFT(xe_r, xe_i);
[Xor, Xoi] = FFT(xo_r, xo_i);
[Yr, Yi] = butterfly(Xer, Xei, Xor, Xoi, N);

else
[Yr, Yi] = computeMatrix(xr, xi);
end
end

The forward FFT (FFT.m)
function [Yr, Yi] = iFFT(xr, xi)
N = length(xr);
[Xi, Xr] = FFT(xi, xr);
Yr = Xr/N;
Yi = Xi/N;
end

The inverse FFT (iFFT.m) Hardware implementation

As many programming languages, SystemVerilog (which is a Hardware Description Language) supports recursion (since Verilog-2001). The forward and inverse FFT parameterized by the length $$N$$ of the signals (in fixed-point format) were implemented using SystemVerilog and the codes can be seen below. Note that the system functions $cos() and$sin() in the butterly and computeMatrix modules are computing known values during elaboration time and therefore the codes are describing synthesizable hardware.

module butterfly #(parameter N=1,
parameter W=1)
(input logic signed [W+N/2-1:0] Xe[N/2][1:0],
input logic signed [W+N/2-1:0] Xo[N/2][1:0],
output logic signed [W+N-1:0] X[N][1:0]);

logic signed [W+N/2:0] G[N][1:0];

localparam theta = -2*$acos(-1)/N; always_comb for(int k=0; k<N/2; k++)begin // Nodes on the butterfly G[k] = ($cos(theta*k)*Xo[k])-($sin(theta*k)*Xo[k]); G[k] = ($cos(theta*k)*Xo[k])+($sin(theta*k)*Xo[k]); // Butterfly computation X[k] = Xe[k]+G[k]; X[k] = Xe[k]+G[k]; X[k+N/2] = Xe[k]-G[k]; X[k+N/2] = Xe[k]-G[k]; end endmodule: butterfly  The butterfly module (butterfly.sv) module computeMatrix #(parameter N=1, parameter W=1) (input logic signed [W:0] x[N][1:0], output logic signed [W+N-1:0] X[N][1:0]); localparam theta = -2*$acos(-1)/N;
// Computation of the Fourier matrix
always_comb
for(int k=0; k<N; k++)begin
X[k][1:0] = {0,0};
for(int j=0; j<N; j++)begin
X[k] = X[k]+($cos(theta*k*j)*x[j])-($sin(theta*k*j)*x[j]);
X[k] = X[k]+($cos(theta*k*j)*x[j])+($sin(theta*k*j)*x[j]);
end
end

endmodule: computeMatrix

The computeMatrix module (computeMatrix.sv)
module invN #(parameter N=1, parameter W=1, parameter BIT_FRAC=1)
(output logic signed [W:0] y);

logic [W:0] A, T;
always_comb begin
A = 0;
T=1<<BIT_FRAC;
for(int k=BIT_FRAC; k>=0; k--)
if(N<<k <= T)begin
A = A+(1<<k);
T = T-(N<<k);
end
y = A;
end

endmodule: invN

The invN module (invN.sv)
include "invN.sv"

module divbyN  #(parameter N=1, parameter W=1)
(input logic signed [W:0] x[N][1:0],
output logic signed [W:0] y[N][1:0]);

logic signed [W:0] inv_N;
invN #(.N(N), .W(W), .BIT_FRAC(BIT_FRAC)) m_invN(.y(inv_N));

int p = $clog2(N); localparam c = isPowerOfTwo(N); if(c) begin always_comb for(int k=0; k<N; k++)begin y[k] = x[k]>>>p; y[k] = x[k]>>>p;; end end else begin always_comb for(int k=0; k<N; k++)begin y[k] = round(x[k]*inv_N); y[k] = round(x[k]*inv_N); end end function logic signed [W:0] round(logic signed [W:0] x); return (x+(1<<<(BIT_FRAC-1)))>>>BIT_FRAC; endfunction: round function logic isPowerOfTwo (input integer N); return (N != 0) && ((N & (N - 1)) == 0); endfunction: isPowerOfTwo endmodule: divbyN  The divbyN module (divbyN.sv) module swap #(parameter N=1, parameter W=1) (input logic signed [W:0] x[N][1:0], output logic signed [W:0] y[N][1:0]); always_comb for(int k=0; k<N; k++)begin y[k] = x[k]; y[k] = x[k]; end endmodule: swap  The swap module (swap.sv) parameter BIT_INT = 16; parameter BIT_FRAC = 16; parameter W = BIT_INT+BIT_FRAC; include "butterfly.sv" include "computeMatrix.sv" include "swap.sv" include "divbyN.sv" module FFT #(parameter N = 1, parameter W=1) (input logic signed [W:0] x[N][1:0], output logic signed [W+N-1:0] X[N][1:0]); logic signed [W:0] xe[N/2][1:0], xo[N/2][1:0]; logic signed [W+N/2-1:0] Xe[N/2][1:0], Xo[N/2][1:0]; always_comb for(int k=0; k<N/2; k++)begin xe[k][1:0] = x[2*k][1:0]; xo[k][1:0] = x[2*k+1][1:0]; end if(!N) begin butterfly #(.N(N), .W(W)) m_butterfly(.Xe(Xe), .Xo(Xo), .X(X)); FFT #(.N(N/2), .W(W)) FFT_e(.x(xe), .X(Xe)); FFT #(.N(N/2), .W(W)) FFT_o(.x(xo), .X(Xo)); end else computeMatrix #(.N(N), .W(W)) Matrix(.x(x), .X(X)); endmodule: FFT module iFFT #(parameter N = 1) (input logic signed [W:0] x[N][1:0], output logic signed [W+N-1:0] X[N][1:0]); logic signed [W:0] x_in[N][1:0]; logic signed [W+N-1:0] Xa[N][1:0]; logic signed [W+N-1:0] Xb[N][1:0]; swap #(.N(N), .W(W)) swap_in(.x(x), .y(x_in)); FFT #(.N(N), .W(W)) mFFT(.x(x_in), .X(Xa)); swap #(.N(N), .W(W+N-1)) swap_out(.x(Xa), .y(Xb)); divbyN #(.N(N), .W(W+N-1)) divN(.x(Xb), .y(X)); endmodule: iFFT module RecursiveFFT #(parameter N = 1, parameter W = 1, parameter sel = 1) (input logic signed [W:0] x[N][1:0], output logic signed [W+N-1:0] X[N][1:0]); if(!sel) FFT #(.N(N), .W(W)) mFFT(.x(x), .X(X)); else iFFT #(.N(N), .W(W)) miFFT(.x(x), .X(X)); endmodule: RecursiveFFT  The FFT and iFFT modules (FFT.sv) timescale 1ns/1ps include "FFT.sv" define SAMPLES 4 parameter N = SAMPLES; module top(); logic signed [W:0] x[N][1:0]; logic signed [W+N-1:0] X[N][1:0]; // Set sel=0 to compute the forward FFT // Set sel=1 to compute the inverse FFT RecursiveFFT #(.N(N), .W(W), .sel(0)) RFFT(.x(x), .X(X)); initial begin for(int k=0; k<N; k++)begin x[k]=k<<<BIT_FRAC; x[k]=(N-1-k)<<<BIT_FRAC; end #1 testFFTinput(.x(x), .X(X));$finish;
end

localparam K=1<<BIT_FRAC;

task testFFTinput(input logic signed [W:0] x[N][1:0],
input logic signed [W+N-1:0] X[N][1:0]);

for(int k=0; k<N; k++)
\$display("x[%0d]=%0f+i*(%0f), X[%0d]=%f+i*(%f)",
k, real'(x[k])/K, real'(x[k])/K,
k, real'(X[k])/K, real'(X[k])/K);
`