Integral Image in Hardware

Convolutional Summed Area Tables for Fast Features Computation

Posted by Nelson Campos on April 02, 2017

The Integral Image is one of the most important tools to accelerate features computation in many object detection applications. They are called Summed Area Tables and although they were proposed in 1984 by Frank Crow, it was intensively used within Viola–Jones object detection framework in 2001 [1].

The integral image of a region at the point (x,y) in the summed area table is the sum of all pixels above and to the left of (x,y) including i(x,y), according to equation (1), where i(x,y) is the value of the pixel at the position (x,y).

Equation (1)

The computation of the integral image II(x,y) can be made recursively as can be seen in equation (2):

Equation (2)

Let's try an example to make things clear. Suppose that our image is defined in the matrix of the Figure (1).

Figure (1): a 4x4 matrix

To calculate the integral image at the point (2,1), using equation (2), we have II(2,1) = i(2,1)+II(1,1)+II(2,0)-II(1,1) = 7+14+6-3 = 24, as can be seen in Figure (2).

Figure (2): integral image computed recursively

The advantage of the summed area table is that in the Integral domain the computation effort to calculate the area of a rectangle is just four arrays references, while in the original domain this complexity is mxn sums, where mxn is the number of pixels inside the rectangle.

To illustrate the compuation of the area of the rectangle defined in the region 2<=x<=4, 1<=y<=3, let be S the sum of the rectangle. In the original image, S = 2+3+4+6+7+8+2+3+4=39. However, once the integral image is done, to calculate the same area in the Integral domain the formula S=D+A-B-C = 46+0-0-7=39 can be used as can be seen in Figure (3). This reduces the computational complexity from O(mxn) to O(1).

Figure (3): computaion of the rectangle area

Hardware implementation

The Integral Image II can be defined as a convolution of the matrix S with the image I as can be seen in equation (3):

Equation (3)

Similar to the Sobel Filter, the Integral Image has a structure of sliding window as can be seen in Figure (4).

Figure (4): Circular buffer structure of the integral image

Since the kernel that is being convolved to the image is a matrix 2x2, the circular buffer needs to store just one line of the image.

Figure (5): numerical example of the convolution

The SystemVerilog code of the Integral Image and its testbench can be seen below:

parameter W = 8;
parameter BUFFER_SIZE = 2;
parameter ROW_SIZE = 4;
parameter W_SUM = $clog2(ROW_SIZE*ROW_SIZE*(1<<W));

module integral_image(input logic clock, reset,
                      input logic [W-1:0]new_sample,
                      output logic [W_SUM-1:0] S);

  logic [W_SUM-1:0] s[BUFFER_SIZE][BUFFER_SIZE];
  logic [W-1:0] prevS;
  logic [W_SUM-1:0] buffer[ROW_SIZE];
  logic [$clog2(ROW_SIZE)-1:0] ptr, next_ptr;
  logic [$clog2(ROW_SIZE):0] row_counter;
  always_comb begin
    if(reset) S <= 0;
    else S <= prevS+s[1][0]+s[0][1]-s[0][0];
  always_comb begin
    if(ptr+1 < ROW_SIZE) next_ptr <= ptr+1;
    else next_ptr <= 0;
  always_ff @(posedge clock) begin
      ptr <= 0;
      row_counter <= 0;
      s[0][0] <= 0;
      s[0][1] <= 0;
      s[1][0] <= 0;
      s[1][1] <= 0;
      for(int i = 0; i < ROW_SIZE; i++)
        buffer[i] <= 0;
    else begin
      prevS <= new_sample;
      s[1][0] <= S;
      buffer[ptr] <=  S;
      s[0][1] <= buffer[next_ptr];
      s[0][0] <= s[0][1];
	  ptr <= next_ptr;
      if (ptr == 0) begin
        if(row_counter < ROW_SIZE)
          row_counter <= row_counter+1;
        else row_counter <= 0;
        s[0][0] <= '0;
        s[1][0] <= '0;

  initial begin
    $monitor("dut => new_sample = %d prevS = %d s[1][0] = %d s[0][1] = %d s[0][0] = %d buffer[ptr] = %d S = %d at time = %d (ptr=%d, next_ptr = %d, row_counter = %d)", new_sample, prevS, s[1][0], s[0][1], s[0][0], buffer[ptr], S, $time, ptr, next_ptr, row_counter);

endmodule: integral_image
SystemVerilog code of the Integral Image
module top;
  logic clock, reset;
  logic [W-1:0] X, Y;
  logic [W_SUM-1:0] mem[ROW_SIZE][ROW_SIZE], sum_array[ROW_SIZE][ROW_SIZE];
  int i, j, pos_x, pos_y, WINDOW_SIZE, A, B, C, S;
  always_comb begin
    mem[0][0:3] = {8'd1,8'd2,8'd3,8'd4};
    mem[1][0:3] = {8'd5,8'd6,8'd7,8'd8};
    mem[2][0:3] = {8'd1,8'd2,8'd3,8'd4};
    mem[3][0:3] = {8'd5,8'd6,8'd7,8'd8};
  initial begin
    pos_x = 1;
    pos_y = 0;
    WINDOW_SIZE = 3;
    clock = 0;
    reset = 1;
    reset = 0;
    $display("starting simulation ...");
    //$monitor("X = %d, Y = %d at %d", X, Y, $time);
      for(j=0; j<ROW_SIZE;j++)begin
        @(posedge clock)begin
         X = mem[i][j];
         sum_array[i][j] = Y;
          $display(X, sum_array[i][j]);
      @(posedge clock)
      sum_array[i][j] = Y;
    do_sum(pos_x, pos_y, WINDOW_SIZE);
    $display("%d %d %d %d", A, B, C, S);
  always #5 clock = !clock;
  integral_image m_integral_image(.clock(clock), .reset(reset), .new_sample(X), .S(Y));
  function shift_array();
    for(int k1 = 0; k1<=ROW_SIZE-1;k1++)begin
      sum_array[k1][0:ROW_SIZE-2] = sum_array[k1][1:ROW_SIZE-1];
      sum_array[k1][ROW_SIZE-1] = sum_array[k1+1][0];
    //sum_array[3][3] = Y;
  endfunction: shift_array
  function do_sum(input int pos_x, pos_y, WINDOW_SIZE);
    A = (pos_x >=1 && pos_y >=1)? sum_array[pos_y-1][pos_x-1] : 0;
    B = (pos_y >=1) ? sum_array[pos_x+WINDOW_SIZE-1][pos_y-1] : 0;
    C = (pos_x >= 1) ? sum_array[pos_y+WINDOW_SIZE-1][pos_x-1] : 0;
    S = sum_array[pos_y+WINDOW_SIZE-1][pos_x+WINDOW_SIZE-1]+A-B-C; 
  endfunction: do_sum
endmodule: top
SystemVerilog code of the Integral Image testbench

Play it by yourself on EDA Playground.

Also available in GitHub.