Abdelrhman Mohamed Ibrahim Sayed Abotaleb

Created April 1, 2022 © MIT

Real Time Traffic Signs Classifier

Let's make ADAS more precise and fast.

Intermediate · Over 8 days

Things used in this project

Hardware components

AMD Kria KV260 Vision AI Starter Kit

Software apps and online services

AMD Vivado Design Suite

Story

Several traffic sign classifiers have been proposed, but most published work targets high-end GPUs that consume a lot of power. Beyond the demand for low power in embedded systems, customers may also favor low-cost options. This project therefore introduces a classifier better suited to ADAS (Advanced Driver Assistance Systems) without sacrificing accuracy.

Strategy:

1. Develop high-level models for several candidate networks and choose the best one to implement on the FPGA.

2. Because the solution must run in real time, implementing the network on the CPU part (processing system) of the Xilinx SoC would not be fast enough.

So the FPGA flow must be followed.

3. Xilinx DPUs could be a strong option, but the DPU used with Kria implements integer arithmetic rather than floating point, which raises real concerns about model accuracy.

So, in this work, a complete custom FPGA implementation is developed.

The chosen network is LeNet, because its model size is very small compared with other convolutional neural networks, so it can fit on the FPGA without utilization problems. Keep in mind that the design also needs floating-point operations, and multiplying, adding, or evaluating the hyperbolic tangent (tanh, needed in the activation layers) on numbers represented in IEEE-754 format is a major headache in hardware.
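As a worked reference for the IEEE-754 single-precision format mentioned above: a 32-bit word with sign bit s, 8-bit biased exponent e and 23-bit fraction f encodes, for normalized numbers (0 < e < 255), the value v below. The example decodes the constant that the tanh module in the Code section uses for 2/15 (bit pattern 00111110000010001000100010001001, i.e. 0x3E088889).

```latex
\begin{aligned}
v &= (-1)^{s} \left(1 + \frac{f}{2^{23}}\right) 2^{\,e - 127} \\
\texttt{0x3E088889}:\quad s &= 0,\quad e = 01111100_2 = 124,\quad f = \texttt{0x088889} = 559241 \\
v &= \left(1 + \frac{559241}{8388608}\right) 2^{124-127} \approx 1.0666667 \times 0.125 \approx 0.13333334 \approx \tfrac{2}{15}
\end{aligned}
```

The same formula decodes the module's other two constants, 0xBEAAAAAB and 0xBD5D0DD1, as approximately −1/3 and −17/315; their sign bit is 1, so those coefficients are stored already negated.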

The network is a cascade of six layers followed by the output layer: convolutional layers interleaved with average-pooling layers, followed by fully connected layers.
The input image must be 32x32 grayscale and normalized so that its pixel values are floating-point numbers between 0 and 1. It first passes through a convolutional layer that produces 6 feature maps.

Next, a dimensionality-reduction layer (average pooling) is applied, followed by a tanh activation layer.
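To make the layer sizes concrete, the standard output-size formula for a convolution or pooling window (input size N_in, window K, padding P, stride S) is worked through below for the first stage. The kernel sizes are not stated above; this sketch assumes the classic LeNet-5 choices (5x5 convolution kernels with stride 1 and no padding, 2x2 average pooling with stride 2), so the numbers should be adjusted if the project's variant differs.

```latex
\begin{aligned}
N_{\text{out}} &= \left\lfloor \frac{N_{\text{in}} - K + 2P}{S} \right\rfloor + 1 \\
\text{Conv (6 maps):}\quad N_{\text{out}} &= \frac{32 - 5 + 0}{1} + 1 = 28 \;\Rightarrow\; 28 \times 28 \times 6 \\
\text{Conv parameters:}\quad & 6 \times (5 \cdot 5 \cdot 1 + 1) = 156 \\
\text{Avg pool:}\quad N_{\text{out}} &= \frac{28 - 2}{2} + 1 = 14 \;\Rightarrow\; 14 \times 14 \times 6 \\
\text{tanh evaluations:}\quad & 14 \cdot 14 \cdot 6 = 1176 \text{ per image}
\end{aligned}
```

Under these assumptions, the last line is the number of times the first activation layer has to invoke the tanh unit for every input image.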

What this project implements is each layer on the FPGA individually. The whole network is not yet implemented at once; further optimization is needed to fit the full network so that data no longer has to go back and forth between the host machine and the FPGA.

Attached is the tanh layer, synthesized and implemented successfully on the Kria. As a proof of concept, the rest of the network runs on a PC, which communicates with the board over Ethernet.

This mimics the way Xilinx DPUs are used: a program runs on the CPU part (or on the host machine, in the case of Alveo boards) and exchanges data back and forth with the DPU in the programmable logic.
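For reference, reading the attached Verilog (listed under Code) shows that the tanh layer does not evaluate tanh exactly. It uses the first four terms of the Maclaurin series of tanh, which converges only for |x| < π/2, and replaces larger inputs with constants chosen from the biased exponent e of the input:

```latex
\tanh(x) \approx
\begin{cases}
x - \dfrac{x^{3}}{3} + \dfrac{2x^{5}}{15} - \dfrac{17x^{7}}{315}, & e < 127 \quad (|x| < 1) \\[6pt]
\operatorname{sgn}(x) \cdot 0.9, & e = 127 \quad (1 \le |x| < 2) \\[6pt]
\operatorname{sgn}(x) \cdot 1, & e \ge 128 \quad (|x| \ge 2)
\end{cases}
```

As a quick check (values rounded): at x = 0.5 the series gives 0.5 − 0.0416667 + 0.0041667 − 0.0004216 ≈ 0.462078, against tanh(0.5) ≈ 0.462117; just below x = 1 it approaches 0.746032, against tanh(1) ≈ 0.761594. Over 1 ≤ |x| < 2 the true value rises from about 0.7616 to 0.9640, so the constant 0.9 is off by up to about 0.14 near |x| = 1.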

Code

LeNet Tanh Layer

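//The objective is to accelerate the tanh layer in LeNet
//Approximation (Maclaurin series, used for |Input| < 1):
//  out_tanhx = Input - Input^3/3 + 2/15*Input^5 - 17/315*Input^7
//For |Input| >= 1 the output saturates to +/-0.9 (1 <= |Input| < 2) or +/-1 (|Input| >= 2)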
module tanh
#(parameter width_mantissa=23, parameter FP_Standard_Width = 32, parameter exponent_width = 8, parameter fp_bias=127)
(
    input clk,                               //Main clock to the tanh module 
    input reset,
    input [(FP_Standard_Width-1):0]Input,   //This represents the input of the module
    output reg[(FP_Standard_Width-1):0]out_tanhx    //The computed tanh(Input)
 );

 reg[(FP_Standard_Width-1):0] temp;         //Running partial sum of the series

 /*   Multiplier Circuitry Internal Reg */
 reg [(FP_Standard_Width-1):0]mul_in_1;     // FP Multiplier First Input
 reg [(FP_Standard_Width-1):0]mul_in_2;     // FP Multiplier Second Input
 wire [(FP_Standard_Width-1):0]mulOutput;   //Output of the multiplier
 //Holds Input^6, then Input^7, finally the term -17/315*Input^7
 reg [(FP_Standard_Width-1):0]Input_Pwr_7; 
 //Holds Input^4, then Input^5, finally the term 2/15*Input^5
 reg [(FP_Standard_Width-1):0]Input_Pwr_5; 
 //Holds Input^2, then Input^3, finally the term -1/3*Input^3
 reg [(FP_Standard_Width-1):0]Input_Pwr_3;

 /*   Adder Circuitry Internal Reg */
 reg [(FP_Standard_Width-1):0]add_in_1;  // FP Adder First Input
 reg [(FP_Standard_Width-1):0]add_in_2;  // FP Adder Second Input
 wire [(FP_Standard_Width-1):0]addOutput;// Adder Output
 //Step counter: selects which operation is issued (rising edge) or captured (falling edge)
 reg[8:0] i=0;
 
 //Sensitive to both clock edges: on the rising edge the operands of the
 //current step are loaded into the shared multiplier/adder, and on the
 //falling edge the combinational result is captured.
 always @(clk)
 begin
   if(clk==1)
   begin
     if(reset == 1)  //Check if the reset pin is high to reset the system
     begin
       i=0;
       out_tanhx=32'b00000000000000000000000000000000;
     end
     else if(i==0)
     begin
       mul_in_1=Input;          //Input   -> first mul input
       mul_in_2=Input;          //Input   -> second mul input
       i=i+1;
     end
     else if(i==1)
     begin
       mul_in_1=Input_Pwr_3;    //Input^2 -> first mul input
       mul_in_2=Input;          //Input   -> second mul input
       i=i+1;
     end
     else if(i==2)
     begin
       mul_in_1=Input_Pwr_3;    //Input^3 -> first mul input
       mul_in_2=Input;          //Input   -> second mul input
       i=i+1;
     end
     else if(i==3)
     begin
       mul_in_1=Input_Pwr_5;    //Input^4 -> first mul input
       mul_in_2=Input;          //Input   -> second mul input
       i=i+1;
     end
     else if(i==4)
     begin
       mul_in_1=Input_Pwr_5;    //Input^5 -> first mul input
       mul_in_2=Input;          //Input   -> second mul input
       i=i+1;
     end
     else if(i==5)
     begin
       mul_in_1=Input_Pwr_7;    //Input^6 -> first mul input
       mul_in_2=Input;          //Input   -> second mul input
       i=i+1;
     end
     else if(i==6)
     begin
       mul_in_1=Input_Pwr_7;                           //Input^7 -> first mul input
       mul_in_2=32'b10111101010111010000110111010001;  //-17/315 (0xBD5D0DD1) -> second mul input
       i=i+1;
     end
     else if(i==7)
     begin
       mul_in_1=Input_Pwr_3;                           //Input^3 -> first mul input
       mul_in_2=32'b10111110101010101010101010101011;  //-1/3 (0xBEAAAAAB) -> second mul input
       i=i+1;
     end
     else if(i==8)
     begin
       mul_in_1=Input_Pwr_5;                           //Input^5 -> first mul input
       mul_in_2=32'b00111110000010001000100010001001;  //2/15 (0x3E088889) -> second mul input
       i=i+1;
     end
     else if(i==9)
     begin
       add_in_1=Input;          //Input -> first add input
       add_in_2=Input_Pwr_3;    //-1/3*Input^3 -> second add input
       i=i+1;
     end
     else if(i==10)
     begin
       add_in_1=temp;           //Input - Input^3/3 -> first add input
       add_in_2=Input_Pwr_5;    //2/15*Input^5 -> second add input
       i=i+1;
     end
     else if(i==11)
     begin
       add_in_1=temp;           //Input - Input^3/3 + 2/15*Input^5 -> first add input
       add_in_2=Input_Pwr_7;    //-17/315*Input^7 -> second add input
       i=i+1;
     end
   end
   if(clk==0)
   begin
     if(i==1)
     begin
       Input_Pwr_3=mulOutput;   //mulOutput = Input^2 -> Input_Pwr_3
     end
     else if (i==2)
     begin
       Input_Pwr_3=mulOutput;   //mulOutput = Input^3 -> Input_Pwr_3
     end
     else if (i==3)
     begin
       Input_Pwr_5=mulOutput;   //mulOutput = Input^4 -> Input_Pwr_5
     end
     else if (i==4)
     begin
       Input_Pwr_5=mulOutput;   //mulOutput = Input^5 -> Input_Pwr_5
     end
     else if (i==5)
     begin
       Input_Pwr_7=mulOutput;   //mulOutput = Input^6 -> Input_Pwr_7
     end
     else if (i==6)
     begin
       Input_Pwr_7=mulOutput;   //mulOutput = Input^7 -> Input_Pwr_7
     end
     else if (i==7)
     begin
       Input_Pwr_7=mulOutput;   //mulOutput = -17/315*Input^7 -> Input_Pwr_7
       temp=Input_Pwr_7;
     end
     else if (i==8)
     begin
       Input_Pwr_3=mulOutput;   //mulOutput = -1/3*Input^3 -> Input_Pwr_3
     end
     else if (i==9)
     begin
       Input_Pwr_5=mulOutput;   //mulOutput = 2/15*Input^5 -> Input_Pwr_5
     end
     else if (i==10)
     begin
       temp=addOutput;          //addOutput = Input - Input^3/3 -> temp
     end
     else if (i==11)
     begin
       temp=addOutput;          //addOutput = Input - Input^3/3 + 2/15*Input^5 -> temp
     end
     else if (i==12)
     begin
       //Final step: i stays at 12, so a new computation requires asserting reset
       //Exponent == 127, i.e. 1 <= |Input| < 2: saturate to +/-0.9 (magnitude 0x3F666666)
       if(Input[FP_Standard_Width-2:(FP_Standard_Width-1)-exponent_width]<8'b10000000&&Input[FP_Standard_Width-2:(FP_Standard_Width-1)-exponent_width]>=8'b01111111)
       begin
         out_tanhx[FP_Standard_Width-2:0]=31'b0111111011001100110011001100110;
         out_tanhx[FP_Standard_Width-1]=Input[FP_Standard_Width-1];   //Keep the sign of Input
       end
       //Exponent >= 128, i.e. |Input| >= 2: saturate to +/-1.0 (magnitude 0x3F800000)
       else if(Input[FP_Standard_Width-2:(FP_Standard_Width-1)-exponent_width]>=8'b10000000)
       begin
         out_tanhx[FP_Standard_Width-2:0]=31'b0111111100000000000000000000000;
         out_tanhx[FP_Standard_Width-1]=Input[FP_Standard_Width-1];   //Keep the sign of Input
       end
       //|Input| < 1: use the four-term series
       else
       begin
         out_tanhx=addOutput;   //addOutput = Input - Input^3/3 + 2/15*Input^5 - 17/315*Input^7
       end
     end
   end
 end
 //The multiplier module
 multiplier_floatingPoint 
 #(.width_mantissa(width_mantissa),.FP_Standard_Width(FP_Standard_Width),.exponent_width(exponent_width), .fp_bias(fp_bias))
 multiplier
 (
 .FP1(mul_in_1),  
 .FP2(mul_in_2),     
 .FP_PRODUCT(mulOutput)   
  );
  
  //The adder module
  adder_floatingPoint  
  #(.width_mantissa(width_mantissa),.FP_Standard_Width(FP_Standard_Width),.exponent_width(exponent_width))
  adder
  (
  .A_FP(add_in_1),     
  .B_FP(add_in_2),   
  .SUM_FP(addOutput)    
  );
endmodule
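The schedule above issues 9 multiplications and 3 additions on the shared units. A possible further optimization, not part of the original design, is to evaluate the same four-term polynomial in Horner form with y = x², which needs 5 multiplications and 3 additions. This is only a sketch of the arithmetic: the floating-point results may differ from the current module in the last bits because the rounding order changes.

```latex
\begin{aligned}
y &= x \cdot x \\
\tanh(x) &\approx x\left(1 + y\left(-\tfrac{1}{3} + y\left(\tfrac{2}{15} - \tfrac{17}{315}\,y\right)\right)\right) \\
&= x - \frac{x^{3}}{3} + \frac{2x^{5}}{15} - \frac{17x^{7}}{315}
\end{aligned}
```

Evaluated from the innermost bracket outward, the sequence is y = x·x, then (−17/315)·y, + 2/15, ·y, + (−1/3), ·y, + 1, and finally ·x. It reuses the three coefficients already stored in the module plus the constant 1.0.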

Credits

Abdelrhman Mohamed Ibrahim Sayed Abotaleb


RA at McMaster University. More than 12 years of experience with C, C++, Assembly, VHDL, Verilog, and MATLAB.

Schematics

schematic_uZk9GvQ3MT.png
