Filters on FPGA
FIR filters
Filter design stage
Floating point vs Integer in digital filters
A chirp generator in Verilog
A low pass FIR filter
Filter forms: Direct, transpose, cascade, etc
Coefficient symmetry advantage
Higher order filters
Conclusion

Published November 16, 2025 © GPL3+

FIR filters on FPGA

Design, test and deploy various FIR filters on FPGA with the MYIR development board

Intermediate · Protip · 1 hour · 1,345

Things used in this project

Hardware components

MYD-CZU3EG

Software apps and online services

AMD Vivado Design Suite

Story

Filters on FPGA

Digital filters can be computationally demanding and often have to process signals in real time, which makes FPGAs a natural choice for implementing them.

In this project I use the MYIR MYD-CZU3EG development board to develop and test different filter architectures on an FPGA. The filters and the chirp generator that feeds them are written in Verilog. The filter output is captured with the Vivado ILA (Integrated Logic Analyzer) over JTAG, using the Vivado Hardware Manager.

MYD-CZU3EG with JTAG used in this project

FIR filters

FIR (Finite Impulse Response) filters are a very common tool in DSP (Digital Signal Processing). The other main type of digital filter is the IIR (Infinite Impulse Response) filter. FIR filters are often preferred because they are intrinsically stable, whereas an IIR filter may not be.

FIR filters can be lowpass, highpass, bandpass or band-reject (also called band-stop or, when narrow, notch). A FIR filter is defined by a set of N coefficients, and each output sample is a weighted sum of the last N input samples.

Filter design stage

The basic parameters of a filter are the cutoff frequency and the slope (how sharp the transition is). There are plenty of online calculators that will give you the coefficients for a FIR filter. If you have access to Matlab, that will be all you need. If not, you can use GNU Octave: it is free and open source, runs on Windows and Linux, and is largely compatible with Matlab.

When you open Octave, you need to load the signal package, which provides the filter design functions:

pkg load signal

The function used to calculate the filter coefficients is fir1:

B = fir1 (N, W, TYPE, WINDOW, NOSCALE)

Where:

N is the order of the filter (fir1 returns N+1 coefficients).
W is the cutoff frequency (or a vector of two frequencies for bandpass and band-reject filters). Note that W is not an absolute frequency: it is normalized to the Nyquist frequency (Fs/2, half the sampling frequency), so it ranges from 0 to 1.
TYPE is the filter type: "low", "high", "stop" or "pass". The default is lowpass.
WINDOW is an optional shaping window, given as a vector of length N+1 (for example hamming(N+1)). The default is a Hamming window.
If NOSCALE is "noscale", the coefficients will not be normalized. The default is to normalize coefficients to have a gain 1 in the passband.

To generate the coefficients for a lowpass filter of order 20 with its cutoff at 0.2 times the Nyquist frequency (that is, 0.1 Fs), simply type:

b = fir1(20, 0.2);
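
To give an idea of what a call like this computes, here is a minimal Python sketch of the window method: an ideal sinc lowpass multiplied by a Hamming window and scaled for unity gain at DC. The function name windowed_sinc_lowpass is hypothetical, and Octave's fir1 may arrive at its result by a somewhat different route, so expect values close to, but not necessarily identical with, what Octave returns.

```python
import math

def windowed_sinc_lowpass(order, w):
    """Low-pass FIR design by the window method (illustrative, not fir1 itself).

    order: filter order (order + 1 coefficients are returned)
    w:     cutoff as a fraction of the Nyquist frequency, 0 < w < 1
    """
    mid = order / 2.0
    taps = []
    for n in range(order + 1):
        k = n - mid
        # Ideal low-pass impulse response: a sampled sinc
        ideal = w if k == 0 else math.sin(math.pi * w * k) / (math.pi * k)
        # Hamming window of length order + 1
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / order)
        taps.append(ideal * window)
    # Scale the coefficients so that the gain at DC is 1
    total = sum(taps)
    return [t / total for t in taps]

b = windowed_sinc_lowpass(20, 0.2)
print(len(b), b[10])
```

As with fir1, w is relative to the Nyquist frequency, so 0.2 here corresponds to 0.1 Fs.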

The response can be plotted with:

freqz(b)

Frequency response in the z domain of a FIR lowpass filter

If you want to see the response in the time domain, a chirp (a signal whose frequency increases over time) can be used. I generate one with these lines:

k = 0:999;
df = pi/1000;
s = cos(k .*k *df/2);

And plot it with:

plot(s)

Chirp signal used for filter evaluation in the time domain

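To apply the filter designed earlier (coefficients in vector b) to this chirp signal (in variable s), you can use filter2. It is a 2-D function, but with two row vectors it acts as a 1-D filter and returns an output of the same length as s: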

y = filter2(b, s);

Or just plot the result straight away:

plot(filter2(b,s))

Filter output in the time domain

Floating point vs Integer in digital filters

The FIR filter designed above in Octave uses floating-point coefficients and arithmetic. Digital filters on an FPGA can also use floating point, but quite often an integer representation gives a very similar result while using notably fewer resources.

It is common and handy to use a fixed-point representation, where the lower (rightmost) bits represent binary fractions of unity (1/2, 1/4, 1/8, etc.), in the same way that decimal fractional digits represent 1/10, 1/100, 1/1000, etc.

Fixed-point formats are sometimes written as qM.N, where M and N are the number of bits in the integer and fractional parts. For example, q10.6 has 10 integer bits and 6 fractional bits.

Fixed-point arithmetic can be a bit tricky because there is no warning on overflow: the result silently wraps around and funny things happen. However, it is very useful.

Arithmetic with fixed-point values works as with ordinary integers, but one has to keep in mind:

1. Addition and subtraction need both operands to have the same number of fractional bits. For example, to add a q10.6 to a q12.4, one of them must be converted (shifted) to the other's format.

2. A product adds the operands' fractional bits and a quotient subtracts them. For example, a q8.8 multiplied by a q10.6 produces a q18.14. The same numbers divided produce a result with 2 fractional bits (8-6).

3. Addition needs one more bit for the result. Adding two q10.6 values, for example, needs a q11.6 for the result. The same applies to subtraction with signed types (because subtracting from a negative number makes it bigger in absolute value).

4. A product of signed types can use one bit less for the result (because the two sign bits merge into one). For example, the product of two signed q10.6 values can be stored in a signed q19.12. The only exception is the product of the two most negative values, which overflows that format.

5. Signed and unsigned types should not be mixed without an explicit conversion. For example, the integer part of an unsigned q8.8 represents 0 to 255, but if signed it represents -128 to +127, so the same bit pattern means a different value and the operation gives a wrong result.
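
The rules above can be tried out with plain integers. This is a small Python sketch, not FPGA code; the helper names to_fixed, to_real and wrap_signed are made up for the illustration.

```python
def to_fixed(value, frac_bits):
    """Convert a real number to an integer holding frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))

def to_real(raw, frac_bits):
    """Interpret an integer as a fixed-point value with frac_bits fractional bits."""
    return raw / (1 << frac_bits)

def wrap_signed(raw, bits):
    """Keep only the lowest `bits` bits and read them as two's complement."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw

# Rule 1: align the fractional bits before adding (q10.6 + q12.4)
a = to_fixed(3.25, 6)          # 208
b = to_fixed(1.5, 4)           # 24
total = a + (b << 2)           # shift b up to 6 fractional bits
print(to_real(total, 6))       # 4.75

# Rule 2: fractional bits add up in a product (8 + 6 = 14)
c = to_fixed(2.5, 8)           # 640
d = to_fixed(1.25, 6)          # 80
print(to_real(c * d, 14))      # 3.125

# No overflow warning: a 16-bit signed value silently wraps around
print(wrap_signed(32767 + 1, 16))   # -32768
```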

A chirp generator in Verilog

The file chirp.v implements the chirp generator that will be the signal source for the filters. It can be dropped into a block diagram and connected to an ILA to capture the output. I use the PS (Processing System) only as a clock generator; there is no software running on it.

Chirp generator test bench.

After building, programming the FPGA, connecting to the ILA and triggering, the capture looks like this:

Chirp signal (signal above represents frequency)
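
For readers who want to follow the generator without a simulator, below is a simplified Python model of chirp.v as I read the listing in the Code section: a 9-bit phase accumulator (512 counts per sine period), a 32-entry quarter-wave sine table with the values copied from chirp.v, and a phase increment dph that grows by one every step + 1 samples. It ignores reset and the one-clock output register, and the Python names are illustrative only.

```python
SIN_TABLE = [0, 13, 25, 37, 50, 62, 74, 86, 98, 109, 120, 131, 142, 152, 162, 171,
             180, 189, 197, 205, 212, 219, 225, 231, 236, 240, 244, 247, 250, 252, 254, 255]

def sine_sample(ph):
    """Quarter-wave table lookup for a 9-bit phase value."""
    quadrant = (ph >> 7) & 0x3      # ph[8:7]
    index = (ph >> 2) & 0x1F        # ph[6:2]
    if quadrant == 0:
        return SIN_TABLE[index]
    if quadrant == 1:
        return SIN_TABLE[31 - index]
    if quadrant == 2:
        return -SIN_TABLE[index]
    return -SIN_TABLE[31 - index]

def chirp(n_samples, step):
    """Generate n_samples of the chirp; dph increases by 1 every step + 1 samples."""
    ph, dph, count = 0, 1, 0
    out = []
    for _ in range(n_samples):
        out.append(sine_sample(ph))
        # The phase update uses dph before it changes, like a non-blocking assignment
        next_ph = (ph + dph) & 0x1FF
        if count == step:
            count = 0
            dph = (dph + 1) & 0xFF
        else:
            count += 1
        ph = next_ph
    return out

samples = chirp(2000, 3)
print(samples[:8])
```

Because one period is 512 phase counts, the instantaneous frequency in this model is dph/512 of the sample rate, so the sweep ends just below Nyquist when dph reaches 255.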

A low pass FIR filter

Going back to the previous design in Octave, we have these floating point coefficients:

FIR coefficient for the order 20 LP filter

An easy way to convert them to fixed-point is to multiply by a power of 2 and round to integers. Here I chose 20 fractional bits, that is, a scale factor of 2^20.
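
As a sanity check of the conversion, this Python sketch defines a hypothetical quantize helper and then inspects the integer coefficients as they appear in fir.v (only the first 11 are typed in, the rest are mirrored). The sum of the coefficients gives the DC gain, and the sum of their absolute values gives the worst-case gain, which tells how much the output can grow with respect to the input.

```python
FRAC_BITS = 20

def quantize(coeffs, frac_bits=FRAC_BITS):
    """Scale real coefficients by 2**frac_bits and round them to integers."""
    return [int(round(c * (1 << frac_bits))) for c in coeffs]

# First 11 integer coefficients as listed in fir.v; the remaining 10 mirror them
half = [-82, -2309, -6683, -12087, -12624, 551, 33879, 85648,
        143893, 190180, 207841]
coeff = half + half[-2::-1]

dc_gain = sum(coeff) / (1 << FRAC_BITS)
worst_case_gain = sum(abs(c) for c in coeff) / (1 << FRAC_BITS)
print(len(coeff), dc_gain, worst_case_gain)
```

The DC gain comes out at about 0.999997, so the rounding has barely changed it. The worst-case gain is about 1.129, which means that a full-scale input could, in the worst case, produce an output slightly larger than full scale.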

The filter with these coefficients is implemented in fir.v, which instantiates one fir_stage module (fir_stage.v) per coefficient.

The filter can be added to the block diagram and connected to the chirp source and to the ILA (which needs one additional input).

Then connect to the ILA again, add the new signal and trigger:

chirp signal and its low-passed output

Filter forms: Direct, transpose, cascade, etc

Textbooks present digital filters in different 'forms', or ways to implement them. It is the same filter in every case, just with the operations arranged differently. The direct (or canonical) form comes straight from the mathematical equation:

FIR filter, direct form

However, this implementation is not very practical because all the products have to be summed within a single clock cycle, which makes for a long combinational path as the order grows. The transpose form simply moves the delay elements to after the adders:

FIR filter, transpose form

The point here is that the math operations (products and sums) in a clocked procedural block already imply a one-sample delay, so the delay elements come for free. My implementation follows the transpose form.

Finally, the cascade form splits the filter into a chain of first- and second-order sections:
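
That the two forms compute the same filter can be checked with a few lines of Python. This is a behavioural sketch with made-up function names; it follows the textbook structures and does not model the extra pipeline latency of the Verilog implementation.

```python
def fir_direct(b, x):
    """Direct form: a delay line of past inputs, all products summed at once."""
    delay = [0] * len(b)
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]
        out.append(sum(bk * dk for bk, dk in zip(b, delay)))
    return out

def fir_transposed(b, x):
    """Transposed form: the input feeds every multiplier, partial sums are registered."""
    regs = [0] * (len(b) - 1)       # one register between consecutive adders
    out = []
    for sample in x:
        out.append(b[0] * sample + regs[0])
        # Each register loads its own product plus the register further up the chain
        upstream = regs[1:] + [0]
        regs = [bk * sample + up for bk, up in zip(b[1:], upstream)]
    return out

b = [3, -1, 4, 1, -5, 9]
x = [2, -7, 1, 8, -2, 8, 1, -8, 2, 8, 4]
print(fir_direct(b, x) == fir_transposed(b, x))
```

In the transposed version every addition has only two operands, which is what makes it a good fit for a chain of registered multiply-add stages.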

FIR filter, cascade form

Coefficient symmetry advantage

Most FIR filters have either symmetric or antisymmetric coefficients. The previous lowpass example is symmetric. In an antisymmetric filter the mirrored values have the opposite sign.

This can be exploited to roughly halve the number of multiplications, by first adding the inputs that are to be multiplied by the same coefficient. Because multipliers are more complex than adders, there is an overall reduction in resources.

Below is an example of the symmetric implementation in direct form. Note how the first and last x values are added before the multiplication by h[0], the second and second-to-last before h[1], and so on.

FIR filter, symmetric direct form

There is also a transpose symmetric form.
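
The saving can be checked numerically. The Python sketch below computes one output sample of the order 20 filter twice, once with all 21 products and once with the folded arrangement, using the integer coefficients from fir.v. The function names and the test samples are arbitrary.

```python
def fir_full(b, window):
    """One output sample using len(b) multiplications."""
    return sum(bk * xk for bk, xk in zip(b, window))

def fir_folded(half, window):
    """Same sample for a symmetric filter with an odd number of coefficients.

    half holds the first coefficients up to and including the centre one.
    """
    last = len(window) - 1
    acc = 0
    for i, h in enumerate(half[:-1]):
        # Pre-add the two samples that share coefficient h, then multiply once
        acc += h * (window[i] + window[last - i])
    centre = len(half) - 1
    return acc + half[-1] * window[centre]

half = [-82, -2309, -6683, -12087, -12624, 551, 33879, 85648,
        143893, 190180, 207841]
b = half + half[-2::-1]
window = [((7 * n) % 19) - 9 for n in range(len(b))]   # arbitrary test samples

print(fir_full(b, window) == fir_folded(half, window))
print(len(b), "products versus", len(half))
```

For this filter the count drops from 21 to 11 multiplications; for an order 200 filter it drops from 201 to 101.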

Higher order filters

It isn't uncommon for FIR filters to have hundreds of coefficients. To show the performance and the savings due to symmetry, I created an order 200 filter with:

b200=fir1(200, 0.2);

Below is the resource usage for the non-symmetric and symmetric implementations. Note that most of the BRAM usage is due to the ILA. I expected the symmetric version to use about half the resources, but that is not the case. I will look into it later; it could be that a whole DSP48 cell is being used for the extra adder, so that what is saved on one hand is spent on the other.

FPGA usage for an order 200 FIR filter

The filter performs as expected. Two things to note are:

1. The sharper cutoff, compared with the previous one.

2. The increased delay, always proportional to the filter order.

Order 200 FIR filter response
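
The delay mentioned in point 2 is easy to put a number on: a FIR filter with symmetric (or antisymmetric) coefficients delays the signal by half its order, in samples. The Python sketch below converts that to time for an assumed sample clock of 100 MHz, a value chosen only for the example and not taken from this project; the pipeline registers of an FPGA implementation add a few more clock cycles on top.

```python
def fir_delay(order, fs_hz):
    """Delay of a linear-phase FIR filter as (samples, seconds)."""
    samples = order / 2
    return samples, samples / fs_hz

FS_HZ = 100e6   # assumed sample clock, not taken from the project

for order in (20, 200):
    samples, seconds = fir_delay(order, FS_HZ)
    print(order, samples, seconds)
```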

Conclusion

This is a quick and hopefully easy to follow tutorial on how to design, test and deploy FIR filters on FPGA using Verilog RTL. Filter design is a much wider field and I tried to cover the most common cases as an introduction.

From the files included, one should be able to create similar filters for different specifications. For example, if we needed a bandpass filter of order 200 with cutoff frequencies at 0.2 and 0.3 times the Nyquist frequency, we would do this in Octave (the same call appears as a comment in fir_pb_200.v):

b200 = fir1(200, [0.2, 0.3]);

Then convert the coefficients to q0.20 by multiplying by 2^20 and rounding, as before.

Then duplicate one of the Verilog files and replace the coefficients there. I have done this in fir_pb_200.v. Then just build the bitstream and run it with the chirp source. Here is the result:

Bandpass filter

Code

`timescale 1ns / 1ps

module chirp(
		input clk,
		input rst,
		input     [ 9:0] i_step,		// cycles to increase dph
		output reg[ 8:0] o_sig,			// output signal, s0q8
		output    [ 7:0] o_freq			// output frequency as a fraction of Nyquist (Fs/2), u0q8
    );
    reg  [ 7:0] dph;					// delta ph, phase inc per cycle, u6q2
    reg  [ 8:0] ph;						// phase, u7q2
    reg  [ 9:0] count;
    wire [8:0]sin_table[31:0];			// sin table 0 to pi/2 32 steps s0q8
    
    assign o_freq = dph;				// u0q8: dph/256 of Nyquist (ph wraps every 512 counts)
    
    always @(posedge clk) begin
    	if(rst) begin
    		o_sig <= 0;
    		dph   <= 1;					// 0.25
    		ph    <= 0;
    		count <= 0;
    	end else begin
    		if(count==i_step)begin
    			count <= 0;
    			dph <= dph + 1;
    		end else begin
    			count <= count + 1;
    		end
    		
    		ph <= ph + {1'b0, dph};
    		case(ph[8:7])
    			2'd0: o_sig <= sin_table[ph[6:2]];
    			2'd1: o_sig <= sin_table[31-ph[6:2]];
    			2'd2: o_sig <= -sin_table[ph[6:2]];
    			2'd3: o_sig <= -sin_table[31-ph[6:2]];
    		endcase
    	end
    end
    assign sin_table[0] = 0; 
    assign sin_table[1] = 13; 
    assign sin_table[2] = 25; 
    assign sin_table[3] = 37; 
    assign sin_table[4] = 50;
    assign sin_table[5] = 62; 
    assign sin_table[6] = 74; 
    assign sin_table[7] = 86; 
    assign sin_table[8] = 98; 
    assign sin_table[9] = 109;
    assign sin_table[10] = 120; 
    assign sin_table[11] = 131; 
    assign sin_table[12] = 142; 
    assign sin_table[13] = 152; 
    assign sin_table[14] = 162;
    assign sin_table[15] = 171; 
    assign sin_table[16] = 180; 
    assign sin_table[17] = 189; 
    assign sin_table[18] = 197; 
    assign sin_table[19] = 205;
    assign sin_table[20] = 212; 
    assign sin_table[21] = 219; 
    assign sin_table[22] = 225; 
    assign sin_table[23] = 231; 
    assign sin_table[24] = 236;
    assign sin_table[25] = 240; 
    assign sin_table[26] = 244; 
    assign sin_table[27] = 247; 
    assign sin_table[28] = 250; 
    assign sin_table[29] = 252;
    assign sin_table[30] = 254;
    assign sin_table[31] = 255;

endmodule

`timescale 1ns / 1ps

module fir(
		input clk,
		input rst,
		input  wire signed[8:0] i_sigin,		// s0q8
		output reg  signed[8:0] o_sigou			// s0q8
    );
    wire signed [29:0] mac  [20:0];
    wire signed [20:0] coeff[20:0];			// s0q20
    
    assign coeff[0] = -82;
    assign coeff[1] = -2309;
	assign coeff[2] = -6683;
	assign coeff[3] = -12087;
	assign coeff[4] = -12624;
	assign coeff[5] = 551;
	assign coeff[6] = 33879;
	assign coeff[7] = 85648;
	assign coeff[8] = 143893;
	assign coeff[9] = 190180;
	assign coeff[10] = 207841;
	assign coeff[11] = 190180;
	assign coeff[12] = 143893;
	assign coeff[13] = 85648;
	assign coeff[14] = 33879;
	assign coeff[15] = 551;
	assign coeff[16] = -12624;
	assign coeff[17] = -12087;
	assign coeff[18] = -6683;
	assign coeff[19] = -2309;
	assign coeff[20] = -82;
	
    genvar i;
    generate
    	for(i=0; i<21; i=i+1) begin
    		if(i==0)begin 
    			fir_stage st_first(	
    				.clk(clk), .rst(rst), .i_x(i_sigin), 
					.i_b(coeff[0]), .i_a(0), .o_mac(mac[0])
				);
    		end else if(i==20) begin
    			fir_stage st_last(	
    				.clk(clk), .rst(rst), .i_x(i_sigin),  
					.i_b(coeff[i]), .i_a(mac[i-1]), .o_mac(mac[i])
				);
    		end else begin 
    			fir_stage st_mid(
    				.clk(clk), .rst(rst), .i_x(i_sigin), 
					.i_b(coeff[i]), .i_a(mac[i-1]), .o_mac(mac[i])
				);
    		end
    	end
    endgenerate
    
    always @(posedge clk) begin
    	if(rst) begin
    		o_sigou <= 0;
    	end else begin
    		o_sigou <= mac[20][29:21];	// note: half the scale of the [28:20] slice used in fir_lp_200_fold
    	end
    end
endmodule

`timescale 1ns / 1ps

module fir_lp_200_fold(
		input clk,
		input rst,
		input  wire signed[8:0] i_sigin,		// s0q8
		output reg  signed[8:0] o_sigou			// s0q8
	);
	wire signed [29:0] macf [100:0];
	wire signed [29:0] macb [100:0];
	wire signed [20:0] coeff[100:0];			// s0q20

// b200=fir1(200,0.2);
assign {
	coeff[0], coeff[1], coeff[2], coeff[3], coeff[4], coeff[5], coeff[6], coeff[7], coeff[8], coeff[9], 
	coeff[10], coeff[11], coeff[12], coeff[13], coeff[14], coeff[15], coeff[16], coeff[17], coeff[18], coeff[19], 
	coeff[20], coeff[21], coeff[22], coeff[23], coeff[24], coeff[25], coeff[26], coeff[27], coeff[28], coeff[29], 
	coeff[30], coeff[31], coeff[32], coeff[33], coeff[34], coeff[35], coeff[36], coeff[37], coeff[38], coeff[39], 
	coeff[40], coeff[41], coeff[42], coeff[43], coeff[44], coeff[45], coeff[46], coeff[47], coeff[48], coeff[49], 
	coeff[50], coeff[51], coeff[52], coeff[53], coeff[54], coeff[55], coeff[56], coeff[57], coeff[58], coeff[59], 
	coeff[60], coeff[61], coeff[62], coeff[63], coeff[64], coeff[65], coeff[66], coeff[67], coeff[68], coeff[69], 
	coeff[70], coeff[71], coeff[72], coeff[73], coeff[74], coeff[75], coeff[76], coeff[77], coeff[78], coeff[79], 
	coeff[80], coeff[81], coeff[82], coeff[83], coeff[84], coeff[85], coeff[86], coeff[87], coeff[88], coeff[89], 
	coeff[90], coeff[91], coeff[92], coeff[93], coeff[94], coeff[95], coeff[96], coeff[97], coeff[98], coeff[99], coeff[100]
} = {
	-21'd79,   -21'd211,   -21'd268,   -21'd225,   -21'd92,    21'd85,   21'd242,   21'd319,    21'd276,    21'd120, 
	-21'd102,  -21'd308,   -21'd416,   -21'd369,   -21'd167,   21'd130,  21'd413,   21'd567,    21'd511,    21'd239, 
	-21'd168,  -21'd561,   -21'd778,   -21'd708,   -21'd340,   21'd215,  21'd755,   21'd1058,   21'd969,    21'd477, 
	-21'd271,  -21'd1001,  -21'd1414,  -21'd1305,  -21'd656,   21'd334,  21'd1304,  21'd1859,   21'd1728,   21'd888, 
	-21'd402,  -21'd1670,  -21'd2404,  -21'd2252,  -21'd1181,  21'd474,  21'd2110,  21'd3069,   21'd2898,   21'd1552, 
	-21'd548,  -21'd2636,  -21'd3877,  -21'd3693,  -21'd2018,  21'd622,  21'd3268,  21'd4865,   21'd4677,   21'd2609, 
	-21'd695,  -21'd4036,  -21'd6088,  -21'd5913,  -21'd3367,  21'd764,  21'd4988,  21'd7635,   21'd7499,   21'd4361, 
	-21'd828,  -21'd6208,  -21'd9660,  -21'd9611,  -21'd5716,  21'd885,  21'd7853,  21'd12455,  21'd12585,  21'd7671, 
	-21'd933,  -21'd10248, -21'd16638, -21'd17156, -21'd10769, 21'd973,  21'd14182, 21'd23787,  21'd25284,  21'd16542, 
	-21'd1001, -21'd22218, -21'd39430, -21'd44551, -21'd31626, 21'd1019, 21'd49727, 21'd105988, 21'd158378, 21'd195466, 21'd208856
};
	genvar i;
	generate
		for(i=0; i<101; i=i+1) begin
			if(i==0)begin 
				fir_stage2 st_ini(
					.clk(clk), .rst(rst), .i_x(i_sigin), .i_b(coeff[i]),
					.i_macf(0), .o_macf(macf[0]), .i_macb(macb[i+1]), .o_macb(macb[i])
				);
			end else if(i < 100) begin
				fir_stage2 st_mid(	
					.clk(clk), .rst(rst), .i_x(i_sigin), .i_b(coeff[i]),
					.i_macf(macf[i-1]), .o_macf(macf[i]), .i_macb(macb[i+1]), .o_macb(macb[i])
				);
			end else begin 
				fir_stage2 st_end(	
				.clk(clk), .rst(rst), .i_x(i_sigin), .i_b(coeff[i]),
				.i_macf(macf[i-1]), .o_macf(macb[i]), .i_macb(0), .o_macb()
			);
			end
		end
	endgenerate

	always @(posedge clk) begin
		if(rst) begin
			o_sigou <= 0;
		end else begin
			o_sigou <= macb[0][28:20];
		end
	end
endmodule

`timescale 1ns / 1ps

module fir_pb_200(
		input clk,
	input rst,
	input  wire signed[8:0] i_sigin,		// s0q8
	output reg  signed[8:0] o_sigou			// s0q8
	);
	wire signed [29:0] macf [100:0];
	wire signed [29:0] macb [100:0];
	wire signed [20:0] coeff[100:0];			// s0q20
	
	// b200=fir1(200,[0.2, 0.3]);
	assign {
		coeff[0], coeff[1], coeff[2], coeff[3], coeff[4], coeff[5], coeff[6], coeff[7], coeff[8], coeff[9], 
		coeff[10], coeff[11], coeff[12], coeff[13], coeff[14], coeff[15], coeff[16], coeff[17], coeff[18], coeff[19], 
		coeff[20], coeff[21], coeff[22], coeff[23], coeff[24], coeff[25], coeff[26], coeff[27], coeff[28], coeff[29], 
		coeff[30], coeff[31], coeff[32], coeff[33], coeff[34], coeff[35], coeff[36], coeff[37], coeff[38], coeff[39], 
		coeff[40], coeff[41], coeff[42], coeff[43], coeff[44], coeff[45], coeff[46], coeff[47], coeff[48], coeff[49], 
		coeff[50], coeff[51], coeff[52], coeff[53], coeff[54], coeff[55], coeff[56], coeff[57], coeff[58], coeff[59], 
		coeff[60], coeff[61], coeff[62], coeff[63], coeff[64], coeff[65], coeff[66], coeff[67], coeff[68], coeff[69], 
		coeff[70], coeff[71], coeff[72], coeff[73], coeff[74], coeff[75], coeff[76], coeff[77], coeff[78], coeff[79], 
		coeff[80], coeff[81], coeff[82], coeff[83], coeff[84], coeff[85], coeff[86], coeff[87], coeff[88], coeff[89], 
		coeff[90], coeff[91], coeff[92], coeff[93], coeff[94], coeff[95], coeff[96], coeff[97], coeff[98], coeff[99], coeff[100]
	} = {
		21'd1, -21'd38, 21'd49, 21'd219, 21'd317, 21'd196, -21'd140, -21'd497, -21'd609, -21'd336, 
		21'd202, 21'd675, 21'd762, 21'd390, -21'd198, -21'd611, -21'd605, -21'd261, 21'd93, 21'd174, 
		21'd1, -21'd117, 21'd114, 21'd640, 21'd1007, 21'd699, -21'd363, -21'd1596, -21'd2085, -21'd1266, 
		21'd539, 21'd2222, 21'd2639, 21'd1465, -21'd516, -21'd1975, -21'd2041, -21'd947, 21'd230, 21'd538, 
		21'd2, -21'd408, 21'd264, 21'd1891, 21'd3095, 21'd2296, -21'd787, -21'd4479, -21'd6098, -21'd3957, 
		21'd1091, 21'd5950, 21'd7376, 21'd4377, -21'd979, -21'd5079, -21'd5491, -21'd2723, 21'd410, 21'd1337, 
		21'd1, -21'd1142, 21'd445, 21'd4603, 21'd7922, 21'd6297, -21'd1252, -21'd10754, -21'd15447, -21'd10760, 
		21'd1649, 21'd14268, 21'd18737, 21'd11964, -21'd1409, -21'd12353, -21'd14216, -21'd7616, 21'd564, 21'd3363, 
		21'd0, -21'd3351, 21'd585, 21'd12359, 21'd23031, 21'd20083, -21'd1579, -21'd32256, -21'd51026, -21'd39712, 
		21'd1997, 21'd52110, 21'd77999, 21'd57989, -21'd1639, -21'd67231, -21'd97506, -21'd70466, 21'd631, 21'd73884, 
		21'd104622
	};
	genvar i;
	generate
	for(i=0; i<101; i=i+1) begin
		if(i==0)begin 
			fir_stage2 st_ini(
				.clk(clk), .rst(rst), .i_x(i_sigin), .i_b(coeff[i]),
				.i_macf(0), .o_macf(macf[0]), .i_macb(macb[i+1]), .o_macb(macb[i])
			);
		end else if(i < 100) begin
			fir_stage2 st_mid(	
				.clk(clk), .rst(rst), .i_x(i_sigin), .i_b(coeff[i]),
				.i_macf(macf[i-1]), .o_macf(macf[i]), .i_macb(macb[i+1]), .o_macb(macb[i])
			);
		end else begin 
			fir_stage2 st_end(	
			.clk(clk), .rst(rst), .i_x(i_sigin), .i_b(coeff[i]),
			.i_macf(macf[i-1]), .o_macf(macb[i]), .i_macb(0), .o_macb()
		);
		end
	end
	endgenerate
	
	always @(posedge clk) begin
		if(rst) begin
			o_sigou <= 0;
		end else begin
			o_sigou <= macb[0][28:20];
		end
	end
endmodule

`timescale 1ns / 1ps

module fir_stage2(
		input  clk,
		input  rst,
		input  wire signed [ 8:0] i_x,
		input  wire signed [20:0] i_b,
		input  wire signed [29:0] i_macf,
		input  wire signed [29:0] i_macb,
		output reg signed [29:0] o_macf,
		output reg signed [29:0] o_macb
    );
    
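    // One product shared by the forward and backward accumulators
    wire signed [29:0] prod;
    
    assign prod = i_x * i_b;
    
    always @(posedge clk) begin
    	if(rst) begin
    		// Clear both accumulators so the chain starts from a known state
    		o_macf <= 0;
    		o_macb <= 0;
    	end else begin
    		o_macf <= i_macf + prod;
    		o_macb <= i_macb + prod;
    	end
    end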
endmodule

Credits

Juan Abelaira


Electronics engineer focused on FPGA for accelerated computing and ML but with a wide background in electronics design and software design
