Image processing is one of the most common applications for FPGAs. Their flexible IO enables interfacing with a range of cameras and sensors, from MIPI and Camera Link to parallel and, of course, DVI / HDMI.
Meanwhile, the parallel nature of the programmable logic enables the image processing pipeline to be implemented directly in parallel, increasing determinism and reducing latency.
When it comes to developing image processing algorithms, we can often work in higher-level frameworks such as Vitis HLS or MathWorks Simulink.
While seeing the algorithms running in simulation is great, it is always good to see them running on actual hardware.
So I thought I would create an image processing system into which it is easy to drop a new image processing core. This system is able to leverage several MIPI cameras.
The image processing pipeline performs simple processing stages on the image, such as demosaicing, and then makes the image available over Ethernet for examination of algorithm performance.
As the idea is to assess algorithm performance, the image data sent out over the Ethernet link is not compressed.
To enable live video from the camera to be assessed, the design will be capable of streaming out a smaller window of the overall image.
Hardware Selection
For this project we will be using the Tria AUP board, which provides us with an Artix UltraScale+ 15P FPGA along with 2GB of DDR4, 10/100 Ethernet, SFP+ cages, and HDMI Rx and Tx.
When it comes to external connectivity, there is also an FMC LPC connector, which we will be leveraging.
To enable connectivity with up to two MIPI cameras on the AUP board, we are using the Camera FMC module. This FMC enables up to four MIPI cameras to be connected to AMD development boards, the exact number depending on the IO pinning of the board used.
Architecture
The overall architecture of the solution can be seen below. This includes the RPI Camera, Demosaic and AXI VDMA, which stores the images in a DDR4 frame buffer.
A MicroBlaze V processor will then use lwIP to read the images from the frame buffer and output them over Ethernet, using the AXI Ethernet Lite.
The MicroBlaze V will also be responsible for enabling and configuring the RPI Camera. To do this it uses AXI IIC and AXI GPIO modules.
Grabbing frames, and displaying and controlling streamed frames, is handled by a Python application running on a client PC.
The project is built around a MicroBlaze V processor; the processor itself is connected to the following IP cores:
- AXI UART Lite - Enables reporting of status
- AXI Ethernet Lite - Connects to the host machine at 100 Mbps
- AXI Timer - Needed for the lightweight IP (lwIP) stack
- AXI GPIO - Controls the power enable of the RPI Camera
- AXI IIC - Configures the RPI Camera
- AXI Interrupt Controller - Handles interrupts from the IIC, Ethernet Lite and Timer
The complete design is shown below and available on my GitHub.
The image processing pipeline consists of:
- MIPI CSI2 RX Subsystem - This is configured for RAW10 pixels and 2 MIPI lanes. The module is also directly enabled in the PL design.
- Demosaic - Configured to convert the RAW10 pixel into a 30-bit RGB pixel (10 bits per colour channel)
- AXIS Subset Converter - This takes in the 30-bit pixel and extracts the 8 most significant bits of the green channel. The green channel carries most of the luminance information, which is what we need for a greyscale solution.
- VDMA - Configured to write into the DDR4 frame buffers.
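The Subset Converter's channel extraction is just bit selection, which we can sketch in software terms. This is a minimal illustration, assuming the green channel sits in bits 19:10 of the 30-bit pixel; the actual bit positions depend on the converter's TDATA remap string in your design.

```python
def extract_green_msbs(pixel30: int) -> int:
    """Return the 8 most significant bits of the green channel from a
    30-bit RGB pixel (10 bits per channel).

    Assumption: green occupies bits 19:10. Check your AXIS Subset
    Converter remap string for the real channel ordering."""
    green10 = (pixel30 >> 10) & 0x3FF  # isolate the 10-bit green channel
    return green10 >> 2                # keep the 8 MSBs

# Full-scale green (0x3FF) maps to a full-scale 8-bit value (0xFF)
assert extract_green_msbs(0x3FF << 10) == 0xFF
```

Dropping the 2 LSBs rather than rescaling is what the Subset Converter does in hardware: it is a pure wire selection, costing no logic.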
Clocking is straightforward: the MicroBlaze V is clocked at 300 MHz, output by the DDR MIG and shown in green on the diagram. A 200 MHz clock, shown in pink, is used by the DPHY and video pipeline, and a 100 MHz clock is shown in blue.
The next step is to design the embedded software which runs on the MicroBlaze V.
Embedded Software
The application software has several functions to complete, not least the configuration of the RPI Camera and running the lwIP stack for Ethernet communications.
An overview of the software application can be seen below. It is straightforward: it initialises lwIP and the camera before the image pipeline. Once the pipeline is initialised, the design captures images and runs the main lwIP processing loop, waiting for commands from the Python application.
The application software is available on my GitHub; however, there are four main files: app.c, app.h, imx219_cam.c and imx219_cam.h.
imx219_cam.c and imx219_cam.h
Handles the IMX219 camera sensor bring-up over AXI IIC.
The init sequence power-cycles the camera via AXI GPIO and initialises the AXI IIC core. It then verifies that the sensor responds to IIC commands by reading the model ID (0x0219).
app.c and app.h
The main application which has five major elements:
Platform Init sets up the AXI Interrupt Controller and connects it to the MicroBlaze V exception system. No timer interrupt is used; lwIP runs in pure polled mode, which is sufficient for UDP.
Network Init is called early in the startup sequence because xemac_add blocks while the PHY auto-negotiates. By doing this first, the Ethernet link is established while the camera powers up and configures. The lwIP RAW API is used with a UDP PCB bound to port 5001.
Camera Init calls IMX219_Init which handles the full power cycle and I2C configuration sequence.
Capture Pipeline Init starts the demosaic first (so the video stream is properly formatted before the VDMA sees it), waits 100ms for it to sync, then configures and starts the VDMA S2MM channel with triple frame buffers in DDR4. If the VDMA reports an IntErr (from catching a partial first frame), it automatically resets and restarts the VDMA to recover.
Main Loop polls xemacif_input for incoming UDP packets.
The UDP frame protocol uses a format with per-packet sequence numbers and byte offsets so the receiver can place data correctly even if packets are dropped.
Each DATA packet carries a 12-byte header, followed by up to 1460 bytes of pixel data. The frame header includes total packet count and snap size so the receiver knows what to expect. Packets are paced at 2ms intervals because the AXI Ethernet Lite has a single TX buffer.
This approach makes the application more robust and limits the effect of a lost packet to a missing slice of the image (which is rare with this configuration).
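The post does not spell out the exact field layout of the 12-byte header, so the packing below is illustrative only: a hypothetical arrangement of sequence number, byte offset, payload length and flags that adds up to 12 bytes.

```python
import struct

# Hypothetical 12-byte DATA header layout (big-endian):
#   uint32 sequence number, uint32 byte offset,
#   uint16 payload length, uint16 flags.
# The real firmware's field order may differ.
HDR = struct.Struct("!IIHH")
assert HDR.size == 12

MAX_PAYLOAD = 1460  # pixel bytes per DATA packet, per the protocol

def make_data_packet(seq: int, offset: int, payload: bytes) -> bytes:
    """Prepend the header so the receiver can place this payload at
    its declared byte offset within the frame buffer."""
    assert len(payload) <= MAX_PAYLOAD
    return HDR.pack(seq, offset, len(payload), 0) + payload

pkt = make_data_packet(3, 3 * MAX_PAYLOAD, b"\x00" * MAX_PAYLOAD)
assert len(pkt) == 12 + MAX_PAYLOAD
```

Carrying the byte offset in every packet is what makes the receiver stateless with respect to ordering: it never has to infer position from arrival order.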
Python Application
The Python application is designed either to grab a frame and store it as a PNG, or to stream images with a selectable region of interest.
When streaming, the Python app grabs a full-frame image every 10 seconds, which is used to update the region of interest as the scene changes.
The Python client is a single-file script that talks to the MicroBlaze V over UDP and reassembles the incoming pixel data into viewable images using OpenCV. It has three operating modes designed around the constraints of a 100 Mbps link with a single-buffer Ethernet MAC.
Socket setup is the first thing that happens. The client creates a UDP socket with an 8 MB receive buffer.
The receive engine is the core of the client. It waits for a FRMB header packet, extracts the frame dimensions, total packet count, and snap size, then pre-allocates a zeroed byte array of exactly that size. As DATA packets arrive, each one is placed at its declared byte offset. This is what makes the whole system drop-tolerant.
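The offset-based placement described above can be sketched in a few lines. This is a simplified illustration, not the actual client code, and it reuses the same hypothetical 12-byte header layout (sequence, byte offset, payload length, flags) since the real field order is not given in the post.

```python
import struct

# Hypothetical DATA header: uint32 seq, uint32 byte offset,
# uint16 payload length, uint16 flags (12 bytes total).
HDR = struct.Struct("!IIHH")

def place_packet(frame: bytearray, packet: bytes) -> None:
    """Copy a DATA packet's payload into the pre-allocated frame
    buffer at its declared byte offset. Dropped packets simply leave
    their region zeroed, so the image degrades gracefully."""
    seq, offset, length, _flags = HDR.unpack_from(packet)
    frame[offset:offset + length] = packet[HDR.size:HDR.size + length]

frame = bytearray(16384)  # e.g. a 128x128 greyscale window, zero-filled
place_packet(frame, HDR.pack(0, 0, 4, 0) + b"ABCD")
place_packet(frame, HDR.pack(2, 2920, 4, 0) + b"WXYZ")  # out of order is fine
assert frame[:4] == b"ABCD" and frame[2920:2924] == b"WXYZ"
```

Because every packet is self-describing, arrival order and gaps do not matter; the frame is complete when the expected packet count (from the FRMB header) has been received, or simply displayed as-is with zeroed gaps.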
The key design choice in each mode comes down to the same constraint: the AXI Ethernet Lite's single TX buffer at 100 Mbps can't handle continuous blasting.
GRAB mode solves this with retries, trying up to three times and keeping the attempt with the highest completeness.
WINDOW mode solves this by reducing the size of the frame transmitted and is the fast path. A 128×128 window is only 16 KB, about 12 packets.
STREAM mode solves this by continually streaming full frames; however, the frame rate is limited.
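The WINDOW-mode sizing is easy to verify: an 8-bit greyscale window of 128×128 pixels split into 1460-byte payloads works out as follows.

```python
import math

window_bytes = 128 * 128   # one byte per greyscale pixel
payload = 1460             # pixel bytes per DATA packet

packets = math.ceil(window_bytes / payload)
print(window_bytes, packets)  # 16384 bytes in 12 packets
```

At 2 ms pacing per packet, that is roughly 24 ms of transmit time per window, which is consistent with the frame rates observed below.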
Running this on hardware with the Python app shows the application working as we would expect.
Window mode gives circa 20-30 FPS when it is running, which is not bad for a 100 Mbps Ethernet link.
This project demonstrates how we can create a simple frame grabbing solution which enables us to test out image processing algorithms on hardware and store the processed frame for further analysis.
This project has used a pure FPGA based approach. There are extensions we can add to this such as using PCIe for the frame grabbing.
You can find the complete design here










