Image processing techniques are used extensively in movies to make actors appear to be in places they cannot be, for example space or dangerous situations. They can also be used to replace actors or portions of actors, for example the Beast in the live-action Beauty and the Beast or the cloak of invisibility in the Harry Potter series.
The technique to do this is, at a high level, pretty well known: a solid color is removed from the scene and a new scene is patched in behind to replace it. This is often called green screen or Chroma Key. It can be pretty computationally intensive, as multiple operations are required on every pixel in the frame to identify the color, mask it and then stitch together a new frame.
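As a rough conceptual sketch (not the project code, and the function and parameter names here are purely illustrative), the whole operation can be expressed as a per-pixel test against the key color followed by a per-pixel selection between the replacement background and the original frame:
import numpy as np

# foreground, background: HxWx3 uint8 frames of the same size (illustrative only)
def chroma_key(foreground, background, key_low, key_high):
    # per-pixel test: is this pixel inside the key color range on every channel?
    key = np.all((foreground >= key_low) & (foreground <= key_high), axis=-1)
    # stitch: take the background where the key color was found, the foreground elsewhere
    return np.where(key[..., None], background, foreground)
In practice the test is performed in the HSV color space rather than directly on the RGB values, for the reasons discussed below.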
In this project we are going to demonstrate how we can use a Snickerdoodle Black, Xilinx PYNQ and the XF:OpenCV libraries from Xilinx to accelerate an image processing application which can be used for either background or foreground color removal.
Hardware Set Up
To perform this project we are going to use a Snickerdoodle Black mounted on a PI Smasher and connected to an HD camera over HDMI. Running on the Snickerdoodle is the Snickerdoodle PYNQ 2v5 image and the image processing overlay which was created for a previous project.
We will be using this previously created overlay as the basis for the additions in this project, and to test out the algorithms prior to implementing them with XF:OpenCV and HLS.
The hardware set up can be seen below.
To get started with the hardware we need to do the following:
- Download the Snickerdoodle PYNQ image
- Burn the PYNQ image to a SD Card and boot the Snickerdoodle
- Connect the Snickerdoodle to the internet
- Install the Snickerdoodle image processing overlay into the running PYNQ system
- Download the Image Processing Overlay Vivado Design
This provides us with two elements: a PYNQ system which we can use to quickly prototype the algorithm using OpenCV, and the ability to update the original overlay design with a new version which accelerates the functionality.
The Algorithm
The algorithm we are going to use to remove colors from the foreground or background is essentially the same in both cases and uses the following steps:
- Capture the image
- Convert the image from the RGB color space to HSV
- Define the Hue, Saturation and Value levels for the color to be detected - upper and lower thresholds are used
- Create a binary mask image which identifies the color for removal: white where the color is detected and black where it is not
- Merge the original image with the mask. Where the mask is white, set the merged image to black
- Merge the "fill" image with the mask. Where the mask is black, set the merged image to black
- Merge the two masked images to create the final image
As can be seen from the outline above, the algorithm is very computationally intensive, requiring several operations on each pixel in the image. This will really hit the frame rate when trying to run the application on the software processors.
Why use HSV Color Space?
One of the challenges of detecting colors is the dependence on ambient lighting conditions. As the ambient lighting changes, the performance of the color segmentation algorithm can change significantly depending on the color space being used.
HSV uses the Hue (wavelength), Saturation and Value to define a color; the hue is represented as a 360 degree circle (180 degrees in OpenCV).
As HSV separates the color information (Hue) from the intensity (Value), it is possible to detect colors using hue and saturation without the lighting (intensity) having a significant impact.
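As a quick standalone illustration (assuming only OpenCV and NumPy, not the project code), converting the same green at two brightness levels to HSV shows that only the Value channel changes, which is why the thresholds are applied to hue and saturation:
import cv2
import numpy as np

bright_green = np.uint8([[[0, 200, 0]]])  # a BGR pixel under good lighting
dim_green = np.uint8([[[0, 80, 0]]])      # the same green under poor lighting

print(cv2.cvtColor(bright_green, cv2.COLOR_BGR2HSV))  # approximately [60, 255, 200]
print(cv2.cvtColor(dim_green, cv2.COLOR_BGR2HSV))     # approximately [60, 255, 80]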
PYNQ Algorithm
In Python the algorithm is implemented using the code below:
import cv2
import numpy as np

# cam_vdma is the VDMA instance provided by the image processing overlay
frame_camera = cam_vdma.readchannel.readframe()
frame_color = cv2.cvtColor(frame_camera, cv2.COLOR_BGR2RGB)
# convert the camera frame to HSV so the green can be segmented on hue and saturation
hsv = cv2.cvtColor(frame_camera, cv2.COLOR_RGB2HSV)
lower_green = np.array([30, 120, 50])   # hue, saturation, value
upper_green = np.array([102, 255, 255])
# binary mask: white where the green is detected, black elsewhere
mask = cv2.inRange(hsv, lower_green, upper_green)
# blank out the detected green in the camera frame
masked_image = np.copy(frame_color)
masked_image[mask != 0] = [0, 0, 0]
# load the replacement background and crop it to the size of the camera frame
background_image = cv2.imread('Mars.jpg')
background_image = cv2.cvtColor(background_image, cv2.COLOR_BGR2RGB)
crop_background = background_image[1080:1592, 0:1280]
# blank out the background everywhere the green was not detected
crop_background[mask == 0] = [0, 0, 0]
# merge the two images to create the final frame
final_image = crop_background + masked_image
This produces the images below at each stage of the algorithm.
Input image with the green screen to be removed
Mask image created from the detection of the green
Original image with the detected green set to black by the mask
Background image processed to remove the non-masked area
Merging the images
This same algorithm can also be updated to remove different elements, not just the background. For example, to create an invisibility cloak all we need to do is first capture the scene with nothing in it, such that we have the background.
Then we need to define the element we want to mask out, in this case a blue blanket.
Updating the HSV thresholds to detect blue rather than green, we can create a new mask - note that, sadly, there is also some blue in the background.
Again merge the captured image and the mask
Update the background to include the elements required
Merge the two images to produce the invisibility cloak.
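A minimal sketch of the changes to the earlier PYNQ code is shown below; the blue thresholds are illustrative only (the real values need tuning to the blanket), and captured_background is an assumed variable holding the frame grabbed while the scene was empty, cropped to the same size as the camera frame:
lower_blue = np.array([100, 150, 50])   # illustrative hue, saturation, value for blue
upper_blue = np.array([130, 255, 255])
mask = cv2.inRange(hsv, lower_blue, upper_blue)
# blank out the detected blue in the camera frame
masked_image = np.copy(frame_color)
masked_image[mask != 0] = [0, 0, 0]
# fill the masked area from the background captured before the cloak entered the scene
cloak_fill = np.copy(captured_background)
cloak_fill[mask == 0] = [0, 0, 0]
final_image = cloak_fill + masked_image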
We can record this as video using the PYNQ framework, saving an AVI in real time for download.
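A minimal sketch of how this could be done with OpenCV's VideoWriter is shown below; the file name, codec, frame rate and resolution are assumptions and need to match the camera pipeline:
import cv2

writer = cv2.VideoWriter('chroma_key.avi',
                         cv2.VideoWriter_fourcc(*'MJPG'),
                         30.0, (1280, 720))
for _ in range(300):                          # roughly ten seconds at 30 fps
    frame = cam_vdma.readchannel.readframe()  # cam_vdma from the overlay, as above
    # in the full application the merged final_image from the steps above
    # would be written here in place of the raw camera frame
    writer.write(frame)
writer.release()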
Acceleration using XF:OpenCV
To accelerate the algorithm we want to create an HLS IP block which leverages the Xilinx XF:OpenCV libraries. These have since been updated to the Vitis Vision Libraries; however, as we are using PYNQ 2v5 we need to use the compatible XF:OpenCV version.
You can clone the XF:OpenCV Library here
Once these have been downloaded we can create a new Vivado HLS project which calls up these libraries. If you are familiar with OpenCV they work in much the same manner, except we are going to be implementing the functions in programmable logic.
The module I have developed will accelerate the creation of the mask, as that requires significant processing on each pixel in the frame.
To get started we need to create the source file for acceleration and a test bench file. As we are using these in standalone Vivado HLS and not as part of SDSoC, we need to ensure the files are configured correctly for compilation. As such, each source and test bench CPP file should have the following CFLAGS:
-D__SDSVHLS__ -IC:/GIT/xfopencv/include --std=c++0x
The include path should be the location of your cloned directory, of course.
To make use of the XF:OpenCV libraries we include the necessary header files from the include directory.
For the interfacing we want to use AXI Stream; as such, the first and last thing the block must do is convert between an AXI Stream and an XF:OpenCV matrix.
As the input stream is color and the output greyscale, we have a 24-bit AXI Stream input and an 8-bit AXI Stream output.
We can then define a number of xf::Mat instances for the images at the different stages of the algorithm, such that we can establish a data path between them:
void ip_accel_app(hls::stream< ap_axiu<24,1,1,1> >& _src, hls::stream< ap_axiu<8,1,1,1> >& _dst, int height, int width)
{
#pragma HLS INTERFACE axis register both port=_src
#pragma HLS INTERFACE axis register both port=_dst

	xf::Mat<IN_TYPE, HEIGHT, WIDTH, NPC1> imgInput1(height,width);
	xf::Mat<IN_TYPE, HEIGHT, WIDTH, NPC1> imgOutput1(height,width);
	xf::Mat<IN_TYPE, HEIGHT, WIDTH, NPC1> _bgr2hsv;
	xf::Mat<MASK_TYPE, HEIGHT, WIDTH, NPC1> _inrange;

	// upper and lower HSV thresholds for the color to be detected
	unsigned char lower_thresh[3];
	unsigned char upper_thresh[3];
	lower_thresh[0]=94;
	lower_thresh[1]=20;
	lower_thresh[2]=15;
	upper_thresh[0]=126;
	upper_thresh[1]=255;
	upper_thresh[2]=255;

#pragma HLS stream variable=imgInput1.data dim=1 depth=1
#pragma HLS stream variable=imgOutput1.data dim=1 depth=1
#pragma HLS stream variable=_bgr2hsv.data dim=1 depth=1
#pragma HLS stream variable=_inrange.data dim=1 depth=1
#pragma HLS dataflow

	// convert the AXI Stream to an XF:OpenCV matrix, create the mask and stream it out
	xf::AXIvideo2xfMat(_src, imgInput1);
	xf::bgr2hsv<IN_TYPE,HEIGHT, WIDTH,NPC1 >(imgInput1, _bgr2hsv);
	xf::inRange<IN_TYPE,MASK_TYPE,HEIGHT, WIDTH,NPC1>(_bgr2hsv, lower_thresh, upper_thresh, _inrange);
	xf::xfMat2AXIvideo(_inrange, _dst);
}
Once the accelerated function is completed we need to develop a test bench to test the algorithm. This can be used for both the C simulation and the co-simulation:
int main(int argc, char** argv)
{
	if (argc != 2)
	{
		fprintf(stderr,"Invalid Number of Arguments!\nUsage:\n");
		fprintf(stderr,"<Executable Name> <input image path> \n");
		return -1;
	}

	cv::Mat out_img, ocv_ref;
	cv::Mat in_img, in_img1;

	// read in the color test image captured from the PYNQ development
	in_img = cv::imread(argv[1], 1);
	if (in_img.data == NULL)
	{
		fprintf(stderr,"Cannot open image at %s\n", argv[1]);
		return -1;
	}

	// create memory for the single channel (mask) output image
	in_img1.create(in_img.rows, in_img.cols, CV_8UC1);

	uint16_t height = in_img.rows;
	uint16_t width = in_img.cols;

	// convert the OpenCV image to an AXI Stream, run the accelerator and convert back
	hls::stream< ap_axiu<24,1,1,1> > _src;
	hls::stream< ap_axiu<8,1,1,1> > _dst;

	cvMat2AXIvideoxf<NPC1>(in_img, _src);
	ip_accel_app(_src, _dst, height, width);
	AXIvideo2cvMatxf<NPC1>(_dst, in_img1);

	cv::imwrite("hls.jpg", in_img1);

	return 0;
}
Using images captured from the PYNQ development we can run the simulation and co-simulation to see the results.
Output image
Once we are happy with the mask creation we can run synthesis and package the design as an IP to include in Vivado
Vivado Build
In Vivado we need to first add the exported IP to the repository
Once added, it should be available under the Vivado HLS IP category in the IP catalog.
Add in the IP and create a new VDMA path; this allows the mask to be created at the same time as the image is passed through the system.
The resource utilization and power estimation of the implemented design are pretty impressive. There are still resources available to add more functionality in the PL.
Once the bit file is completed we can upload it to the Snickerdoodle and see it working in action.
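As a rough sketch of how the updated overlay might then be driven from PYNQ; the bitstream name and VDMA instance names below are placeholders and depend on how the block design was labelled:
from pynq import Overlay
import numpy as np

overlay = Overlay('image_processing_mask.bit')   # placeholder bitstream name
cam_vdma = overlay.video.axi_vdma_0              # existing camera path (placeholder name)
mask_vdma = overlay.video.axi_vdma_1             # new path carrying the hardware mask (placeholder name)

frame = cam_vdma.readchannel.readframe()
mask = mask_vdma.readchannel.readframe()         # 8-bit mask produced by the HLS block

# use the hardware mask exactly as the software mask was used earlier
masked_image = np.copy(frame)
masked_image[np.squeeze(mask) != 0] = [0, 0, 0]
The merging with the background then proceeds exactly as in the software-only prototype, but without the costly color conversion and thresholding running on the processor.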
This has been a fun project which has shown how we can quickly and easily create solutions which leverage the power of the Zynq, PYNQ and HLS.
Future work would be to look at the edge effects which are clearly present.