I recently sat down to benchmark the new accelerator hardware now appearing on the market that is intended to speed up machine learning inferencing on the edge. So that I’d have a rough yardstick for comparison, I also ran the same benchmarks on the Raspberry Pi. Afterwards a lot of people complained that I should have been using TensorFlow Lite on the Raspberry Pi rather than full-blown TensorFlow. They were right: it ran a lot faster.
Yet there were still other options to explore. I’d heard a few rumours about the sort of performance gains people were seeing using the binary weight models from Xnor.ai, and because of this I’d actually intended to include them in my original benchmark. However, the company was still in closed testing, and I wasn’t able to get my hands on their frameworks and models in time for my initial benchmark comparison. But just over a week ago Xnor finally opened their platform as a public beta, and released their AI2GO framework.
So, it was time to compare their framework with TensorFlow…
Headline results from benchmarking
Using AI2GO on the Raspberry Pi we see a considerable speed increase over the fastest times seen with TensorFlow Lite and the MobileNet v1 SSD 0.75 depth model in our previous benchmarks.
Comparing the performance of the proprietary Xnor binary convolutional neural network model to the MobileNet v1 SSD 0.75 depth model, we see a roughly 2× increase in inferencing speed over our TensorFlow Lite result, and therefore a roughly 4× increase in speed over our original TensorFlow result.
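The 4× figure follows from chaining two ratios: the roughly 2× gain over TensorFlow Lite, and the roughly 2× gain of TensorFlow Lite over full TensorFlow that those two numbers imply. A minimal sketch of that arithmetic is below. The timings are made-up placeholders chosen to show the ratios, not the measured results, and the `speedup` helper is a name introduced here for illustration.

```python
def speedup(baseline_ms, new_ms):
    """Return how many times faster `new_ms` is than `baseline_ms`."""
    if baseline_ms <= 0 or new_ms <= 0:
        raise ValueError("timings must be positive")
    return baseline_ms / new_ms


# Placeholder timings in milliseconds, chosen only to illustrate the ratios.
# They are NOT the measured benchmark results.
tensorflow_ms = 400.0
tflite_ms = 200.0
xnor_ms = 100.0

tflite_vs_tf = speedup(tensorflow_ms, tflite_ms)
xnor_vs_tflite = speedup(tflite_ms, xnor_ms)
xnor_vs_tf = speedup(tensorflow_ms, xnor_ms)

# Speed-ups along a chain multiply: (TF / TFLite) * (TFLite / Xnor) = TF / Xnor.
assert abs(xnor_vs_tf - tflite_vs_tf * xnor_vs_tflite) < 1e-9
print("Xnor vs TensorFlow: {:.1f}x".format(xnor_vs_tf))
```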
Our original benchmarks were done using both TensorFlow and TensorFlow Lite on a Raspberry Pi 3, Model B+, without any accelerator hardware. Inferencing was carried out with the MobileNet v2 SSD and MobileNet v1 0.75 depth SSD models, both trained on the Common Objects in Context (COCO) dataset and converted to TensorFlow Lite.
The Xnor Platform was benchmarked using their ‘medium’ Kitchen Object Detector model. This model is a binary weight network, and while the nature of the training dataset is not known, some technical papers around the model are available.
A single 3888×2916 pixel test image was used containing two recognisable objects in the frame, a banana🍌 and an apple🍎. The image was resized down to 300×300 pixels before presenting it to each model, and the model was run 10,000 times before an average inferencing time was taken. The first inferencing run, which takes longer due to loading overheads in the case of TensorFlow models, was discarded.
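The timing procedure described above can be sketched roughly as follows. This is a minimal illustration rather than the actual benchmark script: `benchmark` and `dummy_inference` are names made up here, the dummy workload stands in for a real model call on the resized image, and the sketch assumes the discarded first run counts towards the total number of runs.

```python
import time
from statistics import mean


def benchmark(infer, runs=10000, clock=time.perf_counter):
    """Call `infer` `runs` times and return the mean time per call in ms.

    The first call is timed but left out of the average, since it tends to
    include one-off loading overheads.
    """
    if runs < 2:
        raise ValueError("need at least two runs to discard the first")
    timings_ms = []
    for _ in range(runs):
        start = clock()
        infer()
        timings_ms.append((clock() - start) * 1000.0)
    # Average everything except the first, slower, run.
    return mean(timings_ms[1:])


def dummy_inference():
    """Hypothetical stand-in for a real model call on the resized image."""
    sum(i * i for i in range(1000))


if __name__ == "__main__":
    result = benchmark(dummy_inference, runs=100)
    print("{:.3f} ms per inference".format(result))
```

Passing the clock in as a parameter keeps the function easy to check with fake timestamps; in a real run the default `time.perf_counter` is what you would want.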
Comparing our new result with our original benchmark figures and our previous results from TensorFlow Lite, we see that using the new Xnor binary models on an unaccelerated Raspberry Pi reduces inferencing times to below those seen for the original generation of Movidius hardware from Intel.
Overall we see a 4× reduction in inferencing time compared to our original benchmark numbers for the unaccelerated Raspberry Pi. While these results are still slower than the accelerated hardware they are competitive with both the new generation Movidius hardware from Intel, the Intel Neural Compute Stick 2, and the NVIDIA Jetson Nano using TensorRT optimisation.
Some caveats around the results
In our original benchmark, and then again when we went back to look at TensorFlow Lite on the Raspberry Pi, we tried as much as possible to keep our models the same. Comparing apples with apples, rather than apples with bananas. We can’t do that here.
In part this is because what we’re benchmarking here really is bananas rather than apples. Instead of comparing the same model over different hardware, we’re comparing different models on the same hardware, the Raspberry Pi.
Since there is no indication of confidence in the detection results returned by Xnor’s model, it is hard to determine exactly how well it is performing beyond the obvious metric that it detected our two objects, a banana🍌 and an apple🍎, in the test frame. That means that potentially we should be comparing the Xnor models with something other than the MobileNet models we used in our previous benchmarks. The performance gap might not be as large as, on the face of it, it appears.
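With no score to threshold on, about the only automated check available is whether the expected labels came back, and whether anything unexpected came with them. A minimal sketch of that check is below, assuming each detection has been reduced to a label and a box. The `Detection` tuple, the `summarise` helper, and the example boxes are all made up for illustration and are not the Xnor SDK’s own types or output.

```python
from collections import Counter, namedtuple

# Hypothetical container for one labelled box; not the Xnor SDK's own type.
Detection = namedtuple("Detection", ["label", "box"])


def summarise(detections, expected):
    """Return (counts per label, expected labels missing, unexpected labels)."""
    counts = Counter(d.label for d in detections)
    missing = sorted(set(expected) - set(counts))
    unexpected = sorted(set(counts) - set(expected))
    return counts, missing, unexpected


# Made-up example detections, with each box as (x, y, width, height) in pixels.
detections = [
    Detection("banana", (20, 150, 180, 90)),
    Detection("apple", (170, 60, 100, 100)),
]
counts, missing, unexpected = summarise(detections, ["banana", "apple"])
print(dict(counts), missing, unexpected)
```

This says nothing about how tight the boxes are or how close a detection was to being rejected, which is exactly the information that is missing here.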
The bounding boxes we get back from the Xnor model are also rather different from those we’d had from TensorFlow models, with looser constraints than you’d expect. This may be tuneable in the model, but right now at least it’s hard to be sure as the internals of the Xnor model are opaque.
Summary
The addition of these new benchmarks for the Xnor network hasn’t changed the overall result. The Coral Dev Board and USB Accelerator still have a clear lead, with MobileNet models running between 3× and 4× faster than the direct competitors.
However, the performance increase seen from the Xnor models is startling, with inferencing running 4× faster than our original Raspberry Pi models using TensorFlow. Pricing for commercial use of the Xnor models is not known but, depending on what Xnor charges, that could make the Raspberry Pi an extremely competitive platform for machine learning.
ℹ️Information The Xnor.ai model used in this benchmark was intended for evaluation purposes only. The evaluation version has a limit of 13,500 inferences per run. Commercial Xnor.ai models do not contain this limit; however, the cost of commercial licensing is not known.
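A 10,000 inference benchmark fits inside that limit, but a longer test would have to be split up. A minimal sketch of planning that split is below. It assumes that ‘per run’ means per invocation of the benchmarking script, and `split_into_runs` is a helper name introduced here for illustration.

```python
EVAL_LIMIT = 13500  # inferences allowed per run by the evaluation models


def split_into_runs(total, limit=EVAL_LIMIT):
    """Split `total` inferences into per-run counts no larger than `limit`."""
    if total < 0 or limit <= 0:
        raise ValueError("total must be >= 0 and limit must be > 0")
    full_runs, remainder = divmod(total, limit)
    return [limit] * full_runs + ([remainder] if remainder else [])


print(split_into_runs(10000))  # fits in a single run
print(split_into_runs(30000))  # needs three separate runs
```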
You should go ahead and grab the AI2GO SDK from the Xnor site. You’ll need to accept the terms of service before the Xnor site will allow you to create an account and download their SDK.
The SDK is officially supported on both the Raspberry Pi Zero, running Raspbian Stretch Lite, and the Raspberry Pi 3, running Raspbian Stretch. I’m going to be running things on the same Raspberry Pi 3, Model B+, running Raspbian Lite, as we used during the other benchmarks.
ℹ️Information The Xnor Platform SDK also officially supports the Toradex Apalis iMX6, and x86-64 based machines (Haswell or later) with at least 40MB of RAM, running Ubuntu 16.04 or later (with support for GLIBCXX 3.4.21 or above). If you look in the samples directory, there is also sample code for macOS and a couple of other platforms.
Once you’ve grabbed the SDK we need to create a model bundle. This is where Xnor’s platform starts to heavily diverge from how you’d normally approach machine learning using a framework like TensorFlow. The first thing you’re asked to do is select your hardware.
Once you’ve picked your hardware, you’ll be asked to ‘customise your model bundle’ by selecting an ‘industry.’ It’s at this stage that you’re selecting the underlying model that you’ll be using. Regrettably perhaps, while Xnor make this selection process really very easy, it’s also pretty opaque.
We’re using the same image as we did for our previous benchmarks, which has two recognisable objects in the frame, a banana🍌 and an apple🍎. So, oddly perhaps, I selected ‘Smart Home’ as our industry. Then, scrolling down the page, I selected ‘Detection’ and ‘Kitchen Object Detector’, which should give us a model that will detect our banana and apple in the frame.
What isn’t obvious here is what the underlying model architecture is, what datasets it has really been trained against, or really, any other details.
Scrolling down reveals some further ‘advanced’ options. It looks like you can trade off accuracy against model memory size and latency, although it’s still not clear exactly what optimisation you’re doing here.
As with my other benchmarks, we’re going to go ahead and do the naive thing here, and download the recommended model.
Once the model has been downloaded you’ll see why we had to specify our hardware up front. The model comes embedded into a compiled shared object library file, rather than as a standalone model file.
$ ls
LICENSE.txt libxnornet.so README.txt xnornet-1.0-cp35-abi3-linux_armv7l.whl
We’ll need to ‘install’ the model alongside the Xnor SDK. Interestingly the README.txt file gives a list of classes that the model can detect: person, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, potted plant, dining table, microwave, oven, toaster, refrigerator. This doesn’t really give away which dataset it has been trained against, and my initial guess is that, like the models, their dataset is proprietary rather than one of the publicly available ones.
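Before benchmarking, it is worth confirming that the objects in your test image are among the classes the bundle claims to detect. A minimal sketch of that check is below, with the class list copied from the README.txt as quoted above; `KITCHEN_CLASSES` and `unsupported` are names introduced here for illustration.

```python
# Class labels as listed in the model bundle's README.txt.
KITCHEN_CLASSES = [
    "person", "bottle", "wine glass", "cup", "fork", "knife", "spoon",
    "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot",
    "hot dog", "pizza", "donut", "cake", "chair", "potted plant",
    "dining table", "microwave", "oven", "toaster", "refrigerator",
]


def unsupported(wanted, supported=KITCHEN_CLASSES):
    """Return the labels in `wanted` that the model does not list."""
    known = set(supported)
    return [label for label in wanted if label not in known]


print(len(KITCHEN_CLASSES))
print(unsupported(["banana", "apple"]))
print(unsupported(["banana", "pear"]))
```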
Go ahead and download the latest release of Raspbian Lite and set up your Raspberry Pi. Unless you’re using wired networking, or have a display and keyboard attached to the Raspberry Pi, at a minimum you’ll need to put the Raspberry Pi on to your wireless network, and enable SSH.
⚠️Warning The Xnor SDK zip file is ~700MB. You’ll need another ~780MB of space to uncompress the archive onto your Raspberry Pi. So it’s probably a good idea to use at least a 32GB card when flashing the image to ensure that you have enough space left after installing the operating system.
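If you’re unsure whether an existing card has room, you can check the free space on the Raspberry Pi before copying anything over. A minimal sketch using Python’s standard library is below. The two sizes are the approximate figures quoted above, and `enough_space` is a helper name introduced here for illustration.

```python
import shutil

ZIP_MB = 700       # approximate size of the SDK zip file
UNPACKED_MB = 780  # approximate extra space needed to uncompress it


def enough_space(free_bytes, required_mb=ZIP_MB + UNPACKED_MB):
    """Return True if `free_bytes` covers `required_mb` megabytes."""
    return free_bytes >= required_mb * 1024 * 1024


if __name__ == "__main__":
    # Free space on the filesystem holding the current directory.
    free = shutil.disk_usage(".").free
    print("free: {:.0f} MB, enough: {}".format(
        free / (1024 * 1024), enough_space(free)))
```

Run it from your home directory on the Raspberry Pi, since that is where the archives get uncompressed.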
Once you’ve set up your Raspberry Pi go ahead and power it on, and then open up a Terminal window on your laptop and SSH into the Raspberry Pi.
% ssh email@example.com
Once you’re logged into your Raspberry Pi go ahead and grab the two zip files, containing the Xnor SDK and our model bundle, from your laptop using scp and uncompress them into your home directory.
Now go ahead and install the required packages.
$ cd xnor-sdk
$ pip3 install Pillow
$ pip3 install psutil
⚠️Warning The Xnor documentation suggests installing packages from the requirements.txt file, via a supplied install_dependencies_rpi.sh script. This simply didn’t work for me, so I installed packages by hand.
After doing this you should similarly install the model bundle,
$ cd ~/kitchen-object-detector-medium-300
$ pip3 install xnornet-1.0-cp35-abi3-linux_armv7l.whl
Processing ./xnornet-1.0-cp35-abi3-linux_armv7l.whl
Installing collected packages: xnornet
Successfully installed xnornet-1.0
$
although be aware that if you’ve previously installed another model bundle, you need to uninstall it before installing a new one.
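One way to tell whether a bundle is already installed is to ask Python whether the module can be found. A minimal sketch is below. It assumes the wheel provides an importable top-level module with the same name as the package, `xnornet`, which is inferred from the pip output rather than confirmed; `bundle_installed` is a helper name introduced here for illustration.

```python
import importlib.util


def bundle_installed(module_name="xnornet"):
    """Return True if a module called `module_name` can be imported."""
    # ASSUMPTION: the model bundle installs a top-level module named xnornet.
    return importlib.util.find_spec(module_name) is not None


if bundle_installed():
    print("A model bundle is already installed; remove it with "
          "'pip3 uninstall xnornet' before installing another.")
else:
    print("No model bundle found; safe to install one.")
```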
We should now be ready to run our benchmarking scripts.
The benchmarking code
Our benchmarking code is a mash-up of our original TensorFlow benchmarks and the sample code provided by Xnor, and is straightforward.
ℹ️Information There doesn’t seem to be a ‘score’ or other indication of accuracy or confidence returned with the bounding box information. We get a labelled box returned for each ‘detected’ object in the frame. However, the minimum criteria for a detection are not documented, and there doesn’t seem to be anything in the documentation to suggest how you can discard lower confidence detections.
In closing
As I really tried to make clear in my previous articles, putting these platforms on an even footing and directly comparing them is not a trivial task. While initial results around the Xnor models look extremely promising, the fact that the models are proprietary and therefore somewhat opaque means that you should carefully evaluate them for your specific use case before committing to ongoing licensing charges.
While on the face of it the Xnor benchmarks make the Raspberry Pi a very competitive platform for machine learning, how useful they prove to be in the real world is going to depend heavily on how much Xnor is charging.
Links to previous benchmarks
If you’re interested in more details around the previous benchmarks, they are covered in my earlier articles.
If you’re interested in getting started with any of the accelerator hardware I used during my first benchmark, I’ve put together getting started guides for the Google, Intel, and NVIDIA hardware I looked at during that analysis.