To prevent great disappointment when shaping your idea into a project, here is a practical guide to deep learning on a Raspberry Pi.
Over time, I have seen many posts with the most astonishing results. But are these reliable? Take the traffic in Mumbai, for instance. No way this video was made with a Raspberry.
In the end, everyone is interested in one thing: how many FPS? And that figure is usually not very high. Let me explain this to you.
The basis of every deep learning network is a so-called neural node. Many nodes form a layer. Many layers form a network. Simple. Below is a schematic of such a node.
It looks complicated, but it is in fact quite simple. Every input has some influence on the output. How much is determined by its weight: each input is multiplied by a weight and the products are summed. At the end, an activation function shapes the output, for instance by limiting the value to between -1 and +1. Keep the weights in mind. Together they form the heart of every model. The functionality of every model is stored in these weights.
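To make the node concrete, here is a minimal sketch in Python. It assumes tanh as the activation function, since tanh keeps the output between -1 and +1; other activations are just as common. The function name `neuron`, the optional `bias` term, and the example numbers are my own illustration and do not come from any particular framework.

```python
import math

def neuron(inputs, weights, bias=0.0):
    """One neural node: weighted sum of the inputs, then a tanh activation."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    # Each input contributes in proportion to its weight.
    # The bias is a common extra term; it defaults to zero here (assumption).
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # tanh squashes the sum into the range between -1 and +1.
    return math.tanh(total)

# Hypothetical example values, chosen only for illustration.
output = neuron([0.5, -1.0, 2.0], [0.8, 0.1, -0.4])
print(output)
```

A real network repeats this calculation for thousands or millions of nodes per image, which is where the trouble on a small board starts.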
The problem of running a deep learning network on a Raspberry is twofold.
- The model needs a lot of memory to store all the weights. More than 1 GByte of weights is no exception.
- The model needs a lot of computing power to perform all the calculations. This can easily be more than 1000 MFLOPS.
You don't have to be a senior scientist to see that the little CPU on the Raspberry has serious problems with those numbers.
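A quick back-of-the-envelope calculation shows what those numbers mean in practice. The sketch below is only an estimate under loud assumptions: the weight count, the reading of the 1000 M figure as floating-point operations per frame, and the sustained CPU throughput are all hypothetical values, not measurements of any Raspberry model.

```python
def weight_memory_bytes(num_weights, bytes_per_weight=4):
    """Memory needed to hold the weights; 4 bytes per weight means float32."""
    return num_weights * bytes_per_weight

def seconds_per_frame(flop_per_frame, sustained_flops):
    """Lower bound on the time for one inference, ignoring memory stalls."""
    return flop_per_frame / sustained_flops

# Hypothetical model: 250 million float32 weights, 1000 MFLOP per frame.
model_bytes = weight_memory_bytes(250_000_000)
# Hypothetical CPU that sustains 2 GFLOPS on this workload.
frame_time = seconds_per_frame(1000e6, 2e9)

print(f"weights: {model_bytes / 1e9:.1f} GB")     # weights: 1.0 GB
print(f"best case: {1 / frame_time:.1f} FPS")     # best case: 2.0 FPS
```

Even with these optimistic assumptions, the result is a couple of frames per second at best, and the weights alone fill the memory of a 1 GB board.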
Can we solve these issues? Yes, to a certain extent.
- Reduce the input size of your image. A 256 x 256 picture takes far fewer resources than a 1024 x 1024 image. However, most of the time your pre-trained model has a fixed input size and can't be reshaped.
- Reduce the number of objects to detect. To put it simply, the more objects, the more neural nodes a network needs. But again, your network is probably already trained and can't be altered in this respect.
- Switch from floating-point numbers to 8-bit signed integers. This technique not only reduces the size of the weights in memory but also speeds up the computations. 8-bit multiplications are a lot faster to execute than floating-point ones. It turns out that neural networks still perform well using 8-bit numbers. Frameworks such as TensorFlow and Caffe follow this strategy.
- Use the GPU of the Raspberry. The GPU has special hardware for fast matrix multiplication, which can be used to speed up your network. Of course, your deep learning program must be adapted beforehand for the GPU in the Raspberry. Again, TensorFlow has some routines to this end.
- Probably the best way to solve all issues at once is using a neural computing stick like the Intel Neural Compute Stick 2 or the Google Coral Edge TPU USB accelerator. But this means extra hardware and costs.
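Two of the options above, the smaller input and the 8-bit weights, can be illustrated with a short sketch. The quantization shown is a simple symmetric scheme with one shared scale factor; it is my own illustration of the idea and not necessarily what TensorFlow or Caffe do internally. The function names and the example weights are hypothetical.

```python
def pixel_ratio(small_side, large_side):
    """How many times more pixels a square image of large_side has."""
    return (large_side * large_side) / (small_side * small_side)

def quantize(weights):
    """Map float weights to signed 8-bit integers with one shared scale."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest > 0 else 1.0
    # Round to the nearest step and clamp to the range -127..127.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the 8-bit values."""
    return [q * scale for q in quantized]

# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.31, 0.05, -1.27]
q, scale = quantize(weights)
restored = dequantize(q, scale)
worst_error = max(abs(a - b) for a, b in zip(weights, restored))

print(pixel_ratio(256, 1024))   # 16.0
print(q)                        # [82, -31, 5, -127]
print(worst_error)
```

A 1024 x 1024 image holds 16 times as many pixels as a 256 x 256 one, so the early layers have roughly that much more work to do. Each quantized weight takes one byte instead of the four of a 32-bit float, and the rounding error per weight stays within half a quantization step.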
Inside the Raspberry chip, you have the CPU and the GPU. Both use the same memory chip (located on the bottom side of the board). This is a good and cheap solution. However, for deep learning calculations, it is not optimal. In that case, you have to transfer your weights in a constant flow to your GPU without stalling the CPU's memory transfers.
Much more information about this topic and the expected FPS on the Raspberry for a wide range of deep learning models can be found here: https://qengineering.eu/deep-learning-with-raspberry-pi-and-alternatives.html
Other boards like the Jetson Nano, JeVois, and Google Coral Dev board are also reviewed. The Google Edge TPU itself is explained here: https://qengineering.eu/google-corals-tpu-explained.html
See also part 2 on Hackster.io: https://www.hackster.io/tinus-treuzel/deep-learning-with-raspberry-pi-explored-part-2-b8e7cc
Finally, before anyone asks, the picture above is the Chatuchak weekend market in Bangkok. It just looks like an activation plot of a neural layer.
Thanks for reading.