One of the big takeaways from my work benchmarking the new generation of hardware, designed to run machine learning models at vastly increased speed and inside a relatively low power envelope, was that we might well have started optimising our hardware just a little too soon. This was especially evident when I started looking at Xnor.ai’s new AI2GO framework, and the performance gains using TensorFlow Lite on the new Raspberry Pi 4.
Seeing such large gains from quantisation on low-powered hardware like the Raspberry Pi makes Google’s decisions around quantisation and the Edge TPU seem pretty good.
The only platform from the new generation of hardware to make use of quantisation, the Coral hardware is ‘best in class,’ outperforming the other acceleration platforms, like Intel’s Neural Compute Stick or NVIDIA’s Jetson Nano. Yesterday, Google announced a new generation of image classification models specifically customised for deployment on Edge TPU-based hardware.
Derived from EfficientNets, the new models were tailored to run optimally on the Edge TPU hardware, like the Coral Dev Board or Coral USB Accelerator, yielding higher accuracy without substantially increasing latency.
“Ironically, while there has been a steady proliferation of these architectures in data centers and on edge computing platforms, the NNs that run on them are rarely customized to take advantage of the underlying hardware.”—Suyog Gupta, Google Research
It’s an interesting approach, as the point of designing and building custom hardware to accelerate machine learning, such as the Edge TPU, was to optimise in hardware for the operations commonly seen in machine learning models.
Here, Google has taken the opposite approach and optimised the models for the hardware. Since that hardware was already optimised for those common operations, it’s likely that, to leverage the hardware further, the AutoML MNAS framework used to augment the original EfficientNet architecture pushed the model toward those operations, as well as attempting to fit it inside other hardware constraints.
“From past experience, we know that Edge TPU’s power efficiency and performance tend to be maximized when the model fits within its on-chip memory…” — Suyog Gupta, Google Research
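To make the memory constraint concrete, here is a rough back-of-the-envelope sketch of how weight precision changes whether a model fits inside a fixed on-chip budget. The parameter count and memory budget below are hypothetical values chosen for illustration, not figures for EfficientNet-EdgeTPU or the Edge TPU itself, and the estimate ignores activations and file metadata.

```python
def model_size_bytes(num_params: int, bits_per_weight: int) -> int:
    """Approximate weight storage only; activations and metadata are ignored."""
    return num_params * bits_per_weight // 8


def fits_on_chip(num_params: int, bits_per_weight: int, budget_bytes: int) -> bool:
    """True if the weights alone fit inside the given memory budget."""
    return model_size_bytes(num_params, bits_per_weight) <= budget_bytes


# Hypothetical values, for illustration only.
HYPOTHETICAL_PARAMS = 5_000_000          # assumed parameter count
HYPOTHETICAL_BUDGET = 8 * 1024 * 1024    # assumed on-chip budget, in bytes

float32_fits = fits_on_chip(HYPOTHETICAL_PARAMS, 32, HYPOTHETICAL_BUDGET)
int8_fits = fits_on_chip(HYPOTHETICAL_PARAMS, 8, HYPOTHETICAL_BUDGET)

print(f"float32 weights fit: {float32_fits}")
print(f"int8 weights fit:    {int8_fits}")
```

Under these assumed numbers, the same model that overflows the budget at 32 bits per weight fits comfortably at 8 bits, which is one reason quantisation and on-chip memory limits tend to be discussed together.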
Both pre-trained checkpoints of the new EfficientNet-EdgeTPU, and TensorFlow Lite models, are available on GitHub along with instructions on how to produce Edge TPU compatible models from the floating point checkpoint using post-training quantisation.
In parallel with this release, the Coral compiler has been updated to support post-training quantisation for models that were not created using quantisation-aware training.
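For readers unfamiliar with the idea, post-training quantisation maps already-trained floating point values onto 8-bit integers using a scale and a zero point. The sketch below shows that arithmetic in plain Python. It is an illustration of the general affine scheme, not the code used by TensorFlow Lite or the Coral compiler, and the function names and example weights are made up for this example.

```python
def quant_params(values, qmin=-128, qmax=127):
    """Derive a scale and zero point covering the value range (and zero)."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers, clamping to the representable range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats: real = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]


# Made-up example weights, standing in for a trained float checkpoint.
weights = [-0.8, -0.1, 0.0, 0.3, 1.2]
scale, zero_point = quant_params(weights)
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)

for original, approx in zip(weights, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Each restored value differs from the original by a small rounding error bounded by the scale. That error is the accuracy cost of quantising after training rather than training with quantisation in the loop.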
With significantly higher accuracy than the MobileNet v2 models that I used during my benchmarking work, but with similar latency, it looks like the new EfficientNet-EdgeTPU could be an important step toward keeping Google’s Edge TPU-based hardware ahead of the pack.