
Espressif's Ali Hassan Shah Walks Through Putting a TinyML Gesture Recognition Model on an ESP32-S3

Designed around Espressif's ESP-DL, Shah's tutorial walks through the creation, optimization, and deployment of a real-world ML model.

Gareth Halfacree
2 years ago β€’ Machine Learning & AI

Espressif's Ali Hassan Shah has penned a guide to getting started with the company's ESP-DL TinyML edge AI framework, using it to deploy a gesture recognition model onto an ESP32-S3 microcontroller.

"Artificial intelligence transforms the way computers interact with the real world. Decisions are carried [out] by getting data from tiny low-powered devices and sensors into the cloud. Connectivity, high cost and data privacy are some of the demerits of this method," Shah explains. "Edge artificial intelligence [edge AI] is another way to process the data right on the physical device without sending data back and forth, improving the latency and security and reducing the bandwidth and power."

To demonstrate the concept, while also showcasing his company's offerings in the field, Shah has put together a step-by-step guide to developing a gesture-recognition model in Google's Colab using the Kaggle Hand Gesture dataset, training it, saving it, and then converting it to the Open Neural Network Exchange (ONNX) format before converting it again into a format suitable for ESP-DL.

Shah's guide then runs through the process of quantizing and optimizing the model, in order to shrink its size and make it suitable for deployment onto the ESP32-S3 microcontroller β€” a device with two Tensilica Xtensa LX7 processor cores running at 240MHz and just 512kB of static RAM (SRAM), putting it orders of magnitude below the power of even an entry-level machine-learning-capable graphics processor or dedicated accelerator.
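The size reduction from quantization comes from storing each weight as an 8-bit integer plus a shared scale factor instead of a 32-bit float. The pure-Python sketch below shows symmetric per-tensor int8 quantization to make the arithmetic concrete — ESP-DL's own toolkit automates this step, and its exact method may differ.

```python
# Illustrative symmetric per-tensor int8 quantization: each 4-byte float
# becomes a 1-byte integer, roughly a 4x reduction in weight storage.
def quantize_int8(weights):
    """Map floats to int8 using one shared scale for the whole tensor."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero case
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; rounding error is at most ~scale/2."""
    return [v * scale for v in q]

weights = [0.91, -0.42, 0.07, -1.27]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

On a device with 512kB of SRAM this factor-of-four saving, plus integer rather than floating-point arithmetic, is what makes on-chip inference practical.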

Finally, Shah walks through deploying the model using ESP-IDF and Visual Studio Code and running it on-device. "The model latency is around 0.7 Seconds on ESP32-S3," Shah notes, "whereas each neuron output and finally the predicted result is shown. In [the] future, we will design a model for ESP32-S3 EYE devkit which could capture images in real time and performs hand gesture recognition."

The full tutorial is available on the Espressif blog, with supporting source code published to GitHub under an unspecified open source license.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.