The aim of working on this project was simple: implement both ML training and inference on a microcontroller. I was aware of the challenges but ready to explore the limits.
The source code for this project is available in the GitHub repository: embedding_Edge_AI_on_MCU.
I also recorded a short video demonstrating how to set up and train models.
Challenge of implementing edge AI

When writing this documentation, I came across an interesting video from the EDGE AI FOUNDATION on "What does TinyML and Edge AI actually mean?"
What is a neural network? Simply put, a neural network is an algorithm (like a formula) designed to recognize patterns in data. Neural networks involve two key processes: forward propagation and backpropagation. In forward propagation, an input is processed through the network's layers and the result is an output, which can be a prediction or classification. Backpropagation is how the network learns: it measures the error in the network's output, calculates the derivative of the loss with respect to each weight, and adjusts the weights to reduce the loss. Backpropagation in particular is very challenging to implement on edge devices such as microcontrollers (MCUs) due to their constrained resources.
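To make the weight adjustment concrete, backpropagation applies the standard gradient-descent update to each weight, where \(\eta\) is the learning rate and \(L\) is the loss:

$$ w \leftarrow w - \eta \, \frac{\partial L}{\partial w} $$

Computing and storing these gradients for every weight is exactly the workload that strains an MCU's limited RAM and flash.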
In recent years, there have been impressive advancements in integrating AI into Internet of Things (IoT) devices, which generate huge amounts of data that require fast, advanced computing. Since edge devices such as microcontrollers have not advanced enough to fully run AI training and inferencing, an innovative technology known as edge AI (for example, TinyML) was developed with the aim of running AI models at the edge (inferencing).
In edge AI, trained models are loaded onto devices to locally process data and generate outputs such as classifications or predictions. There are numerous applications for this technology, for example in smart industries, where a device installed on machines can identify anomalies in their operation such as overheating, gas leaks, engine damage, etc. Below are some use cases that I have worked on in the past:
- Inspecting the quality of manufactured products with a smart camera
- Keeping shelves stocked by automatically counting items with a smart camera
- Smart wearable for environment sensing to alert workers of potential threats
- Smart wearable for machine sensing to alert when workers have an accident
Given my passion for embedded systems, I have been researching how we can train and deploy ML models on a microcontroller. I have experimented with various approaches, such as developing simple artificial neural networks (ANNs), simple visual anomaly detection models, and simple regression models, among others.
The biggest challenge was evident: the computation power available on microcontrollers is not sufficient. We cannot implement complex data processing and train efficient models on MCUs since they are very constrained, especially in flash and RAM. Simple algorithms cannot process data efficiently, leading to very poor training. The amount of RAM and flash storage available also limits the size of the data that we can compute. However, I saw a light at the end of the tunnel: utilize different environments to run the training and deployment programs.
Following this idea, I experimented with creating custom MicroPython packages to construct and train simple models, but it was not as easy as I thought, specifically porting the relevant NumPy functions that I needed. As I was working with the ESP32S3 (owing to its high performance and larger RAM compared to other MCUs), I decided to experiment with Web AI (client-side AI implementation), and this opened up a way to achieve the aim of this project.
In the end, I settled on embedding the training and inference instructions on a microcontroller such as the XIAO ESP32S3 Sense. Applying Web AI technology shaped the final edge AI goals of this project:
- Run a web server on the XIAO ESP32S3 Sense: the web server's task is to serve both the camera feed and the web files for training an image classification model and running inference, all through a web browser.
- Save a trained model to the SD card of the XIAO ESP32S3 Sense.
- Load a model from the SD card of the XIAO ESP32S3 Sense, and allow retraining.
- Develop the project with a friendly codebase that allows seamless extension and customization.
The XIAO ESP32S3 Sense runs as a web server, sending both the camera feed and a web application to a client (browser) via HTTP. The client is then able to capture images with the XIAO camera and train a simple image classification model using TensorFlow.js, all locally in the browser! Afterwards, the model is automatically saved to the SD card, and the XIAO ESP32S3 Sense additionally starts serving the saved model to clients, at which point inference starts as well.
The choice of an image classification model was purely personal, for testing purposes. However, we can just as easily run a different model for other tasks such as object detection or anomaly detection, and work with other data types such as audio or numerical values. In that case, we update the source code to collect the required data and load a different model.
Hardware configuration

Software components:
- Arduino IDE with ESP32 board and ESPAsyncWebServer library installed
Hardware components:
- XIAO ESP32S3 Sense with a 2.4GHz antenna connected
- SD card (don't worry about the size; even 100MB is more than sufficient)
- Personal computer
The tiny XIAO ESP32S3 Sense was a perfect choice for this project since it integrates an on-board camera sensor, an SD card slot, 8MB PSRAM, and 8MB flash. At the core of this board is the ESP32S3: a dual-core, 32-bit Xtensa LX7 processor that operates at up to 240 MHz and includes a 2.4GHz Wi-Fi subsystem.
Note that the project can also run on other ESP32 boards, but the camera and SD card configurations will need to be updated. To change the camera model, select the correct one in the code in XIAO_ESP32S3_Edge_AI.ino. Identify the pin details of your board's SD card interface and, in the sketch, set the correct GPIO number for the chip select pin in SD.begin(), e.g. SD.begin(21) if the chip select is connected to GPIO 21.
It is impressive how the ESP32 SoC packs high-tech features. This project leverages its ability to run a lightweight web server. But first, what is a web server? Simply put, a web server is a system that stores, processes, and serves web content to clients such as web browsers. The client makes a request to the server for a webpage, image, script, etc., and the server responds accordingly. If, for example, a client requests something that the server cannot find, the server returns error 404 (the famous one). There are various protocols and codes that both the server and client follow. Using its internal Wi-Fi functionality, the ESP32 can connect to a Wi-Fi network (station mode) or create its own network (access point mode), allowing other devices to connect to it and make requests.
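To illustrate this request/response cycle, here is a minimal client-side sketch in JavaScript; the /camera path and the IP address are placeholders for illustration, not the project's actual routes.

```javascript
// Hypothetical client request to the board; the path and IP address
// are placeholders, not the routes defined in the Arduino sketch.
fetch('http://192.168.4.1/camera')
  .then((response) => {
    if (response.status === 404) {
      console.log('The server could not find the requested resource.');
    } else if (response.ok) {
      console.log('The server served the resource successfully.');
    }
  })
  .catch((err) => console.error('Request failed:', err));
```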
The project is composed of two main parts: an Arduino sketch and web files. These can be found in the GitHub repository: embedding_Edge_AI_on_MCU.
The Arduino sketch uses the ESP32 camera functions to capture images, SD card functions to read and write data, and ESPAsyncWebServer to run an asynchronous web server allowing the ESP32 to remain responsive even when serving multiple requests.
The programmed ESP32 will first attempt to connect to the defined Wi-Fi network and, if this fails, will instead create an access point (Wi-Fi hotspot). The SD card and camera are initialized afterwards. Still in the setup function, I defined some HTTP routes which serve an HTML page with images, a TensorFlow.js script (version 4.13.0), and the live camera feed (streamed as JPEG frames over HTTP), and which allow downloading and uploading a model from/to the SD card. In these HTTP routes, the images and the TensorFlow.js library (which is around 1.4MB) are cached to improve the performance of the web application, reducing network traffic and making the page load faster. The web files are stored on an SD card (the web_ai folder) and served by the ESP32.
The web interface is a simple HTML page with a dropdown and buttons, and it shows the live feed from the ESP32 camera. In the HTML file, a script first populates the dropdown with the class names from the class_labels variable and later requests a model from the ESP32. If no model is found on the SD card, the ESP32 returns an error and the script skips running continuous inference. The image processing parameters are defined in the script; the default is RGB at 96x96 pixels.
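That model request can be pictured with the minimal sketch below, assuming the model files are served from the board's web root; the exact URL is defined in the project's index.html.

```javascript
// Minimal sketch of loading the saved model from the board, assuming
// model.json is served from the web root (check index.html for the
// actual URL). tf.loadLayersModel fetches model.json plus the
// referenced model.weights.bin.
async function loadSavedModel(boardIp) {
  try {
    const model = await tf.loadLayersModel(`http://${boardIp}/model.json`);
    console.log('Model loaded; starting continuous inference.');
    return model;
  } catch (err) {
    // If no model exists on the SD card, the ESP32 returns an error
    // and continuous inference is skipped until a model is trained.
    console.log('No saved model found:', err.message);
    return null;
  }
}
```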
When a user captures an image, the script creates a tensor of pixel values (the input features) from the image, and this is pushed to an array, capturedImages. The active class is also saved to an array, labels.
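A minimal sketch of this capture step is shown below, assuming the live feed is displayed in an &lt;img&gt; element and using the default 96x96 RGB input size; the actual implementation lives in index.html.

```javascript
// Sketch of the capture step: turn the current camera frame into a
// normalized tensor and record its class. imgElement and the details
// here are assumptions; the array names mirror the description above.
const capturedImages = [];
const labels = [];

function captureImage(imgElement, activeClassIndex) {
  // tf.tidy disposes the intermediate tensors created inside it.
  const tensor = tf.tidy(() =>
    tf.browser.fromPixels(imgElement)   // HxWx3 RGB tensor from the frame
      .resizeBilinear([96, 96])         // match the model's input size
      .toFloat()
      .div(255.0)                       // scale pixel values to [0, 1]
  );
  capturedImages.push(tensor);
  labels.push(activeClassIndex);
}
```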
The trainModel function creates a simple convolutional neural network (CNN) that extracts spatial features from the image (conv2d), reduces the spatial size of the feature maps (maxPooling2d), flattens the result, and feeds it into a fully connected layer with 16 neurons (dense), followed by an output layer. We can also create other models, but this default one was chosen for simplicity and testing. Once the model is trained, it is uploaded to the ESP32 using an HTTP POST request whose body contains the model.json and model.weights.bin files, which are individually saved to the SD card.
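For reference, a model along those lines could be defined as in the sketch below; the filter count, optimizer, and the /upload path are assumptions for illustration, not the exact values used in the project's trainModel function.

```javascript
// Sketch of a CNN matching the layers described above. Filter count
// and training settings are illustrative assumptions.
function buildModel(numClasses) {
  const model = tf.sequential();
  model.add(tf.layers.conv2d({
    inputShape: [96, 96, 3],            // RGB images, 96x96 pixels
    filters: 8,
    kernelSize: 3,
    activation: 'relu',
  }));
  model.add(tf.layers.maxPooling2d({ poolSize: 2 })); // shrink feature maps
  model.add(tf.layers.flatten());
  model.add(tf.layers.dense({ units: 16, activation: 'relu' }));
  model.add(tf.layers.dense({ units: numClasses, activation: 'softmax' }));
  model.compile({
    optimizer: 'adam',
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy'],
  });
  return model;
}

// Saving through tf.io.browserHTTPRequest issues a multipart POST that
// carries model.json and model.weights.bin, matching the upload route
// described above. The /upload path is a placeholder.
// await model.save(tf.io.browserHTTPRequest(`http://${boardIp}/upload`));
```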
- On your PC, ensure that you have installed the Arduino IDE, the ESPAsyncWebServer library, and the esp32 board package.
- Open the Arduino sketch XIAO_ESP32S3_Edge_AI.ino and replace both the wifi_SSID and wifi_password values with the Wi-Fi network name and password that you want the XIAO board to connect to. You can also define the Access Point (Wi-Fi hotspot) credentials with the variables ap_SSID and ap_password.
- If you intend to upload the sketch to a XIAO ESP32S3 Sense (default), ensure that CAMERA_MODEL_XIAO_ESP32S3 is defined (not commented out).
- In the Arduino IDE ‘Tools’ menu, ensure ‘PSRAM’ is set to ‘OPI PSRAM’. Also select the correct board, ‘XIAO_ESP32S3’, for this project.
- Connect the XIAO ESP32S3 Sense to your PC and upload the sketch. Ensure that a 2.4GHz antenna is connected to the board and open a serial interface such as the Serial monitor (set the baud rate to 115200). Next, copy the IP address that the board has been assigned. Disconnect the board from your PC after copying the IP address.
- Open the index.html file and paste the IP address of the board into the esp32S3_ip_address variable.
- Copy the web_ai folder to an SD card formatted with the FAT32 file system. Once the folder has been copied, insert the SD card into the XIAO ESP32S3 Sense SD card slot.
Note: With the XIAO ESP32S3 Sense, you cannot use the SPI functions when utilizing the SD card interface. In this case, you need to connect/solder together the J3 pads on the expansion board for the SD card interface to work. This is illustrated in the image below (source: https://wiki.seeedstudio.com/xiao_esp32s3_sense_filesystem/#card-slot-circuit-design-for-expansion-boards).
Once the XIAO ESP32S3 Sense (or other ESP32 board) has been programmed, with the SD card inserted and power applied, it will connect to the configured Wi-Fi network or create an access point. If the board is in AP mode, connect to its hotspot. Afterwards, use a PC or a mobile phone to access the web UI through a browser by entering the board's IP address in the address bar. The web UI (completely served by the XIAO ESP32S3 Sense) shows the live camera feed, buttons to capture images for the selected class, and buttons to train an image classification model. The training and inference logs are also shown on the UI.
If the model.json and model.weights.bin files exist in the root of the SD card, the model will be loaded automatically and inference will start in the browser when a client accesses the web UI. If no model is found on the SD card, the user needs to capture images for each class and train a model.
Note: You can easily update the class labels using the class_labels variable in the index.html file. Ensure the size of the class_labels array matches image_counts_per_class! You can add more classes as well, as sketched below.
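As a rough illustration only (the actual definitions are in index.html), the two variables might look like this, with one entry per class:

```javascript
// Hypothetical shapes of the two variables; check index.html for the
// real definitions. The arrays must be the same length.
const class_labels = ['class_1', 'class_2', 'class_3'];
const image_counts_per_class = [0, 0, 0];  // one counter per class
```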
Place an object in front of the ESP32 camera and capture images for the respective class. Repeat the process for the other classes, which can be selected using the dropdown.
Once you have captured a considerable number of images, click ‘Train model’. A simple image classification model will be trained and saved to the SD card (note that saving the model.json and model.weights.bin files will take some time, but the serial logs will show the file-saving progress).
You can also access the user interface using a mobile phone's browser. In my tests, I trained with up to 50 images on both PC and mobile browsers, and training was successful. However, increasing the number of images or the processing parameters, such as the image dimensions, will require more computation power from the device that you are using.
It is impressive how far microcontrollers have advanced, together with software integrations thoughtfully designed to leverage edge computation. My aim in documenting these findings is to share them with the community, which I know will build interesting projects on top of this. Maybe it can be advanced into a framework for Web AI on MCUs?
It is also important to note that there are limitations to this approach, specifically the capability of the device that the client runs on, since training happens in the client's browser.
I hope this project can lead to innovative use cases and fingers crossed that one day on-device training on microcontrollers will be possible.