Building on our previous demonstration of running YOLOv5n for object detection on an ESP32, we're now showcasing a leap forward in on-device intelligence. Using the SynapEdge compiler, we've successfully deployed a Tiny Language Model (TLM) for natural language understanding on the same ESP32-S3 microcontroller.
SynapEdge achieves this by directly converting standard ONNX models into highly efficient, platform-agnostic C code. This eliminates the need for cloud connectivity, complex software dependencies, or specific hardware accelerators.
This project proves that even the most resource-constrained devices can now host sophisticated AI, opening new frontiers for private, low-power, and intelligent applications at the very edge.
Model
I have built a custom language model and trained it on "noanabeshima/TinyStoriesV2". This dataset was easily available, so I picked it for the demonstration. I am looking for real-world applications and datasets, such as IoT sensor readings and alerts. If you have such a dataset, please share it.
All code is provided on GitHub.
Prerequisites
- An ESP32-S3 module with at least 16MB of flash memory and 8MB of PSRAM; the exact memory requirement depends on your model size.
This guide is tailored for the ESP32-S3 Dev (N16R8) Module. Ensure your microcontroller has sufficient flash and RAM for your project. Perform the following steps in your Arduino IDE:
Select the Board:
- Go to Tools > Board and select ESP32S3 Dev Module.
Configure Settings:
- Go to Tools, then:
- Set Flash Size to 16MB.
- Enable PSRAM (select OPI PSRAM for the N16R8 module).
Edit boards.txt:
- Locate the boards.txt file for the ESP32 package. For example:
C:\Users\<your_username>\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.11\boards.txt
- Replace <your_username> with your actual Windows username.
- Open boards.txt in a text editor.
- Find the section starting with esp32s3.menu.PartitionScheme.
- At the end of this section, add the following lines:
esp32s3.menu.PartitionScheme.My_16MB=16M Flash (15MB APP)
esp32s3.menu.PartitionScheme.My_16MB.build.partitions=My_16MB
esp32s3.menu.PartitionScheme.My_16MB.upload.maximum_size=15728640
Create Partition Table:
- Create a file named My_16MB.csv and add the following:
# Name, Type, SubType, Offset, Size, Flags
nvs, data, nvs, 0x9000, 0x5000,
otadata, data, ota, 0xe000, 0x2000,
app, app, factory, 0x10000, 0xF00000,
ffat, data, fat, 0xF10000, 0xE0000,
coredump, data, coredump, 0xFF0000, 0x10000,
- Save My_16MB.csv in the ESP32 partition folder:
C:\Users\<your_username>\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.11\tools\partitions
- Replace <your_username> with your actual Windows username.
Restart the IDE:
- Close and re-open the Arduino IDE to apply the changes.
- Go to Tools > Partition Scheme and select the new My_16MB entry, shown as 16M Flash (15MB APP) (if it is not listed, restart your PC).
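Before moving on, it can help to confirm that the flash and PSRAM settings actually took effect. The short check sketch below is not part of the project; it only uses standard Arduino-ESP32 calls (psramFound(), ESP.getPsramSize(), ESP.getFreePsram(), ESP.getFlashChipSize()) to print what the board exposes.

// Quick sanity check: confirm the IDE settings expose 16MB flash and the external PSRAM.
#include <Arduino.h>

void setup() {
  Serial.begin(115200);
  delay(1000);

  Serial.printf("Flash size : %u bytes\n", ESP.getFlashChipSize());
  Serial.printf("PSRAM found: %s\n", psramFound() ? "yes" : "no");
  Serial.printf("PSRAM size : %u bytes\n", ESP.getPsramSize());
  Serial.printf("Free PSRAM : %u bytes\n", ESP.getFreePsram());
}

void loop() {}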
Use this notebook to compile the model. Use the ONNX model provided on GitHub, or build and train your own model to suit your requirements. I used the following hyperparameters:
- vocab_size = 6000
- n_embd = 64
- n_head = 4
- n_layer = 1
- block_size = 64
- batch_size = 32
- learning_rate = 3e-3
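For a rough sense of scale: the token embedding table alone is vocab_size × n_embd = 6000 × 64 = 384,000 weights, about 1.5 MB in float32. That already dwarfs the ESP32-S3's roughly 512 KB of internal SRAM, which is why the weights ship as header files compiled into flash and the intermediate tensors are pushed into PSRAM in the steps below.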
See the GitHub repository for the code.
I used the SentencePiece tokenizer. Tokenizer.cpp needs a vocab file to encode and decode text, and a Python script is provided to convert your .vocab file to C.
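To give an idea of the target, a converted vocab header could look roughly like the sketch below. The layout and names here are illustrative assumptions; the actual vocab.h produced by the script may differ.

// Hypothetical vocab.h layout generated from a SentencePiece .vocab file (illustrative only).
#pragma once

#define VOCAB_SIZE 6000  // matches the vocab_size used during training

// One piece string per token id; SentencePiece marks word starts with "▁" (U+2581).
const char *const vocab[VOCAB_SIZE] = {
    "<unk>", "<s>", "</s>", "▁the", "▁a", /* ... remaining 5,995 pieces ... */
};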
Note: The ONNX model should be shape-inferred. Also try to optimize your model before compiling it.
Download files
- Download TLM.c, TLM.h, and the weight files (TLM_weight_0.h, TLM_weight_1.h, etc.) from the notebook, then copy them all into your Arduino sketch folder. You should now see each file as a tab in the Arduino IDE.
- Rename TLM.c to TLM.cpp to make it compatible with the Arduino IDE, which expects C++ files.
- The ESP32 has a limited amount of internal SRAM, so we use the external PSRAM on the ESP32-S3 module for the larger data structures and allocate the tensor variables in PSRAM explicitly. Open TLM.cpp and add #include "esp32-hal-psram.h" at the top of the file.
- In TLM.cpp, locate the forward pass function forward_pass(). At the beginning of this function, allocate every tensor union in PSRAM: for each union, use union tensor_union_0 *tu0 = (union tensor_union_0 *)ps_malloc(sizeof(union tensor_union_0)); to allocate tu0, and do the same for tu1 and the rest.
- At the end of forward_pass(), release the memory allocated for the tensor unions with free(tu0);, free(tu1);, and so on.
- Open the TLM.h header file and comment out all the static union declarations, e.g. change static union tensor_union_0 tu0; to //static union tensor_union_0 tu0; so nothing is statically allocated in internal SRAM. (A combined sketch of these edits follows this list.)
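Taken together, the start and end of forward_pass() in TLM.cpp end up looking roughly like this sketch. The real function has its own generated argument list and may have more tensor unions, and here the unions are accessed through pointers (tu0->...), so adjust to whatever SynapEdge actually generated for your model.

// TLM.cpp (excerpt) -- illustrative sketch of the PSRAM edits described above.
#include "esp32-hal-psram.h"
#include "TLM.h"

void forward_pass(/* generated inputs and outputs */) {
  // Allocate the scratch tensor unions in external PSRAM instead of internal SRAM.
  union tensor_union_0 *tu0 = (union tensor_union_0 *)ps_malloc(sizeof(union tensor_union_0));
  union tensor_union_1 *tu1 = (union tensor_union_1 *)ps_malloc(sizeof(union tensor_union_1));
  if (tu0 == NULL || tu1 == NULL) {
    free(tu0);  // free(NULL) is a no-op
    free(tu1);
    return;     // PSRAM missing or full
  }

  // ... the generated layer computations use tu0->..., tu1->..., and so on ...

  // Release the PSRAM buffers once the forward pass is done.
  free(tu0);
  free(tu1);
}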
Preparation and tokenization are the first steps for any TLM, SLM, or LLM.
We are using the SentencePiece tokenizer; three files, Tokenizer.cpp, Tokenizer.h, and vocab.h, are required for tokenization and detokenization. Prepare and process your input as your model requires.
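To illustrate how the pieces fit together, a single generation step could look roughly like the sketch below. The encode()/decode() names and the forward_pass() signature are placeholders assumed for illustration, so match them to the actual Tokenizer.cpp and generated TLM code.

// Illustrative flow only -- function names and signatures are placeholders, not the real API.
#include <Arduino.h>
#include "Tokenizer.h"  // assumed to expose encode()/decode()
#include "TLM.h"

void next_token_demo(const char *prompt) {
  // 1. Tokenize the prompt into SentencePiece token ids (block_size = 64 from training).
  int tokens[64];
  int n = encode(prompt, tokens, 64);          // placeholder signature

  // 2. Run the generated forward pass to get logits over the 6000-token vocabulary.
  static float logits[6000];                   // static keeps the buffer off the task stack
  forward_pass(tokens, n, logits);             // placeholder signature

  // 3. Greedy decode: pick the most likely next token and print its text.
  int best = 0;
  for (int i = 1; i < 6000; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  Serial.print(decode(best));                  // placeholder signature
}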
Find Code Here