Building on our previous demonstration of running YOLOv5n for object detection on an ESP32, we're now showcasing a leap forward in on-device intelligence. Using the SynapEdge compiler, we've successfully deployed a Tiny Language Model (TLM) for natural language understanding on the same ESP32-S3 microcontroller.
SynapEdge achieves this by directly converting standard ONNX models into highly efficient, platform-agnostic C code. This eliminates the need for cloud connectivity, complex software dependencies, or specific hardware accelerators.
This project proves that even the most resource-constrained devices can now host sophisticated AI, opening new frontiers for private, low-power, and intelligent applications at the very edge.
Model
I have built a custom language model and trained it on "noanabeshima/TinyStoriesV2". This dataset was easily available, so I picked it for the demonstration. I am looking for real-world applications and datasets, such as IoT sensor readings and alerts. If you have such a dataset, please share it.
All code is provided on GitHub.
Prerequisites
- An ESP32-S3 module with at least 16MB of flash memory and 8MB of PSRAM; the exact memory requirement depends on your model size.
This guide is tailored for the ESP32-S3 Dev (N16R8) Module. Ensure your microcontroller has sufficient flash and RAM for your project. Perform the following steps in your Arduino IDE:
Select the Board:
- Go to Tools > Board and select ESP32S3 Dev Module.
Configure Settings:
- Go to Tools, then:
- Set Flash Size to 16MB.
- Enable PSRAM (select OPI PSRAM for the N16R8 module).
Edit boards.txt:
- Locate the boards.txt file for the ESP32 package. For example:
C:\Users\<your_username>\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.11\boards.txt
- Replace <your_username> with your actual Windows username.
- Open boards.txt in a text editor.
- Find the section starting with esp32s3.menu.PartitionScheme.
- At the end of this section, add the following lines:
esp32s3.menu.PartitionScheme.My_16MB=16M Flash (15MB APP)
esp32s3.menu.PartitionScheme.My_16MB.build.partitions=My_16MB
esp32s3.menu.PartitionScheme.My_16MB.upload.maximum_size=15728640
Create Partition Table:
- Create a file named My_16MB.csv and add the following:
# Name, Type, SubType, Offset, Size, Flags
nvs, data, nvs, 0x9000, 0x5000,
otadata, data, ota, 0xe000, 0x2000,
app, app, factory, 0x10000, 0xF00000,
ffat, data, fat, 0xF10000, 0xE0000,
coredump, data, coredump, 0xFF0000, 0x10000,
- Save My_16MB.csv in the ESP32 partition folder:
C:\Users\<your_username>\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.11\tools\partitions
- Replace <your_username> with your actual Windows username.
Restart the IDE:
- Close and re-open the Arduino IDE to apply the changes.
- Go to Tools > Partition Scheme and select the new My_16MB entry, shown as 16M Flash (15MB APP) (if it is not listed, restart your PC).
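Before moving on, it can help to confirm that the flash and PSRAM settings actually took effect. The short check sketch below is not part of the project; it only uses standard Arduino-ESP32 calls (psramFound(), ESP.getPsramSize(), ESP.getFreePsram(), ESP.getFlashChipSize()) to print what the board exposes.

// Quick sanity check: confirm the IDE settings expose 16MB flash and the external PSRAM.
#include <Arduino.h>

void setup() {
  Serial.begin(115200);
  delay(1000);

  Serial.printf("Flash size : %u bytes\n", ESP.getFlashChipSize());
  Serial.printf("PSRAM found: %s\n", psramFound() ? "yes" : "no");
  Serial.printf("PSRAM size : %u bytes\n", ESP.getPsramSize());
  Serial.printf("Free PSRAM : %u bytes\n", ESP.getFreePsram());
}

void loop() {}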
Use this notebook to compile the model. Use the ONNX model provided on GitHub, or build and train your own model to suit your requirements. I used the following hyperparameters:
- vocab_size = 6000
- n_embd = 64
- n_head = 4
- n_layer = 1
- block_size = 64
- batch_size = 32
- learning_rate = 3e-3
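For a rough sense of scale: the token embedding table alone is vocab_size × n_embd = 6000 × 64 = 384,000 weights, about 1.5 MB in float32. That already dwarfs the ESP32-S3's roughly 512 KB of internal SRAM, which is why the weights ship as header files compiled into flash and the intermediate tensors are pushed into PSRAM in the steps below.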
See the GitHub repository for the code.
I used the SentencePiece tokenizer. Tokenizer.cpp needs a vocab file to encode and decode text, and a Python script is provided to convert your .vocab file to C.
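To give an idea of the target, a converted vocab header could look roughly like the sketch below. The layout and names here are illustrative assumptions; the actual vocab.h produced by the script may differ.

// Hypothetical vocab.h layout generated from a SentencePiece .vocab file (illustrative only).
#pragma once

#define VOCAB_SIZE 6000  // matches the vocab_size used during training

// One piece string per token id; SentencePiece marks word starts with "▁" (U+2581).
const char *const vocab[VOCAB_SIZE] = {
    "<unk>", "<s>", "</s>", "▁the", "▁a", /* ... remaining 5,995 pieces ... */
};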
Note: The ONNX model should be shape-inferred. Also try to optimize your model before compiling it.
Download files
- Download TLM.c, TLM.h, and the weight files (TLM_weight_0.h, TLM_weight_1.h, etc.) from the notebook, then copy them all into your Arduino sketch folder. You should now see each file as a tab in the Arduino IDE.
- Rename TLM.c to TLM.cpp to make it compatible with the Arduino IDE, which expects C++ files.
- The ESP32 has a limited amount of internal SRAM, so we use the external PSRAM on the ESP32-S3 module for the larger data structures and allocate the tensor variables in PSRAM explicitly. Open TLM.cpp and add #include "esp32-hal-psram.h" at the top of the file.
- In TLM.cpp, locate the forward pass function forward_pass(). At the beginning of this function, allocate every tensor union in PSRAM: for each union, use union tensor_union_0 *tu0 = (union tensor_union_0 *)ps_malloc(sizeof(union tensor_union_0)); to allocate tu0, and do the same for tu1 and the rest.
- At the end of forward_pass(), release the memory allocated for the tensor unions with free(tu0);, free(tu1);, and so on.
- Open the TLM.h header file and comment out all the static union declarations, e.g. change static union tensor_union_0 tu0; to //static union tensor_union_0 tu0; so nothing is statically allocated in internal SRAM. (A combined sketch of these edits follows this list.)
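Taken together, the start and end of forward_pass() in TLM.cpp end up looking roughly like this sketch. The real function has its own generated argument list and may have more tensor unions, and here the unions are accessed through pointers (tu0->...), so adjust to whatever SynapEdge actually generated for your model.

// TLM.cpp (excerpt) -- illustrative sketch of the PSRAM edits described above.
#include "esp32-hal-psram.h"
#include "TLM.h"

void forward_pass(/* generated inputs and outputs */) {
  // Allocate the scratch tensor unions in external PSRAM instead of internal SRAM.
  union tensor_union_0 *tu0 = (union tensor_union_0 *)ps_malloc(sizeof(union tensor_union_0));
  union tensor_union_1 *tu1 = (union tensor_union_1 *)ps_malloc(sizeof(union tensor_union_1));
  if (tu0 == NULL || tu1 == NULL) {
    free(tu0);  // free(NULL) is a no-op
    free(tu1);
    return;     // PSRAM missing or full
  }

  // ... the generated layer computations use tu0->..., tu1->..., and so on ...

  // Release the PSRAM buffers once the forward pass is done.
  free(tu0);
  free(tu1);
}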
Preparation and tokenization are the first steps for any TLM, SLM, or LLM.
We are using the SentencePiece tokenizer; three files, Tokenizer.cpp, Tokenizer.h, and vocab.h, are required for tokenization and detokenization. Prepare and process your input as your model requires.
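To illustrate how the pieces fit together, a single generation step could look roughly like the sketch below. The encode()/decode() names and the forward_pass() signature are placeholders assumed for illustration, so match them to the actual Tokenizer.cpp and generated TLM code.

// Illustrative flow only -- function names and signatures are placeholders, not the real API.
#include <Arduino.h>
#include "Tokenizer.h"  // assumed to expose encode()/decode()
#include "TLM.h"

void next_token_demo(const char *prompt) {
  // 1. Tokenize the prompt into SentencePiece token ids (block_size = 64 from training).
  int tokens[64];
  int n = encode(prompt, tokens, 64);          // placeholder signature

  // 2. Run the generated forward pass to get logits over the 6000-token vocabulary.
  static float logits[6000];                   // static keeps the buffer off the task stack
  forward_pass(tokens, n, logits);             // placeholder signature

  // 3. Greedy decode: pick the most likely next token and print its text.
  int best = 0;
  for (int i = 1; i < 6000; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  Serial.print(decode(best));                  // placeholder signature
}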
Find Code Here