Could we package multiple classic game console emulators into a single firmware on an ESP32-S3 device with 320 KB of available RAM?
Yes.
https://github.com/geo-tp/Cardputer-Game-Station-Emulators
It supports ten different systems, ranging from 8-bit consoles to full 16-bit platforms: Game Boy, Game Gear, NES, Master System, Mega Drive, Super Nintendo (limited), PC Engine, Lynx, Neo Geo Pocket, and WonderSwan.
Very quickly, this stopped being an emulation project, and became a battle against memory.
The Constraint That Defined Everything
On desktop machines, emulator authors rarely need to worry about RAM when emulating retro consoles. On this platform, we had only about 320 KB of usable RAM at runtime.
Most emulator cores are written with desktop environments in mind. They allocate large global structures at startup, assuming that memory is plentiful. Lookup tables, caches, and audio workspaces are typically declared up front and left resident for the lifetime of the program. On the ESP32-S3, this model is not viable. Keeping all of those allocations alive simultaneously would exhaust RAM before the user even launched a game.
To make this work, the memory footprint of each emulator had to be redesigned, including how much cache it uses for graphics and audio. Nothing is allocated until the moment an emulator is actually started. Large caches were reduced. Static data was moved into flash storage whenever possible.
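The "allocate only on launch" idea can be sketched as follows. This is a minimal illustration, not the project's actual code: `CoreWorkspace`, `EmulatorSlot`, and the buffer sizes are hypothetical names and figures chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical per-core workspace; real cores would hold more buffers.
struct CoreWorkspace {
    std::unique_ptr<uint8_t[]> videoRam;
    std::unique_ptr<uint8_t[]> audioBuffer;
};

class EmulatorSlot {
public:
    EmulatorSlot(std::size_t videoBytes, std::size_t audioBytes)
        : videoBytes_(videoBytes), audioBytes_(audioBytes) {}

    // Allocate only when the user actually launches this emulator.
    void start() {
        if (ws_) return;  // already running
        ws_ = std::make_unique<CoreWorkspace>();
        ws_->videoRam = std::make_unique<uint8_t[]>(videoBytes_);
        ws_->audioBuffer = std::make_unique<uint8_t[]>(audioBytes_);
    }

    // Release everything when the emulator is stopped.
    void stop() { ws_.reset(); }

    bool running() const { return ws_ != nullptr; }

    // Bytes currently held by this slot; zero while the core is not running.
    std::size_t residentBytes() const { return ws_ ? videoBytes_ + audioBytes_ : 0; }

    uint8_t* videoRam() {
        assert(ws_ && "core not started");
        return ws_->videoRam.get();
    }

private:
    std::size_t videoBytes_;
    std::size_t audioBytes_;
    std::unique_ptr<CoreWorkspace> ws_;
};
```

With this shape, ten idle slots cost only a few words each, and the full workspace exists only for the one core that is running.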
Emulating Game Consoles
Emulation works by recreating the original hardware components in software (CPU, memory map, video renderer, and audio chips), then executing their behavior cycle by cycle while dynamically producing graphics and sound just as the physical machine would. The system is driven by loading the original ROM data, which provides the program code and assets that the emulated hardware reads and executes exactly as it would on the real device.
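To make the fetch-and-execute idea concrete, here is a minimal sketch of a CPU loop driven by ROM bytes. The three-instruction set, the cycle counts, and the names `ToyCpu`, `step`, and `runFrame` are invented for illustration and do not correspond to any real console or to the project's cores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy machine with an invented three-instruction set (not a real console CPU).
enum Opcode : uint8_t { kLoadImm = 0x01, kAddImm = 0x02, kHalt = 0xFF };

struct ToyCpu {
    uint16_t pc = 0;      // program counter into ROM
    uint8_t acc = 0;      // accumulator
    uint32_t cycles = 0;  // total cycles executed
    bool halted = false;
};

// Executes one instruction fetched from ROM and returns the cycles it consumed.
inline uint32_t step(ToyCpu& cpu, const uint8_t* rom, std::size_t romSize) {
    assert(cpu.pc < romSize);
    const uint8_t op = rom[cpu.pc++];
    uint32_t used = 1;
    switch (op) {
        case kLoadImm:
            assert(cpu.pc < romSize);
            cpu.acc = rom[cpu.pc++];
            used = 2;
            break;
        case kAddImm:
            assert(cpu.pc < romSize);
            cpu.acc = static_cast<uint8_t>(cpu.acc + rom[cpu.pc++]);
            used = 2;
            break;
        default:  // kHalt and unknown opcodes stop the toy machine
            cpu.halted = true;
            break;
    }
    cpu.cycles += used;
    return used;
}

// Runs until the cycle budget for one frame is spent or the CPU halts.
inline void runFrame(ToyCpu& cpu, const uint8_t* rom, std::size_t romSize,
                     uint32_t cycleBudget) {
    uint32_t spent = 0;
    while (!cpu.halted && spent < cycleBudget) {
        spent += step(cpu, rom, romSize);
    }
}
```

A real core follows the same pattern at a larger scale: the cycle count returned by each step is what keeps the video and audio components in sync with the CPU.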
The smaller 8-bit systems turned out to be a natural fit for this environment. Their original hardware constraints align well with what an ESP32-S3 can provide, and I can still afford to allocate a framebuffer while maintaining full-speed emulation. These platforms run comfortably once the allocation strategy is under control, leaving just enough headroom to keep the system stable.
The real difficulty appears with 16-bit machines. A console like the Mega Drive is not defined by a single processor but by several components operating together. Emulating it means reproducing a Motorola 68000, a Z80 coprocessor, multiple sound generators, and a more complex video subsystem. Even with the ESP32-S3’s dual-core architecture, simulating all of these elements simultaneously begins to stretch both processing power and memory capacity.
Memory and CPU Constraints
We are constrained not only by memory, but also by CPU availability, which means the emulation loop must remain as lean as possible. Any operation that could introduce latency (display transfers, input handling, or auxiliary processing) has to be carefully offloaded to separate threads so the core emulation timing is never blocked. At the same time, memory must be managed with precision: several emulator cores were reworked to avoid large persistent allocations such as framebuffers and tile caches, instead generating graphics incrementally and streaming each scanline directly to the display. This combination of tight threading and minimal, purpose-driven memory usage is what allows the systems to run at full speed despite the hardware's limited resources.
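A rough calculation shows why dropping the framebuffer matters. Assuming an illustrative 256x224 picture at 16 bits per pixel, a full framebuffer takes 114,688 bytes, more than a third of the 320 KB budget, while a single scanline takes 512 bytes. The sketch below renders one line at a time and hands it to a sink; `renderScanline` is a placeholder pattern generator and `LineSink` is a hypothetical stand-in for whatever pushes pixels to the display.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

constexpr std::size_t kWidth = 256;   // illustrative resolution, not tied to one console
constexpr std::size_t kHeight = 224;

constexpr std::size_t kFramebufferBytes = kWidth * kHeight * sizeof(uint16_t);
constexpr std::size_t kLineBytes = kWidth * sizeof(uint16_t);

using LineBuffer = std::array<uint16_t, kWidth>;
using LineSink = std::function<void(std::size_t y, const LineBuffer& line)>;

// Placeholder for a core's per-line renderer: fills one scanline with a pattern.
inline void renderScanline(std::size_t y, LineBuffer& line) {
    for (std::size_t x = 0; x < kWidth; ++x) {
        line[x] = static_cast<uint16_t>((x ^ y) & 0xFFFF);
    }
}

// Renders a frame one scanline at a time; only one line is ever held in RAM.
// Returns the number of lines sent to the sink.
inline std::size_t renderFrame(const LineSink& sink) {
    assert(sink && "a display sink is required");
    LineBuffer line{};
    std::size_t sent = 0;
    for (std::size_t y = 0; y < kHeight; ++y) {
        renderScanline(y, line);
        sink(y, line);  // on the device this would hand the line to the display path
        ++sent;
    }
    return sent;
}
```

On the device, the sink would typically pass the line to another task so the emulation loop is not blocked by the transfer; that hand-off is left out here to keep the example portable.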
Another Constraint
Another major constraint was ROM storage itself. Since we already lacked sufficient RAM to run the emulation, we certainly couldn't afford to load entire game images into memory or implement complex caching systems that would stream chunks from the SD card into RAM. Instead, I adopted a different approach: the selected ROM is flashed directly from the SD card into the available internal flash space of the ESP32-S3. The emulator then accesses it through a direct memory-mapped pointer, allowing the code to read the game data in place without performing any runtime allocation. This design completely avoids consuming precious RAM for ROM storage: the presence of the game does not cost a single byte of usable runtime memory.
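From the emulator's point of view, a memory-mapped ROM is just a read-only pointer and a length. The sketch below shows that access pattern with a hypothetical `RomView` class. On ESP-IDF, such a pointer could be obtained with `esp_partition_mmap`, but whether the project uses that exact call is an assumption; here any constant buffer stands in for the mapped flash region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read-only view over ROM bytes that already live somewhere addressable
// (on the device, a memory-mapped flash region; here, any const buffer).
class RomView {
public:
    RomView(const uint8_t* base, std::size_t size) : base_(base), size_(size) {
        assert(base != nullptr || size == 0);
    }

    std::size_t size() const { return size_; }

    // Reads in place: no copy, no allocation.
    uint8_t read8(std::size_t offset) const {
        assert(offset < size_);
        return base_[offset];
    }

    // Little-endian 16-bit read, assembled bytewise to avoid alignment issues.
    uint16_t read16le(std::size_t offset) const {
        assert(offset + 1 < size_);
        return static_cast<uint16_t>(base_[offset] | (base_[offset + 1] << 8));
    }

private:
    const uint8_t* base_;
    std::size_t size_;
};
```

Because the view never owns or copies the data, its RAM cost is one pointer and one size, regardless of how large the ROM is.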
Eliminating the RAM issue simply moved the constraint elsewhere, onto the device's internal flash. The flash that receives the ROM from the SD card must also contain the firmware itself, which includes all emulator cores and support code. To maximize the area available for games, the emulation code had to be reduced as much as possible: removing nonessential features, trimming lookup tables, and carefully selecting what truly needed to reside in flash. In practice, this meant compressing and packaging all ten emulator cores into roughly 2 MB of firmware, leaving the remaining ~6 MB of flash free to host the ROM that is written from the SD card before launching a game.
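The flash budget can be expressed as a simple check before a ROM is written. This is a sketch under stated assumptions: the 8 MB total is inferred from the ~2 MB and ~6 MB figures above, the 4096-byte erase sector is a typical value for SPI flash, and `romFits` is a hypothetical helper, not a function from the project.

```cpp
#include <cstddef>

constexpr std::size_t kMiB = 1024 * 1024;
constexpr std::size_t kFlashTotal = 8 * kMiB;     // assumption: implied by ~2 MB + ~6 MB
constexpr std::size_t kFirmwareBytes = 2 * kMiB;  // approximate figure from the text
constexpr std::size_t kRomRegionBytes = kFlashTotal - kFirmwareBytes;
constexpr std::size_t kSectorBytes = 4096;        // assumption: typical SPI flash erase sector

// Rounds a byte count up to a whole number of erase sectors.
constexpr std::size_t roundUpToSector(std::size_t bytes) {
    return (bytes + kSectorBytes - 1) / kSectorBytes * kSectorBytes;
}

// True if a ROM of this size can be written into the ROM region.
constexpr bool romFits(std::size_t romBytes) {
    return romBytes > 0 && roundUpToSector(romBytes) <= kRomRegionBytes;
}
```

Every kilobyte trimmed from the firmware raises `kRomRegionBytes` by the same amount, which is why shrinking the cores directly increases the largest game that can be loaded.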
Finally
Once all of these constraints were addressed (tight RAM budgeting, careful CPU scheduling, flash-aware storage, and heavily trimmed emulator cores), I finally reached a stable balance. The system packages ten different emulation cores into a single firmware while keeping enough flash space available to load games dynamically from the SD card. In practice, nearly all supported systems run at full speed on the ESP32-S3, except for the Super Nintendo, which simply cannot fit within these limits.
If you'd like to explore the project or try it yourself, see the GitHub repo linked above.