Part 1/2: This blog extends my TensorFlow Lite for Microcontrollers tutorial. I was selected for Google Summer of Code under TensorFlow to build demos for TinyML, and while reading through the documentation I was intrigued by some of the design decisions in the TensorFlow Lite for Microcontrollers library. I then read chapter 11 of the TinyML book, which details the inner workings of the TensorFlow Lite Micro framework, and this blog is my interpretation of it. I have also added some of my own suggestions for improving TensorFlow Lite for Microcontrollers. I hope this blog helps you better understand what’s happening under the hood and appreciate the tiny details of the TensorFlow Lite for Microcontrollers library.
1. Requirements for the TensorFlow Lite for Microcontrollers library
2. Code generation
3. Project generation
4. Conclusion
1. Requirements for the TensorFlow Lite for Microcontrollers library
The team behind this framework knew that running in embedded environments imposes a lot of constraints, so they identified some important requirements.
1.a No operating system dependencies
An ML model is a black box that takes in numbers and sends out numbers. Access to the rest of the system shouldn’t be necessary to perform these operations. By removing references to files or devices in the basic code, it was possible to port to some of the targeted platforms that don't even have an OS.
1.b No standard C or C++ library dependencies at linker time
The aim was to deploy on devices that might have only a few tens of kilobytes of memory to store a program, so the binary size was very important. Even apparently simple functions like sprintf() can easily take up 20 KB by themselves, so they aimed to avoid anything that had to be pulled in from the library archives that hold the implementations of the C and C++ standard libraries. The one exception to this linker avoidance is the standard C math library, which is relied on for things like trigonometric functions that do need to be linked in.
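To make this concrete, here is a minimal sketch (my own illustration, not the library's actual code) of the kind of hand-rolled formatting that lets a program report numbers without ever linking the C library's printf family:

```cpp
#include <cstdint>

// Format a 32-bit integer by hand so nothing from printf/sprintf gets
// linked in. Illustrative only; `out` must hold at least 12 bytes
// ('-' + 10 digits + '\0').
void FormatInt32(int32_t value, char out[12]) {
  char tmp[10];
  int n = 0;
  // Work in unsigned space so negating INT32_MIN cannot overflow.
  uint32_t v = value < 0 ? 0u - static_cast<uint32_t>(value)
                         : static_cast<uint32_t>(value);
  do {
    tmp[n++] = static_cast<char>('0' + (v % 10));
    v /= 10;
  } while (v != 0);
  char* p = out;
  if (value < 0) *p++ = '-';
  while (n > 0) *p++ = tmp[--n];
  *p = '\0';
}
```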
1.c No floating-point hardware required
Many embedded platforms don’t have support for floating-point arithmetic in hardware, so the code had to avoid any performance-critical uses of floats.
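Quantization is what makes this practical: an 8-bit quantized value q represents the real number scale × (q − zero_point), so the bulk of the math can run in plain 32-bit integer arithmetic. A minimal sketch of the idea (my own illustration, not one of the library's kernels):

```cpp
#include <cstdint>

// Integer-only dot product over int8-quantized tensors, where
// real_value ≈ scale * (quantized_value - zero_point).
int32_t QuantizedDot(const int8_t* a, const int8_t* b, int count,
                     int32_t a_zero_point, int32_t b_zero_point) {
  int32_t acc = 0;
  for (int i = 0; i < count; ++i) {
    // Widen to 32 bits before multiplying so the product cannot overflow.
    acc += (static_cast<int32_t>(a[i]) - a_zero_point) *
           (static_cast<int32_t>(b[i]) - b_zero_point);
  }
  // Real kernels rescale acc back to int8 with a fixed-point multiplier;
  // no float is touched anywhere on the hot path.
  return acc;
}
```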
1.d No dynamic memory allocation
A lot of applications using microcontrollers need to run continuously for months or years. If the main loop of a program is allocating and deallocating memory, it’s very difficult to guarantee that the heap won’t eventually end up in a fragmented state, causing an allocation failure and a crash.
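The framework's answer is to take no memory at all at runtime: the application hands it one fixed block up front, the "tensor arena", and every tensor and scratch buffer is carved out of that block. A minimal sketch, with a placeholder size:

```cpp
#include <cstdint>

// One statically allocated block supplies all working memory, so the heap
// is never touched and fragmentation cannot occur. 10 KB is a placeholder;
// in practice you tune the size to the model.
constexpr int kTensorArenaSize = 10 * 1024;
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];
// tensor_arena (not malloc) is later handed to the framework, which places
// every tensor and scratch buffer inside it.
```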
1.e It requires C++11
It’s common to write embedded programs in C, and some platforms don’t have toolchains that support C++ at all, or support only versions older than the 2011 revision of the standard. TensorFlow Lite is mostly written in C++, with some plain C APIs, which makes calling it from other languages easier.
1.f It expects 32-bit processors
There are a massive number of different hardware platforms available in the embedded world, but the trend in recent years has been toward 32-bit processors, rather than the 16-bit or 8-bit chips that used to be common. After surveying the ecosystem, the team decided to focus its development on the newer 32-bit devices because that kept assumptions like the C int data type being 32 bits the same across mobile and embedded versions of the framework.
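If you want to make that assumption explicit in your own code, a one-line compile-time guard does it (illustrative, not something the library requires you to write):

```cpp
#include <climits>

// Fail the build on toolchains where int is not 32 bits
// (e.g. some 8-bit or 16-bit targets).
static_assert(sizeof(int) * CHAR_BIT == 32,
              "this code assumes a 32-bit int, as on 32-bit MCUs");
```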
2. Code generation
2.a Introduction to code generation
Code generation involves converting a model directly into C or C++ code, with all of the parameters stored as data arrays in the code and the architecture expressed as a series of function calls that pass activations from one layer to the next.
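A hypothetical sketch of what such generated code might look like (all names, sizes, and values here are illustrative, not real generator output):

```cpp
// Parameters are emitted straight into the source as data arrays...
const float g_dense_weights[2 * 3] = {0.1f, -0.2f, 0.3f,
                                      0.4f,  0.5f, -0.6f};
const float g_dense_bias[3] = {0.0f, 0.1f, -0.1f};

// ...and the architecture is a fixed chain of function calls.
static void FullyConnectedRelu(const float* in, float* out) {
  for (int o = 0; o < 3; ++o) {
    float acc = g_dense_bias[o];
    for (int i = 0; i < 2; ++i) {
      acc += in[i] * g_dense_weights[i * 3 + o];
    }
    out[o] = acc > 0.0f ? acc : 0.0f;  // fused ReLU
  }
}

void RunModel(const float* input /*[2]*/, float* output /*[3]*/) {
  FullyConnectedRelu(input, output);  // one call per layer
}
```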
Interpreting a model is a different approach and relies on loading a data structure that defines the model. The executed code is static; only the model data changes, and the information in the model controls which operations are executed and where parameters are drawn from.
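TensorFlow Lite for Microcontrollers itself takes this interpreter approach. Here is a sketch of the usual setup; the exact constructor arguments and headers have shifted across releases (older versions also took an ErrorReporter), so treat the details as approximate:

```cpp
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// The model's FlatBuffer, compiled in elsewhere as a C array.
extern const unsigned char g_model_data[];

constexpr int kArenaSize = 10 * 1024;  // placeholder size
alignas(16) static uint8_t g_tensor_arena[kArenaSize];

void RunOnce() {
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Register only the ops this model actually uses, keeping code size down.
  static tflite::MicroMutableOpResolver<2> resolver;
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  tflite::MicroInterpreter interpreter(model, resolver, g_tensor_arena,
                                       kArenaSize);
  interpreter.AllocateTensors();
  // ... fill interpreter.input(0)->data.f with features, then:
  interpreter.Invoke();
}
```

Notice that the compiled code is the same for any model; only g_model_data changes.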
2.b Advantages of code generation
2.b.i Ease of building
Code generation keeps integration into build systems simple: with just a few C or C++ files and no external library dependencies, you can drag and drop them into practically any IDE and build a project with little risk of error.
2.b.ii Modifiability
With a small amount of code in a single implementation file, it’s much simpler to step through and change the code if you need to, at least compared to a large library for which you first need to establish what implementations are even being used.
2.b.iii Inline data
The data for the model itself can be stored as part of the implementation source code, so no additional files are required. It can also be stored directly as an in-memory data structure, so no loading or parsing step is required.
2.b.iv Code size
If you know what model and platform you’re building for ahead of time, you can avoid including code that will never be called, so the size of the program segment can be kept minimal.
2.c Disadvantages of code generation
2.c.i Upgradability
Consider this scenario: you’ve locally modified the generated code, but now you want to upgrade to a newer version of the overall framework to get new functionality or optimizations. What do you do? You’ll either need to hand-pick the latest changes into your local files, or regenerate them entirely and try to reapply your local changes.
2.c.ii Multiple models
It’s difficult to support more than one model at a time through code generation without a lot of source duplication.
2.c.iii Replacing models
Each model is expressed as a mixture of source code and data arrays within the program, so it’s difficult to change the model without recompiling the entire program.
3. Project generation
3.a Introduction to project generation
Project generation is a process that creates a copy of just the source files you need to build a particular model, without making any changes to them, and also optionally sets up any IDE-specific project files so that they can be built easily.
3.b Advantages of project generation
3.b.i Upgradability
Since all of the source files are merely duplicates of the originals from the TensorFlow Lite code base, they all appear in the same location in the folder hierarchy, making it simple to port any local adjustments back to the original source and integrate library updates using common merge tools.
3.b.ii Multiple and replacement models
The underlying code is an interpreter, so you can have more than one model or swap out a data file easily without recompiling.
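As a sketch of what that enables (model names are illustrative, and constructor details vary by release), two interpreters can walk two different FlatBuffers while sharing a single compiled copy of the kernels:

```cpp
#include <cstddef>
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Two hypothetical models, each baked in elsewhere as a data array.
extern const unsigned char g_wake_word_model_data[];
extern const unsigned char g_gesture_model_data[];

void RunBoth(const tflite::MicroOpResolver& resolver,
             uint8_t* arena_a, size_t size_a,
             uint8_t* arena_b, size_t size_b) {
  // Each interpreter walks its own model data; the op kernel code is
  // shared, so its cost in program memory is paid only once.
  tflite::MicroInterpreter wake(tflite::GetModel(g_wake_word_model_data),
                                resolver, arena_a, size_a);
  tflite::MicroInterpreter gesture(tflite::GetModel(g_gesture_model_data),
                                   resolver, arena_b, size_b);
  wake.AllocateTensors();
  gesture.AllocateTensors();
  wake.Invoke();
  gesture.Invoke();
}
```

Swapping in a retrained model is similarly just a matter of replacing the data array; the compiled code doesn’t change.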
3.b.iii Inline data
The model parameters themselves can still be compiled into the program as a C data array if needed, and the use of the FlatBuffers serialization format means that this representation can be used directly in memory with no unpacking or parsing required.
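Concretely, the converted model is typically embedded with something like `xxd -i model.tflite` and then mapped in place. The bytes and names below are placeholders, and on some releases TFLITE_SCHEMA_VERSION comes from a different header:

```cpp
#include "tensorflow/lite/schema/schema_generated.h"

// Placeholder bytes standing in for xxd's output; aligned because the
// FlatBuffer is read directly where it lives (often in flash).
alignas(8) const unsigned char g_model_data[] = {
    0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33,  // root offset, "TFL3"
    // ... remaining bytes emitted by xxd ...
};

const tflite::Model* LoadInPlace() {
  // GetModel reinterprets the bytes in place: nothing is copied, unpacked,
  // or parsed before the fields become readable.
  const tflite::Model* model = tflite::GetModel(g_model_data);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    return nullptr;  // model and the schema this code was built with disagree
  }
  return model;
}
```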
3.b.iv External dependencies
All of the header and source files required to build the project are copied into the folder alongside the regular TensorFlow code, so no dependencies need to be downloaded or installed separately.
4. Conclusion
I thank my GSoC mentor, Paul Ruiz, for guiding me throughout the project!