This comparison grew out of the development of three IoT projects:
- Arduino/Genuino 101: Build an Activity Recognition Device
- Monitoring Workers' stress levels
- Urban Plant Watering using Arduino/Genuino 101
All three projects use machine learning to make predictions about the system from previously collected training data.
In the first project, the Intel Pattern Matching Engine contained in the Genuino 101 board is used to recognize a person's activity from accelerometer and gyroscope data. In the second, TensorFlow Lite is used to create a machine learning model able to predict a worker's stress status from their heart rate. In the third, the Intel Pattern Matching Engine in the Arduino 101 board is used to record temperature data over the course of a day and to pick out the optimum condition, 25 degrees Celsius, from this data. The result becomes the next day's watering time for the plant.
The goal of this post is to describe these technologies and compare them, in order to identify their differences and similarities and to understand which one is best suited to which situation.
The Genuino 101 board is equipped with an Intel Curie Compute Module, which contains the Pattern Matching Engine.
The PME is a hardware engine capable of learning and recognizing patterns in arbitrary sets of data. This makes it possible to implement machine-learning pattern matching or classification algorithms that are accelerated by the PME's pattern-matching capabilities.
Basic Functions Supported by the library provided by Intel:
- Learning Patterns
- Recognizing Patterns
- Storing Pattern Memory (Knowledge)
- Retrieving Pattern Memory (Knowledge)
Some information about the Pattern Matching Engine architecture.
The Pattern Matching Engine (PME) is a parallel data recognition engine with the following features:
- 128 parallel Processing Elements (PE), each with a 128-byte input vector
- Support for up to 32,768 Categories
- Two Classification Functions: k-nearest neighbors (KNN) and Radial Basis Function (RBF)
Basically, the Curie Module can store a 128-byte vector in each of its 128 neurons, and it classifies new instances by using one of the two classification functions listed above (only one function can be selected at startup).
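To make the difference between the two classification functions more concrete, here is a minimal software sketch of how a neuron-based engine of this kind could behave. It is a simplified model written for illustration only: the `ToyPME` class, its method names, the use of Manhattan distance, and the `max_influence` parameter are all assumptions of this sketch, not the API of Intel's library or a description of the real hardware.

```python
from dataclasses import dataclass

MAX_NEURONS = 128     # processing elements available in the PME
MAX_VECTOR_LEN = 128  # bytes each neuron can store


@dataclass
class Neuron:
    pattern: bytes
    category: int
    influence: int  # radius inside which the neuron "fires" in RBF mode


def l1_distance(a, b):
    """Manhattan distance between two equal-length byte vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


class ToyPME:
    """Hypothetical, simplified model of the PME; not Intel's library API."""

    def __init__(self, vector_len, max_influence):
        if not 0 < vector_len <= MAX_VECTOR_LEN:
            raise ValueError("vector length must be between 1 and 128 bytes")
        self.vector_len = vector_len
        self.max_influence = max_influence
        self.neurons = []

    def _check(self, vector):
        if len(vector) != self.vector_len:
            raise ValueError("unexpected vector length")

    def learn(self, vector, category):
        self._check(vector)
        influence = self.max_influence
        already_known = False
        for n in self.neurons:
            d = l1_distance(n.pattern, vector)
            if n.category == category:
                already_known = already_known or d < n.influence
            else:
                # Shrink conflicting neurons so they stop covering this vector,
                # and keep the new neuron's field clear of them as well.
                n.influence = min(n.influence, d)
                influence = min(influence, d)
        if already_known:
            return  # an existing neuron of this category already fires
        if len(self.neurons) >= MAX_NEURONS:
            raise MemoryError("all 128 neurons are committed")
        self.neurons.append(Neuron(bytes(vector), category, influence))

    def classify_rbf(self, vector):
        """Category of the closest firing neuron, or None if none fires."""
        self._check(vector)
        firing = []
        for n in self.neurons:
            d = l1_distance(n.pattern, vector)
            if d < n.influence:
                firing.append((d, n.category))
        return min(firing)[1] if firing else None

    def classify_knn(self, vector):
        """Category of the nearest neuron, ignoring influence fields."""
        self._check(vector)
        if not self.neurons:
            return None
        nearest = min(self.neurons, key=lambda n: l1_distance(n.pattern, vector))
        return nearest.category


pme = ToyPME(vector_len=3, max_influence=20)
pme.learn(bytes([10, 10, 10]), category=1)
pme.learn(bytes([100, 100, 100]), category=2)

print(pme.classify_rbf(bytes([12, 9, 11])))   # 1: inside the first neuron's field
print(pme.classify_rbf(bytes([50, 50, 50])))  # None: no neuron fires
print(pme.classify_knn(bytes([50, 50, 50])))  # 1: nearest neuron wins anyway
```

In this sketch, RBF mode can answer "unknown" when a sample is far from everything it has learned, while KNN mode always returns the nearest category. The sketch also shows why capacity is bounded: once all 128 neurons are committed, nothing new can be learned.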
In our experience with the board on another project, the best results were achieved with RBF classification.
Because recognition is performed by dedicated hardware, performance is very good for a device with such low hardware specifications, and the results are good as well. Even with little training data, it classifies new instances correctly with a low margin of error.
It must also be said that the number of vectors it can store is small (bounded above by the total number of neurons). The PME is therefore not the best choice if you want more control over the learning algorithm, and it is not the right solution if you need to use a large dataset.
TensorFlow is an open source software library for machine learning, which provides tested and optimized modules useful in the implementation of algorithms for different types of perceptive tasks and language comprehension.
Operating systems supported by TensorFlow:
TensorFlow provides native APIs in:
- C/C++
- And other languages.
Two of the main APIs provided by TensorFlow are Keras and eager execution.
Keras is a high-level API to build and train deep learning models. It's used for fast prototyping, advanced research, and production, with three key advantages:
- User friendly: Keras has a simple, consistent interface optimized for common use cases. It provides clear and actionable feedback for user errors.
- Modular and composable: Keras models are made by connecting configurable building blocks together, with few restrictions.
- Easy to extend: Write custom building blocks to express new ideas for research. Create new layers and loss functions, and develop state-of-the-art models.
TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later. This makes it easy to get started with TensorFlow and debug models, and it reduces boilerplate as well. Eager execution is a flexible machine learning platform for research and experimentation, providing:
- An intuitive interface: Structure your code naturally and use Python data structures. Quickly iterate on small models and small data.
- Easier debugging: Call ops directly to inspect running models and test changes. Use standard Python debugging tools for immediate error reporting.
- Natural control flow: Use Python control flow instead of graph control flow, simplifying the specification of dynamic models.
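The difference between building a graph to run later and evaluating operations immediately can be illustrated with a toy example in plain Python. This is only a conceptual sketch: the `Deferred` class and the helper functions are invented for this illustration and are not part of TensorFlow.

```python
class Deferred:
    """A node in a tiny expression graph; nothing is computed until run()."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def run(self):
        # Evaluate the inputs first (recursively), then apply this node's op.
        args = [i.run() if isinstance(i, Deferred) else i for i in self.inputs]
        return self.fn(*args)


# Graph style: each call returns a node describing the work, not a value.
def add_deferred(a, b):
    return Deferred(lambda x, y: x + y, a, b)


def mul_deferred(a, b):
    return Deferred(lambda x, y: x * y, a, b)


# Eager style: each call returns a concrete value right away.
def add_eager(a, b):
    return a + b


def mul_eager(a, b):
    return a * b


graph = mul_deferred(add_deferred(2, 3), 4)   # still just a graph object
deferred_result = graph.run()                 # computed only now
eager_result = mul_eager(add_eager(2, 3), 4)  # computed step by step

print(type(graph).__name__, deferred_result, eager_result)
```

In the eager style, every intermediate value can be printed or inspected with a normal debugger as soon as it is produced, which is the debugging advantage described above.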
uTensor is an extremely lightweight machine learning inference framework built on Mbed and TensorFlow. It converts ML models to C++ source files, ready to be imported into MCU projects. A model is constructed and trained in TensorFlow. uTensor takes the model and produces a .cpp and a .hpp file. These files contain the generated C++ code needed for inference. Why generate C++ source files? Because they are human-readable and can be easily edited for a given application.
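To give an idea of what "converting a model to C++ source files" means, the following toy generator writes a list of trained weights into a header and a source file as a constant array. It is a sketch under stated assumptions: the function name `emit_cpp_weights`, the array name, and the file layout are hypothetical, and the real uTensor generator and its output format are different and far more complete.

```python
def emit_cpp_weights(name, weights):
    """Return (hpp_text, cpp_text) declaring and defining a const float array.

    Hypothetical helper for illustration; not uTensor's code generator.
    """
    values = ", ".join(f"{w:.6f}f" for w in weights)
    hpp = (
        "#pragma once\n"
        "#include <cstddef>\n"
        f"extern const float {name}[];\n"
        f"extern const std::size_t {name}_len;\n"
    )
    cpp = (
        f'#include "{name}.hpp"\n'
        f"const float {name}[] = {{ {values} }};\n"
        f"const std::size_t {name}_len = {len(weights)};\n"
    )
    return hpp, cpp


# Pretend these three numbers came out of a trained model.
hpp_text, cpp_text = emit_cpp_weights("dense_weights", [0.5, -1.25, 2.0])
print(hpp_text)
print(cpp_text)
```

Because the output is ordinary C++ text, it can be read, diffed, and adjusted by hand before it is compiled into the firmware, which is the advantage of generated source files mentioned above.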
uTensor should support any Mbed enabled board that has sufficient memory:
- 128+ kB RAM
- 512+ kB flash memory.
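A quick back-of-the-envelope check can tell whether the weights of a model are likely to fit in that flash budget. The sketch below counts the parameters of a fully connected network and compares their size with the 512 kB figure. The helper names and the example layer sizes are assumptions made for illustration, and the estimate ignores the program code, the runtime, and the RAM needed for activations, so it is only a lower bound.

```python
FLASH_BYTES = 512 * 1024  # minimum flash quoted in the requirements above


def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network, e.g. [1, 16, 8, 2]."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def weights_fit_in_flash(layer_sizes, bytes_per_param=4, flash_bytes=FLASH_BYTES):
    """True if the raw parameters alone fit in the given flash size."""
    return dense_param_count(layer_sizes) * bytes_per_param <= flash_bytes


small = [1, 16, 8, 2]          # e.g. one input feature, two output classes
large = [784, 256, 128, 10]    # e.g. a 28x28 image classifier

print(dense_param_count(small), weights_fit_in_flash(small))   # 186 True
print(dense_param_count(large), weights_fit_in_flash(large))   # 235146 False
print(weights_fit_in_flash(large, bytes_per_param=1))          # True with 8-bit weights
```

The larger network does not fit as 32-bit floats, but would fit if its weights were stored in one byte each, which shows why model size matters much more on a microcontroller than on a desktop machine.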
uTensor is designed to be the interface between embedded engineers and data scientists alike. It is young and undergoing rapid development. Many exciting features are on the way:
- CMSIS-NN integration
- Smaller binary
- More efficient Run-time
We can find some relevant differences between the two technologies:
- Capabilities offered
- Hardware requirements and performance
- Ease of use
The first point concerns the capabilities offered by the two technologies, and here uTensor is clearly the winner, since it lets you build a recognition model with TensorFlow running on a computer and then load it onto the Mbed board. In this way, we can build the model using a powerful machine and a big dataset, without being constrained during training by the memory limits of the board on which the model is deployed. There is also more control over the machine learning algorithm used, so the model can be adapted to the needs of the specific problem.
For the second point, we should first make some considerations, since we are comparing a hardware technology (Intel Quark inside the Curie Module) with a software application (uTensor). This means that to use the Pattern Matching Engine we need an Intel Quark microcontroller, whereas to run uTensor on an Mbed board we only need to meet the minimum requirements described above.
For these reasons, the Genuino board gets by with lower hardware specifications, since the Pattern Matching module is a piece of hardware dedicated to that job, and it also delivers better performance (in terms of time required, not accuracy).
Finally, ease of use has a clear winner, the Genuino 101, which comes with a library developed by Intel that performs both the training of the model and the classification of new instances through simple calls to the library API. No previous knowledge of machine learning algorithms is needed, because everything is already implemented in the library. Of course, this simplicity is achieved only by leaving out the more advanced features that uTensor offers instead.
In conclusion, both technologies are valid and produce very good results. We can't name an absolute winner, since they offer different capabilities. If you have no experience with machine learning and TensorFlow, the best way to go is the Genuino 101. If you are looking for more control over the classification algorithm, or you need to train on an amount of data that would be impossible to fit into the very limited memory of the Genuino board, then you have to rely on uTensor.
Technologies comparison presentation