So You Want to Train an ML Classifier Directly on an Arduino Board?


As of now, we know it is possible to run machine learning inference on tiny microcontrollers thanks to TensorFlow Lite for Microcontrollers and my very own library, MicroML. What if you could train a classifier directly on the microcontroller, too?

When I first started this journey in the world of machine learning on microcontrollers, one fact was set in stone for me: you train your classifier once and for all on a PC, then deploy it to your MCU.

As simple as that.

Training is a heavy process that requires lots of computation and memory. You want a machine as powerful as possible to carry out this task as fast as possible. Moreover, it is a one-time task: once your classifier has been trained, it doesn't need to be updated again.

And this held true until Joao Carvalho challenged me with the idea of running SVM training directly on the microcontroller. He did so in the comments on the post about an alternative to SVM that produces much smaller models, which I strongly invite you to read.

In the past I replied "no way" to people asking about this topic on forums, but Carvalho was kind enough to send me a link to a JavaScript implementation of the simplified SVM SMO (Sequential Minimal Optimization) algorithm.
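If you have never seen simplified SMO, below is a minimal C sketch of its training loop using a linear kernel. It is not the port described in this post. The names (SmoModel, smo_train, smo_predict), the fixed SMO_MAX_SAMPLES capacity and the max_iter safety cap are illustrative assumptions. Each pass looks at every sample i that violates the KKT conditions by more than a tolerance. It pairs that sample with a random partner j and optimizes the two Lagrange multipliers jointly, in closed form.

```c
#include <stdlib.h> /* rand() */

/* Hypothetical capacity for this sketch; size it to your board's RAM. */
#define SMO_MAX_SAMPLES 64

typedef struct {
    const float *X;               /* n x d training samples, row-major */
    const int *y;                 /* labels, each -1 or +1 */
    int n, d;
    float alpha[SMO_MAX_SAMPLES]; /* one Lagrange multiplier per sample */
    float b;                      /* bias */
} SmoModel;

static float smo_max(float a, float b) { return a > b ? a : b; }
static float smo_min(float a, float b) { return a < b ? a : b; }
static float smo_abs(float a) { return a < 0.0f ? -a : a; }

/* Linear kernel K(a, b) = a . b */
static float smo_kernel(const float *a, const float *b, int d) {
    float s = 0.0f;
    for (int k = 0; k < d; k++) s += a[k] * b[k];
    return s;
}

/* f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; samples with alpha_i == 0 add nothing */
float smo_decision(const SmoModel *m, const float *x) {
    float f = m->b;
    for (int i = 0; i < m->n; i++)
        if (m->alpha[i] > 0.0f)
            f += m->alpha[i] * m->y[i] * smo_kernel(&m->X[i * m->d], x, m->d);
    return f;
}

int smo_predict(const SmoModel *m, const float *x) {
    return smo_decision(m, x) >= 0.0f ? 1 : -1;
}

/* Returns 1 if training stopped after max_passes passes with no alpha change,
   0 if it hit the max_iter cap first, -1 on invalid input. */
int smo_train(SmoModel *m, const float *X, const int *y, int n, int d,
              float C, float tol, int max_passes, int max_iter) {
    if (n < 2 || n > SMO_MAX_SAMPLES) return -1;
    m->X = X; m->y = y; m->n = n; m->d = d; m->b = 0.0f;
    for (int i = 0; i < n; i++) m->alpha[i] = 0.0f;

    int passes = 0, iter = 0;
    while (passes < max_passes && iter < max_iter) {
        int changed = 0;
        for (int i = 0; i < n; i++) {
            const float *xi = &X[i * d];
            float Ei = smo_decision(m, xi) - y[i];
            /* Skip alpha_i unless it violates the KKT conditions by more than tol */
            if (!((y[i] * Ei < -tol && m->alpha[i] < C) ||
                  (y[i] * Ei > tol && m->alpha[i] > 0.0f)))
                continue;

            int j = rand() % (n - 1); /* random partner j != i */
            if (j >= i) j++;
            const float *xj = &X[j * d];
            float Ej = smo_decision(m, xj) - y[j];
            float ai = m->alpha[i], aj = m->alpha[j];

            /* Bounds [L, H] keep 0 <= alpha <= C and sum(alpha * y) unchanged */
            float L = (y[i] != y[j]) ? smo_max(0.0f, aj - ai) : smo_max(0.0f, ai + aj - C);
            float H = (y[i] != y[j]) ? smo_min(C, C + aj - ai) : smo_min(C, ai + aj);
            if (H - L < 1e-6f) continue;

            float kii = smo_kernel(xi, xi, d);
            float kjj = smo_kernel(xj, xj, d);
            float kij = smo_kernel(xi, xj, d);
            float eta = 2.0f * kij - kii - kjj;
            if (eta >= 0.0f) continue;

            float new_aj = aj - y[j] * (Ei - Ej) / eta;
            new_aj = smo_min(H, smo_max(L, new_aj)); /* clip to [L, H] */
            if (smo_abs(new_aj - aj) < 1e-5f) continue;
            float new_ai = ai + y[i] * y[j] * (aj - new_aj);

            /* Recompute the bias from whichever multiplier is strictly inside (0, C) */
            float b1 = m->b - Ei - y[i] * (new_ai - ai) * kii - y[j] * (new_aj - aj) * kij;
            float b2 = m->b - Ej - y[i] * (new_ai - ai) * kij - y[j] * (new_aj - aj) * kjj;
            m->alpha[i] = new_ai;
            m->alpha[j] = new_aj;
            if (new_ai > 0.0f && new_ai < C) m->b = b1;
            else if (new_aj > 0.0f && new_aj < C) m->b = b2;
            else m->b = 0.5f * (b1 + b2);
            changed++;
        }
        passes = (changed == 0) ? passes + 1 : 0;
        iter++;
    }
    return passes >= max_passes ? 1 : 0;
}
```

Labels must be -1 or +1. C is the soft-margin penalty and tol is the KKT tolerance. max_passes is the number of consecutive passes without changes that counts as "done". max_iter bounds the total number of passes, so the loop always terminates. Replacing smo_kernel with an RBF kernel would give a non-linear classifier without touching the rest of the loop.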

At first glance it looked quite easy to port from JavaScript to C, so I gave it a try.

And, in fact, it only took me 30 minutes to get it working on my PC.

Then I deployed it to my ESP32 and... it worked!

My first try was with 10 samples from the Iris dataset: it took barely any time to train.

But I know that SVM training and inference time grow rapidly with the number of training samples.

Execution time will be the most limiting factor for this kind of task, so I created a benchmarking setup to evaluate the performance of the algorithm on different dataset sizes and feature dimensions. The results are summarized in the following table and plots.

* All benchmarks were obtained on an ESP32 board.
** Inference sometimes took less than 1 ms for all the test samples combined, so the total was rounded up to 1 ms and divided by the number of samples.
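The measurement and the arithmetic in the second footnote can be sketched in portable C as follows. This is a minimal sketch under assumptions. clock() stands in for the board's timer here, and on an Arduino you would read millis() or micros() instead. The helper names elapsed_ms and per_sample_ms are hypothetical.

```c
#include <time.h>

/* Elapsed milliseconds between two clock() readings (processor time on a PC).
   On an Arduino board, millis() or micros() would play this role. */
double elapsed_ms(clock_t start, clock_t end) {
    return (double)(end - start) * 1000.0 / CLOCKS_PER_SEC;
}

/* Per-sample inference time as the footnote describes it: a total below
   1 ms is rounded up to 1 ms before dividing by the number of test samples. */
double per_sample_ms(double total_ms, int n_samples) {
    if (total_ms < 1.0) total_ms = 1.0;
    return total_ms / n_samples;
}
```

Typical use wraps the call being measured: take `clock_t t0 = clock();`, run training or inference, then call `elapsed_ms(t0, clock())`. Suppose 40 test samples run in under 1 ms in total. The reported figure is then 1 / 40 = 0.025 ms per sample, which is an upper bound rather than an exact value.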

Onboard Iris dataset training time (feature dimension = 4)

Onboard breast cancer dataset training time (feature dimension = 30)

We can see from the table that for the Iris dataset, which has four-dimensional features, training is quite fast: 80 ms on 60 samples.

Things are very different when training on the breast cancer dataset, with its 30 features per sample. Now we're talking about seconds to train, and even minutes when the number of samples grows to 60.
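A rough cost model shows part of the gap. It assumes a simplified SMO trainer that recomputes kernel values on the fly (no kernel cache). It also assumes a kernel costing $O(d)$ operations for $d$ features, as the linear and RBF kernels do. In the worst case, one pass over the $n$ training samples evaluates the decision function twice per sample, with $n$ kernel evaluations each for the errors $E_i$ and $E_j$. It then adds three kernel values for the pair update:

$$
\text{work per pass} \approx n\,(2n + 3)\,d = O(n^2 d)
$$

At $n = 60$, that comes to $60 \cdot 123 \cdot 4 = 29{,}520$ feature-level operations per pass for Iris ($d = 4$). For breast cancer ($d = 30$) it is $60 \cdot 123 \cdot 30 = 221{,}400$, 7.5 times more. Total training time also depends on how many passes the optimizer needs before it stops, which this model does not capture and which varies with the data.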

Fortunately, the inference time stays almost flat, so you will have real-time predictions.

I ran the Iris benchmark on a Seeed Studio Xiao M0 board (32-bit, 48 MHz processor), too. It took seven seconds to train on 60 samples, vs. 80 ms on the ESP32. Clearly, some boards are better suited to this task than others.

Downsides

Of course, it's not a perfect world, and there are downsides.

Convergence

As the paper reported, "There is one thing to note, the algorithm (the simplified version) is not guaranteed to converge".

I actually don't know, in practice, what this means. But it sounds like a bad thing.

Binary Classification Only

As of now, this algorithm can only do binary classification.

I hope to implement multi-class classification in the future with the one-vs-all approach, but I don't really know whether it would be too inefficient for an MCU to run.
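For reference, one-vs-all trains one binary classifier per class. Samples of class k are labeled +1 and every other class is labeled -1. At prediction time, it picks the class whose classifier returns the highest decision value. Below is a minimal sketch of the two pieces around the trainer: the relabeling and the argmax. The LinearBinary type is a hypothetical stand-in with an explicit weight vector. A kernel SVM would store its alphas and bias instead, and OVA_MAX_FEATURES is an arbitrary bound chosen for the sketch.

```c
/* Hypothetical upper bound on the feature dimension for this sketch */
#define OVA_MAX_FEATURES 32

/* Stand-in for one trained binary classifier: decision value w . x + b */
typedef struct {
    float w[OVA_MAX_FEATURES];
    float b;
} LinearBinary;

/* Build the +1/-1 labels for the "class k vs. all the others" sub-problem */
void ova_relabel(const int *labels, int n, int k, int *y_out) {
    for (int i = 0; i < n; i++)
        y_out[i] = (labels[i] == k) ? 1 : -1;
}

/* Raw decision value of one binary classifier (not just its sign) */
static float ova_score(const LinearBinary *m, const float *x, int d) {
    float s = m->b;
    for (int k = 0; k < d; k++) s += m->w[k] * x[k];
    return s;
}

/* Pick the class whose "k vs. all" classifier gives the largest decision value */
int ova_predict(const LinearBinary *models, int n_classes, const float *x, int d) {
    int best = 0;
    float best_score = ova_score(&models[0], x, d);
    for (int k = 1; k < n_classes; k++) {
        float s = ova_score(&models[k], x, d);
        if (s > best_score) { best_score = s; best = k; }
    }
    return best;
}
```

With K classes, this means K trainings over the full training set and K sets of alphas. Training time and memory therefore grow roughly linearly with the number of classes, which is the efficiency cost to weigh on an MCU.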

Declining Accuracy

If you look at the benchmark table above, you'll also notice that accuracy does not always improve as the training set grows. It seems to reach an optimum and then start decreasing.

If you're going to deploy your device in an autonomous scenario, you'll need to monitor its accuracy every time you re-train it, or your results are going to be poor.

You should keep track of the best accuracy you achieved and roll back to the training set that produced it when you detect declining accuracy.
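The bookkeeping for this could look like the following minimal sketch. It assumes a fixed-size training buffer, the hypothetical TrainingSet type, sized here for 4 features as in Iris. It also assumes accuracy is measured on a held-out validation set after each re-training. The tolerance parameter is an added assumption as well: it keeps small random fluctuations from triggering a rollback. Keep in mind that the snapshot doubles the RAM used by the training set.

```c
/* Hypothetical fixed-size training buffer (4 features, as in Iris) */
#define RB_MAX_SAMPLES 64
#define RB_FEATURES 4

typedef struct {
    float X[RB_MAX_SAMPLES][RB_FEATURES];
    int y[RB_MAX_SAMPLES];
    int n;
} TrainingSet;

/* Snapshot of the training set that gave the best accuracy so far */
typedef struct {
    TrainingSet best;
    float best_accuracy;
} RollbackState;

void rollback_init(RollbackState *s) {
    s->best.n = 0;
    s->best_accuracy = -1.0f; /* any real accuracy beats this */
}

/* Call after each re-training with the accuracy measured on validation data.
   Returns 1 if *current was restored from the best snapshot, 0 otherwise
   (a new best is saved as the snapshot). */
int rollback_check(RollbackState *s, TrainingSet *current,
                   float accuracy, float tolerance) {
    if (accuracy >= s->best_accuracy) {
        s->best = *current; /* struct copy of the whole buffer */
        s->best_accuracy = accuracy;
        return 0;
    }
    if (accuracy < s->best_accuracy - tolerance) {
        *current = s->best; /* roll back to the best training set */
        return 1;
    }
    return 0; /* small drop within tolerance: keep the current set */
}
```

After rollback_check returns 1, re-train on the restored set to get the best model back.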

Memory

You will need to keep your whole training set in memory for the classifier to both learn and predict. This means RAM will be a limiting factor, and RAM is a scarce resource on microcontrollers.

You will have to find a good enough compromise between the number of features, the number of samples and the accuracy.
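To get a feel for the numbers, here is a minimal estimate of the RAM needed for the training data. It assumes the trainer keeps features as 4-byte floats (as on the ESP32), one float multiplier (alpha) per sample, a one-byte label per sample and a float bias. The helper name and this storage layout are assumptions. The estimate ignores stack usage and anything else a real implementation allocates.

```c
#include <stddef.h>

/* Rough RAM estimate under an assumed layout: n x d float features,
   one float alpha per sample, one signed byte label per sample, one float bias. */
size_t svm_ram_bytes(size_t n, size_t d) {
    return n * d * sizeof(float)     /* training features */
         + n * sizeof(float)         /* Lagrange multipliers (alphas) */
         + n * sizeof(signed char)   /* labels, -1 or +1 */
         + sizeof(float);            /* bias b */
}
```

With 4-byte floats, 60 Iris samples (d = 4) come to 960 + 240 + 60 + 4 = 1,264 bytes. Sixty breast cancer samples (d = 30) come to 7,200 + 240 + 60 + 4 = 7,504 bytes. Feature count multiplies directly into the dominant term, which is why the number of features and the number of samples have to be traded off together.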

Time to Go Hands-On

I created a sample project for you to train a color classifier with a super simple setup: all you need to follow along is a TCS3200 color sensor.

Don't have a TCS3200? No problem: you can train a classifier on the Iris dataset instead.
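The trainer is binary-only and Iris has three classes, so one way to follow along is to keep two of them and map them to +1 and -1. Below is a minimal sketch of that step. It assumes the labels are encoded as 0, 1 and 2 and the features are stored row-major. select_class_pair is a hypothetical helper, not part of the project.

```c
/* Copy only the samples of two classes into X_out / y_out, labeling
   pos_class as +1 and neg_class as -1. X is row-major n x d.
   Returns the number of samples written. */
int select_class_pair(const float *X, const int *labels, int n, int d,
                      int pos_class, int neg_class, float *X_out, int *y_out) {
    int m = 0;
    for (int i = 0; i < n; i++) {
        if (labels[i] != pos_class && labels[i] != neg_class) continue;
        for (int k = 0; k < d; k++) X_out[m * d + k] = X[i * d + k];
        y_out[m] = (labels[i] == pos_class) ? 1 : -1;
        m++;
    }
    return m;
}
```

The output buffers must be large enough for all n samples in the worst case.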

Check out the project repo on GitHub.


Eloquent Arduino