A few months ago, some researchers published a paper called SEFR: A Fast Linear-Time Classifier for Ultra-Low Power Devices. It can run on a regular Arduino Uno - moreover, the machine learning model can be trained on the device itself. Simply put, SEFR (which got its name from a related algorithm, the semi-supervised ensemble learning guided feature ranking method) calculates a hyperplane between two classes of data using their average feature values.
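In my own notation (so treat this as a sketch of the paper's method rather than its exact formulation): for each feature f, SEFR compares the average value of that feature in the positive class with its average in the negative class, then derives a single bias from the weighted average scores of the two classes:

```latex
w_f = \frac{\mu_f^{+} - \mu_f^{-}}{\mu_f^{+} + \mu_f^{-}},
\qquad
b = -\,\frac{N^{-}\,\bar{s}^{+} + N^{+}\,\bar{s}^{-}}{N^{+} + N^{-}}
```

Here \mu_f^{\pm} are the per-class means of feature f, N^{\pm} are the class sizes, and \bar{s}^{\pm} are the per-class averages of the weighted score \mathbf{w} \cdot \mathbf{x}. A new instance is then classified by which side of \mathbf{w} \cdot \mathbf{x} + b = 0 it falls on.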
Needless to say, as someone who sucks at math, I became very, very interested in this algorithm.
The problem was that SEFR (the original version) is a binary classifier. So, by using the one-vs-rest strategy suggested in the paper, I successfully created a multiclass classifier version.
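Concretely, one binary SEFR model is trained per label, with that label as one class and all the other labels lumped together as the "rest" class; a new instance then gets the label whose model reports the lowest score (lowest, because in my arrangement a model's own class sits on the negative side of its hyperplane):

```latex
\hat{y} = \operatorname*{arg\,min}_{l} \left( \mathbf{w}_l \cdot \mathbf{x} + b_l \right)
```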
(My Arduino C++ version has a built-in IRIS dataset and can indeed run on an Arduino Uno, with a training time of less than 70 milliseconds. Golang/TinyGo, MicroPython (for ESPs) and CircuitPython (with ulab) versions are also available; all of them train in roughly 100-300 ms, depending on the hardware.)
Why SEFR?

According to the authors of the paper, the goal is to utilize the processing power of microcontrollers to train the model, instead of using the (precious) resources of computers or central servers. It can also reduce latency and the risk of data leaks. And since it's so simple, you can implement it quickly on different platforms, thus saving a lot of time and money.
It is still a fair question whether you should store the whole training dataset on the device itself, and how you would collect more data along with its labels (SEFR is supervised ML, after all). On the other hand, you can simply train the model somewhere else and copy the weights/bias data to the devices.
For the sake of demonstration, I'll put the training data on the Nano and do training on it.
From theory to practice - at least, an experiment

Now I started to think about how to employ this ML model to do something. There was an example of using KNN to do color recognition on the Nano 33 BLE Sense; I decided to make a similar little project using the cheapest stuff possible - which means I wanted to use a real $2 Arduino Nano (ATmega328P) clone instead of the $30 fancy one.
The setup is pretty simple and cheap:
- An Arduino Nano 3.0 clone
- A TCS3200 color detection module (not a really precise sensor, but that's the point.)
- A TM1637 7-segment LED display
If you know where to buy them, the whole setup can cost as little as $7-8.
I wrote two sketches; the first one is the sampler (data collector):
#include <tcs3200.h> // https://github.com/Panjkrc/TCS3200_library
#include <TM1637Display.h> // https://github.com/avishorp/TM1637
#define S0 2
#define S1 3
#define S2 4
#define S3 5
#define OUT 6
#define CLK 7
#define DIO 8
#define BTN 9
const int SAMPLE_SIZE = 30; // 30 samples for each label
const int SAMPLE_DELAY = 250; // delay between samples (ms)
const int SAMPLE_LABEL_SIZE = 5; // number of labels (0-4)
int sample_label = 0;
int counter = 0;
tcs3200 tcs(S0, S1, S2, S3, OUT);
TM1637Display display(CLK, DIO);
void setup() {
  Serial.begin(9600);
  display.setBrightness(7);
  display.showNumberDec(0, false);
  pinMode(BTN, INPUT_PULLUP);
}

void loop() {
  // all labels sampled: print the matching target array, then halt
  if (sample_label == SAMPLE_LABEL_SIZE) {
    Serial.println("");
    for (int i = 0; i < SAMPLE_LABEL_SIZE; i++) {
      for (int j = 0; j < SAMPLE_SIZE; j++) {
        Serial.print(i);
        Serial.print(", ");
      }
    }
    display.showNumberDec(9999, true);
    while (1); // done - halt here
  }
  display.showNumberDec(sample_label, true);
  while (digitalRead(BTN)); // wait for button press
  counter = 0;
  while (counter < SAMPLE_SIZE) {
    int r = tcs.colorRead('r');
    int g = tcs.colorRead('g');
    int b = tcs.colorRead('b');
    int w = tcs.colorRead('c'); // clear (white) channel
    Serial.print("{");
    Serial.print(r);
    Serial.print(", ");
    Serial.print(g);
    Serial.print(", ");
    Serial.print(b);
    Serial.print(", ");
    Serial.print(w);
    Serial.print("}, ");
    display.showNumberDec(++counter, false);
    delay(SAMPLE_DELAY);
  }
  sample_label++;
}
With this sketch, every time you press the button, the device samples the TCS3200 every quarter second and prints the readings to the serial monitor. After reading 30 samples, it stops and waits for the sampling of the next label (so you can hold up the next item and get ready).
After all the data is collected, the target (label) array is printed out as well.
I chose the labels as follows:
- 0 - nothing
- 1 - red button
- 2 - yellow button
- 3 - green button
- 4 - blue button
The TCS3200 can read the red, green, blue and clear (white) frequencies of an object (although not as stably as the APDS9960 on the Nano 33 BLE Sense). So there are 5 labels, 150 data instances in total, and 4 features per instance.
The classifier

Next, this sketch contains the SEFR algorithm, as well as the complete dataset and labels (copied directly from the serial output of the sampler sketch). The hardware is the same, except the button is not used.
#include <tcs3200.h>
#include <TM1637Display.h>
#define S0 2
#define S1 3
#define S2 4
#define S3 5
#define OUT 6
#define CLK 7
#define DIO 8
const unsigned int DATASIZE = 150; // dataset size
const byte FEATURES = 4; // number of features
const byte LABELS = 5; // number of labels
// the dataset
const int DATASET[DATASIZE][FEATURES] = {
{1, 1, 2, 4}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {2, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {2, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {1, 1, 2, 5}, {4, 2, 3, 11}, {5, 2, 3, 11}, {5, 3, 3, 10}, {5, 3, 3, 12}, {5, 2, 3, 11}, {5, 2, 3, 11}, {5, 2, 3, 11}, {5, 3, 3, 11}, {5, 3, 4, 11}, {5, 2, 3, 11}, {5, 2, 3, 10}, {5, 2, 3, 11}, {5, 2, 3, 12}, {6, 3, 3, 12}, {6, 3, 4, 12}, {5, 2, 3, 12}, {5, 3, 3, 12}, {5, 2, 3, 12}, {4, 2, 3, 11}, {5, 3, 3, 11}, {5, 3, 3, 12}, {6, 2, 3, 12}, {6, 3, 4, 12}, {5, 2, 3, 12}, {5, 3, 3, 11}, {5, 2, 3, 11}, {6, 2, 3, 11}, {5, 2, 3, 11}, {5, 2, 3, 10}, {5, 2, 3, 12}, {6, 5, 4, 16}, {7, 5, 5, 17}, {8, 6, 5, 20}, {9, 7, 5, 22}, {8, 6, 5, 20}, {9, 6, 5, 20}, {9, 7, 5, 21}, {9, 7, 5, 25}, {9, 6, 5, 22}, {8, 6, 5, 20}, {8, 6, 5, 19}, {8, 5, 5, 18}, {8, 5, 4, 17}, {7, 5, 4, 15}, {7, 5, 5, 18}, {9, 7, 5, 20}, {9, 7, 5, 20}, {10, 8, 5, 23}, {10, 8, 5, 23}, {9, 7, 5, 20}, {10, 7, 5, 23}, {9, 6, 5, 20}, {8, 6, 4, 20}, {7, 6, 5, 19}, {9, 6, 5, 20}, {10, 7, 5, 22}, {10, 7, 5, 23}, {10, 8, 6, 24}, {9, 7, 5, 22}, {9, 7, 5, 20}, {2, 3, 3, 9}, {2, 2, 3, 9}, {2, 3, 3, 9}, {2, 3, 3, 9}, {2, 2, 3, 8}, {2, 3, 3, 9}, {3, 3, 4, 11}, {3, 4, 4, 11}, {3, 4, 3, 12}, {3, 4, 4, 12}, {3, 4, 4, 12}, {3, 3, 4, 12}, {3, 3, 3, 11}, {3, 3, 3, 10}, {3, 4, 4, 12}, {3, 4, 4, 12}, {3, 4, 4, 12}, {3, 4, 4, 12}, {3, 3, 4, 11}, {3, 3, 4, 11}, {3, 3, 3, 10}, {3, 3, 3, 11}, {3, 4, 3, 10}, {3, 4, 4, 12}, {3, 4, 3, 11}, {3, 3, 3, 10}, {2, 3, 3, 10}, {3, 3, 4, 10}, {3, 3, 3, 11}, {3, 4, 4, 11}, {3, 3, 5, 12}, {3, 3, 5, 12}, {3, 4, 6, 14}, {3, 4, 7, 14}, {3, 4, 8, 15}, {3, 4, 7, 15}, {3, 3, 6, 12}, {3, 3, 6, 12}, {3, 3, 4, 12}, {3, 3, 6, 12}, {3, 4, 8, 15}, {3, 4, 8, 16}, {3, 5, 9, 18}, {3, 4, 8, 19}, {3, 4, 7, 14}, {3, 
3, 6, 13}, {3, 3, 6, 13}, {3, 3, 6, 12}, {3, 4, 6, 15}, {3, 4, 7, 15}, {3, 4, 8, 15}, {3, 4, 8, 16}, {3, 4, 8, 16}, {3, 4, 7, 14}, {3, 3, 6, 13}, {3, 3, 5, 12}, {3, 3, 6, 13}, {3, 4, 7, 14}, {3, 4, 8, 15}, {3, 4, 8, 16}
};
// labels of the dataset
const byte TARGET[DATASIZE] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4
};
int weights[LABELS][FEATURES]; // model weights for each label
int bias[LABELS]; // model bias for each label
tcs3200 tcs(S0, S1, S2, S3, OUT);
TM1637Display display(CLK, DIO);
void setup() {
  Serial.begin(9600);
  display.setBrightness(7);
  fit(); // train the model on boot
}

void loop() {
  int r = tcs.colorRead('r');
  int g = tcs.colorRead('g');
  int b = tcs.colorRead('b');
  int w = tcs.colorRead('c');
  int new_data[FEATURES] = {r, g, b, w};
  int prediction = predict(new_data); // detect the label
  if (prediction > 0) {
    Serial.print("Detected label: ");
    Serial.println(prediction);
  }
  display.showNumberDec(prediction, false);
  delay(100);
}
// train the SEFR model (one-vs-rest: one binary model per label)
void fit() {
  for (byte l = 0; l < LABELS; l++) {
    unsigned int count_pos = 0, count_neg = 0;
    // compute one weight per feature from the two class averages
    for (byte f = 0; f < FEATURES; f++) {
      float avg_pos = 0.0, avg_neg = 0.0;
      count_pos = 0;
      count_neg = 0;
      for (unsigned int s = 0; s < DATASIZE; s++) {
        if (TARGET[s] != l) { // "positive" class = all other labels
          avg_pos += float(DATASET[s][f]);
          count_pos++;
        } else {
          avg_neg += float(DATASET[s][f]);
          count_neg++;
        }
      }
      avg_pos /= float(count_pos);
      avg_neg /= float(count_neg);
      // quantize the weight to an int (scaled by 100) to save RAM
      weights[l][f] = int((avg_pos - avg_neg) / (avg_pos + avg_neg) * 100);
    }
    // compute the bias from the weighted average scores of both classes
    float avg_pos_w = 0.0, avg_neg_w = 0.0;
    for (unsigned int s = 0; s < DATASIZE; s++) {
      float weighted_score = 0.0;
      for (byte f = 0; f < FEATURES; f++) {
        weighted_score += (float(DATASET[s][f]) * float(weights[l][f]) / 100);
      }
      if (TARGET[s] != l) {
        avg_pos_w += weighted_score;
      } else {
        avg_neg_w += weighted_score;
      }
    }
    avg_pos_w /= float(count_pos);
    avg_neg_w /= float(count_neg);
    bias[l] = int(-100 * (float(count_neg) * avg_pos_w + float(count_pos) * avg_neg_w) / float(count_pos + count_neg));
  }
}
// predict the label of a new data instance
byte predict(int new_data[FEATURES]) {
  float score[LABELS];
  for (byte l = 0; l < LABELS; l++) {
    score[l] = 0.0;
    for (byte f = 0; f < FEATURES; f++) {
      score[l] += (float(new_data[f]) * (float(weights[l][f]) / 1000));
    }
    score[l] += (float(bias[l]) / 1000);
  }
  // the label whose one-vs-rest model gives the lowest score wins
  float min_score = score[0];
  byte min_label = 0;
  for (byte l = 1; l < LABELS; l++) {
    if (score[l] < min_score) {
      min_score = score[l];
      min_label = l;
    }
  }
  return min_label;
}
This sketch is just barely able to run on the Arduino Nano; with any more data, the Nano would stop working. I had to quantize the weights and bias into int types to reduce memory usage. Some floating-point precision is sacrificed, but it still works well.
Sketch uses 6,766 bytes (22%) of program storage space. Maximum is 30,720 bytes.
Global variables use 1,630 bytes (79%) of dynamic memory, leaving 418 bytes for local variables. Maximum is 2,048 bytes.
Low memory available, stability problems may occur.
See SEFR in action in the video below:
So how good was it?

It appears the setup can detect the colored objects pretty well, as long as you hold them at roughly the right distance and angle in front of the sensor.
To be sure, I ported the data to the computer to run some tests with my Python version of SEFR (the test dataset is 20% of the whole data):
Training dataset cross-validation accuracy: 0.95
Test dataset prediction accuracy: 0.967
Test dataset classification report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       1.00      1.00      1.00         5
           2       1.00      1.00      1.00         4
           3       0.89      1.00      0.94         8
           4       1.00      0.83      0.91         6

    accuracy                           0.97        30
   macro avg       0.98      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30
Looks like we get about 95%+ accuracy - pretty good!
Final thoughts

So, that's it - unlike with TensorFlow Lite, I was able to train and use SEFR solely on a 16 MHz Arduino Nano in no time. This experiment shows that it is indeed possible to employ SEFR as a multiclass classification tool on microcontrollers.
Actually, using an Arduino Uno/Nano may not be that practical - the 2 KB of RAM on these AVR microcontrollers is barely enough for even a small dataset plus a couple of drivers. Fortunately, there are a lot of cheap 32-bit boards with much more RAM on the market now, some of them cheaper than the Nano 33 BLE Sense. Or, like I said before, you can simply paste in the weights/bias arrays and only use the predict function.
My next goal is to see if SEFR can be used on far more complex datasets, like small images or even processed sounds.