This past trimester, I've been working through the Embedded
Systems unit at Deakin University, School of IT (_Unit SIT210_).
One of our final assignments asked us to create an embedded system
that solves a real-world problem. The problem I was interested in
was reducing the communication barrier between people who use sign
language and people who don't understand it, largely because I find
the domain so interesting. The most popular solution seems to be
sign-to-speech gloves, which convert detected hand signs into
audible speech. However, the solutions available on the market are
either hard to get or expensive. For my project, I wanted to create
a sign-to-speech glove with a focus on low cost and being DIY
friendly.
Hardware
Due to time and budget constraints, I was only able to create one
glove, but it still has most of the functionality that I was aiming
for. The solution I ended up creating involves a lot of wiring. To
get started, you'll need the following:
- Raspberry Pi 4 (_or a better model_)
- Lots of jumper wires
- 9x 2.2" flex sensors
- 9x 47 kΩ resistors
- HC4067 16-Channel Multiplexer
- MCP3008 8 Channel 10 Bit ADC DIP16
- USB Speaker
- A glove
- An external battery + USB-C cable
Place one flex sensor across each knuckle of the glove (the
metacarpophalangeal, or MCP, joint) and another across the proximal
interphalangeal joint (PIP joint) of the index finger, middle
finger, ring finger, and pinky finger. The thumb doesn't get an
extra flex sensor because the sensors are too long to cover just its
PIP joint. For each flex sensor, connect one end to 3.3V and the
other end to a 47 kΩ resistor, so that the sensor and the resistor
form a voltage divider. It does not matter which end of the sensor
is connected to which.
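To get a feel for the voltages this produces, here is a rough sketch of the divider arithmetic. It is not code from the project. It assumes the flex sensor sits between 3.3V and the measurement point, with the 47 kΩ resistor between the measurement point and ground, and it assumes an ideal 10-bit ADC referenced to 3.3V. The function names and the example resistances are made up for illustration, so measure your own sensors before relying on any numbers.

```rust
/// Supply voltage feeding the divider (from the wiring described above).
const V_SUPPLY: f32 = 3.3;
/// Fixed resistor value in ohms (47 kΩ, from the parts list).
const R_FIXED: f32 = 47_000.0;

/// Voltage at the measurement point.
/// ASSUMPTION: sensor on the 3.3V side, fixed resistor on the ground side.
fn divider_voltage(r_sensor: f32) -> f32 {
    V_SUPPLY * R_FIXED / (r_sensor + R_FIXED)
}

/// Approximate code an ideal 10-bit ADC would report for `voltage`.
/// ASSUMPTION: the ADC reference is also 3.3V.
fn adc_code(voltage: f32) -> u16 {
    ((voltage / V_SUPPLY) * 1023.0).round().clamp(0.0, 1023.0) as u16
}
```

Under these assumptions, a higher sensor resistance gives a lower voltage and therefore a lower ADC code, so `adc_code(divider_voltage(100_000.0))` comes out smaller than `adc_code(divider_voltage(25_000.0))`.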
You should now have 9 signal wires, one from each flex sensor, each
carrying a voltage that varies with how far its sensor is bent.
However, we can't connect these to the Raspberry Pi just yet, since
the signals are _analog_ and the RPi's pins only read _digital_
levels. To solve this, we connect each wire to a channel on the
HC4067 multiplexer, which connects to an analog-to-digital
converter, which in turn connects to the RPi. I used channels 7-15
on my multiplexer, connected to the following flex sensors:
- CH7 -> Pinky MCP
- CH8 -> Pinky PIP
- CH9 -> Ring MCP
- CH10 -> Ring PIP
- CH11 -> Middle MCP
- CH12 -> Middle PIP
- CH13 -> Index MCP
- CH14 -> Index PIP
- CH15 -> Thumb
Then connect the multiplexer's output to CH7 on the
analog-to-digital converter, and its address pins ADS0, ADS1, ADS2,
and ADS3 to the RPi pins 7, 11, 13, and 15 respectively.
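The four address pins simply carry the channel number in binary, with ADS0 as the least significant bit. A minimal sketch of that mapping is below. `select_levels` and `ADDRESS_PINS` are names I made up for illustration. The pin table assumes that 7, 11, 13, and 15 are physical header positions; the `rppal` crate addresses pins by BCM GPIO number instead, so the second number in each pair is the BCM equivalent.

```rust
/// Levels for the address lines [ADS0, ADS1, ADS2, ADS3] that select
/// `channel` (0-15). `true` means the pin should be driven high.
fn select_levels(channel: u8) -> [bool; 4] {
    assert!(channel < 16, "the HC4067 has 16 channels");
    [
        channel & 0b0001 != 0,
        channel & 0b0010 != 0,
        channel & 0b0100 != 0,
        channel & 0b1000 != 0,
    ]
}

/// (physical header pin, BCM GPIO number) for ADS0 to ADS3.
/// ASSUMPTION: the pin numbers in the text are physical header positions.
const ADDRESS_PINS: [(u8, u8); 4] = [(7, 4), (11, 17), (13, 27), (15, 22)];
```

For example, `select_levels(7)` drives ADS0 to ADS2 high and ADS3 low, which selects the Pinky MCP sensor on CH7.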
On the MCP3008 ADC, connect CLK, D_OUT, D_IN, and CS to the RPi pins
23, 21, 19, and 24 respectively. Now we can use the address pins to
select a flex sensor, and the SPI protocol to read a 10-bit digital
number representing that sensor's output.
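An MCP3008 read is a three-byte SPI exchange: a start bit, then a mode bit plus three channel bits, then a padding byte while the result is clocked back. As a sketch of that framing (the helper names are mine, not from the project):

```rust
/// Build the 3-byte request for a single-ended read of `channel` (0-7).
fn mcp3008_request(channel: u8) -> [u8; 3] {
    assert!(channel < 8, "the MCP3008 has 8 channels");
    // Byte 0: start bit.
    // Byte 1: single-ended flag and 3 channel bits in the high nibble.
    // Byte 2: padding that keeps the clock running for the reply.
    [0x01, (0x08 | channel) << 4, 0x00]
}

/// Extract the 10-bit result from the 3-byte response:
/// the low 2 bits of byte 1 followed by all 8 bits of byte 2.
fn mcp3008_decode(rx: [u8; 3]) -> u16 {
    (((rx[1] as u16) & 0x03) << 8) | (rx[2] as u16)
}
```

For CH7, `mcp3008_request(7)` gives `[0x01, 0xF0, 0x00]`, and the decoded value is always in the range 0 to 1023.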
Finally, the easiest connections to make are plugging the USB
speaker into one of the USB ports on the RPi 4, and powering the RPi
from the external battery through a long USB-C cable. Once
everything is connected up, I strongly recommend gluing everything
into place on the glove with fabric glue, apart from the battery,
which should instead go into the user's pocket while they're using
the glove.
Software
The software I created for this is a Rust program and is available
here.
It includes a Nix flake that provides all the needed dependencies
for you. If you haven't installed Nix, you can do so using the
installer developed by Determinate Systems.
Sensors
I've separated all the code that interacts with the flex sensors
into its own `sensors` crate. Most importantly, this exposes a
`Sensors` struct that abstracts away implementation details such as
the SPI protocol.

use std::error::Error;
use std::thread;
use std::time::Duration;

use rppal::gpio::{Gpio, OutputPin};
use rppal::spi::{Bus, Mode, SlaveSelect, Spi};

// ADS0_PIN to ADS3_PIN (the GPIO numbers of the multiplexer address pins)
// are defined elsewhere in the crate and are not shown here.

/// Abstraction over the connected flex sensors
pub struct Sensors {
    spi: Spi,
    ads0: OutputPin,
    ads1: OutputPin,
    ads2: OutputPin,
    ads3: OutputPin,
}

impl Sensors {
    /// Create a new Sensors struct which requires GPIO and SPI access
    pub fn new() -> Result<Self, Box<dyn Error>> {
        let gpio = Gpio::new()?;
        let ads0 = gpio.get(ADS0_PIN)?.into_output();
        let ads1 = gpio.get(ADS1_PIN)?.into_output();
        let ads2 = gpio.get(ADS2_PIN)?.into_output();
        let ads3 = gpio.get(ADS3_PIN)?.into_output();
        let spi = Spi::new(Bus::Spi0, SlaveSelect::Ss0, 1_000_000, Mode::Mode0)?;
        Ok(Self {
            spi,
            ads0,
            ads1,
            ads2,
            ads3,
        })
    }
}

When the `Sensors` struct is created, it sets up the SPI bus and
access to the GPIO pins.
Now we can create a method that returns the output of each flex
sensor.

impl Sensors {
    /// Read the output value of the connected flex sensors
    pub fn read(&mut self) -> Result<[u16; 9], Box<dyn Error>> {
        let mut all_readings: [u16; 9] = [0; 9];
        for i in 0..9 {
            // We want to read multiplexer channels C7 to C15
            let channel = i + 7;
            // Drive the four address pins with the binary channel number
            if channel & 0b0001 != 0 {
                self.ads0.set_high();
            } else {
                self.ads0.set_low();
            }
            if channel & 0b0010 != 0 {
                self.ads1.set_high();
            } else {
                self.ads1.set_low();
            }
            if channel & 0b0100 != 0 {
                self.ads2.set_high();
            } else {
                self.ads2.set_low();
            }
            if channel & 0b1000 != 0 {
                self.ads3.set_high();
            } else {
                self.ads3.set_low();
            }
            // Delay to give the multiplexer time to switch channels
            thread::sleep(Duration::from_micros(10));
            // Read the selected sensor from the ADC via SPI
            // start bit, mode bit, channel bits (3 bits, ADC channel 7), padding,
            // dummy byte (keeps clock pulsing)
            let tx_buffer: [u8; 3] = [0x01, 0xF0, 0x00];
            let mut rx_buffer: [u8; 3] = [0; 3];
            self.spi.transfer(&mut rx_buffer, &tx_buffer)?;
            // Result is 10 bits across the last 2 bytes
            let reading: u16 = (((rx_buffer[1] as u16) & 0x03) << 8) | (rx_buffer[2] as u16);
            all_readings[i] = reading;
        }
        Ok(all_readings)
    }
}

Hand Signs
The next crate I made contains a representation of a hand's current
position.

/// The status of a phalanx (finger segment)
#[derive(Clone, Copy, Debug)]
pub enum PhalanxStatus {
    /// Represents a straight phalanx
    Neutral,
    /// Represents a bent phalanx
    Bent,
}

/// Representation of a hand position
#[derive(Clone, Copy, Debug)]
pub struct HandSign {
    pub pinky_mcp: PhalanxStatus,
    pub pinky_pip: PhalanxStatus,
    pub ring_mcp: PhalanxStatus,
    pub ring_pip: PhalanxStatus,
    pub middle_mcp: PhalanxStatus,
    pub middle_pip: PhalanxStatus,
    pub index_mcp: PhalanxStatus,
    pub index_pip: PhalanxStatus,
    pub thumb: PhalanxStatus,
}

The `HandSign` struct represents a sign by whether the MCP or PIP
joint of each finger is neutral or bent. This is a deliberately
binary representation, because flex sensor readings are extremely
noisy and prone to 'drifting', which makes it difficult to get
accurate readings from them.
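Later code calls `HandSign::neutral_position()` and `as_bendtime()`, whose bodies aren't shown in this post. The sketch below is only my guess at their shape, based on how they are used: it assumes `as_bendtime()` returns one value per joint in multiplexer channel order, with 1.0 for bent and 0.0 for neutral. The types are repeated so the example stands on its own.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PhalanxStatus {
    Neutral,
    Bent,
}

#[derive(Clone, Copy, Debug)]
pub struct HandSign {
    pub pinky_mcp: PhalanxStatus,
    pub pinky_pip: PhalanxStatus,
    pub ring_mcp: PhalanxStatus,
    pub ring_pip: PhalanxStatus,
    pub middle_mcp: PhalanxStatus,
    pub middle_pip: PhalanxStatus,
    pub index_mcp: PhalanxStatus,
    pub index_pip: PhalanxStatus,
    pub thumb: PhalanxStatus,
}

impl HandSign {
    /// HYPOTHETICAL body: a hand with every joint straight.
    pub fn neutral_position() -> Self {
        let n = PhalanxStatus::Neutral;
        HandSign {
            pinky_mcp: n,
            pinky_pip: n,
            ring_mcp: n,
            ring_pip: n,
            middle_mcp: n,
            middle_pip: n,
            index_mcp: n,
            index_pip: n,
            thumb: n,
        }
    }

    /// HYPOTHETICAL body: 1.0 for each bent joint, 0.0 for each neutral one,
    /// in the same order as the multiplexer channels (CH7 to CH15).
    pub fn as_bendtime(&self) -> [f32; 9] {
        [
            self.pinky_mcp,
            self.pinky_pip,
            self.ring_mcp,
            self.ring_pip,
            self.middle_mcp,
            self.middle_pip,
            self.index_mcp,
            self.index_pip,
            self.thumb,
        ]
        .map(|status| match status {
            PhalanxStatus::Bent => 1.0,
            PhalanxStatus::Neutral => 0.0,
        })
    }
}
```

Keeping the array in channel order means index `i` of a sensor reading and index `i` of a sign always refer to the same joint.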
Hand
This crate is the largest in the application, and converts the raw
flex sensor readings into detected hand signs.

use std::collections::HashMap;
use std::time::Instant;

/// The state of a user's hand and their current sign
pub struct Hand {
    sensors: Sensors,
    sensor_history: [[u16; SENSOR_HISTORY_LENGTH]; 9],
    time_since_update: Instant,
    time_since_prev_update: Instant,
    pub current_sign: HandSign,
    pub bendtime_info: Option<([f32; 9], Instant)>,
    word_to_sign: HashMap<String, HandSign>,
}

When a `Hand` struct is created, there is some necessary setup to do
first. Because the flex sensors are fairly noisy, we keep a history
of each raw flex sensor value so we can take an average and get a
smoother reading. We also keep track of two timestamps so we can
calculate how quickly each phalanx is moving (which the code calls
its acceleration). The `Hand::new()` function takes care of
initializing these values.
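The smoothing itself is just a mean over the most recent readings. A stripped-down sketch of the idea, using a hypothetical `window_mean` helper and a history stored newest-first (which matches how `update()` shifts the history later on):

```rust
/// Mean of the `window` most recent readings, where index 0 is the newest.
fn window_mean(history: &[u16], window: usize) -> f32 {
    assert!(window > 0 && window <= history.len());
    let sum: f32 = history.iter().take(window).map(|&x| x as f32).sum();
    sum / window as f32
}
```

A larger window gives a steadier value but reacts more slowly to a real movement, so the window size is a trade-off you would tune by hand.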

impl Hand {
    /// Creates a new Hand struct
    pub fn new(word_to_sign: HashMap<String, HandSign>) -> Self {
        let mut sensors = Sensors::new().expect("should be able to initialize sensors");
        let time_since_prev_update = Instant::now();
        // Pre-fill the history so the first averages are based on real readings
        let mut sensor_history = [[0_u16; SENSOR_HISTORY_LENGTH]; 9];
        for history_index in 0..SENSOR_HISTORY_LENGTH {
            let current_sensor_output =
                sensors.read().expect("should be able to read from sensors");
            for phalanx_index in 0..9 {
                sensor_history[phalanx_index][history_index] = current_sensor_output[phalanx_index];
            }
        }
        let time_since_update = Instant::now();
        Hand {
            sensors,
            sensor_history,
            time_since_update,
            time_since_prev_update,
            current_sign: HandSign::neutral_position(),
            bendtime_info: None,
            word_to_sign,
        }
    }
}

The two methods below, `get_finger_positions()` and
`get_finger_accelerations()`, are what the main logic for
calculating the current hand sign is based on. Neither method
mutates the `Hand` struct; they only read from it. The values they
return will likely change after the state is updated through the
`update()` method, which is discussed later.

impl Hand {
    /// Return the current position of each finger segment as an array of f32.
    /// Each value is the mean of the most recent raw sensor readings.
    fn get_finger_positions(&self) -> [f32; 9] {
        self.sensor_history.map(|phalanx_history| {
            let phalanx_history = phalanx_history.iter().map(|x| *x as f32);
            let sum: f32 = phalanx_history.take(SENSOR_AVERAGE_WINDOW).sum();
            sum / (SENSOR_AVERAGE_WINDOW as f32)
        })
    }

    /// Return the current acceleration of each finger segment as an array of f32.
    /// Each value is the change in the averaged reading per second between the
    /// previous update and this one (negative while the segment is bending).
    pub fn get_finger_accelerations(&self) -> [f32; 9] {
        let average_finger_positions = self.get_finger_positions();
        // The same average, shifted one sample back in time
        let prev_average_finger_positions: [f32; 9] = self.sensor_history.map(|phalanx_history| {
            let phalanx_history = phalanx_history.iter().map(|x| *x as f32);
            let sum: f32 = phalanx_history.skip(1).take(SENSOR_AVERAGE_WINDOW).sum();
            sum / (SENSOR_AVERAGE_WINDOW as f32)
        });
        let change_in_time = (self.time_since_update - self.time_since_prev_update).as_secs_f32();
        let finger_accelerations: [f32; 9] = average_finger_positions
            .into_iter()
            .zip(prev_average_finger_positions)
            .map(|(final_pos, initial_pos)| (final_pos - initial_pos) / change_in_time)
            .collect::<Vec<f32>>()
            .try_into()
            .expect("num of items should not have changed");
        finger_accelerations
    }
}

The `update()` method is fairly complicated, but it starts off by
updating the flex sensor history and the recorded time at which
`update()` was last called.
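The history update is a small fixed-size "newest first" buffer: rotate everything one slot to the right, then overwrite slot 0. In isolation, with a hypothetical `push_reading` helper, it looks like this:

```rust
/// Push `value` as the newest reading (index 0) and drop the oldest one.
/// `N` must be at least 1.
fn push_reading<const N: usize>(history: &mut [u16; N], value: u16) {
    // The oldest value wraps around to index 0, where it is overwritten.
    history.rotate_right(1);
    history[0] = value;
}
```

Because the array never grows, this avoids any allocation inside the update loop.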

impl Hand {
    /// Processes the current flex sensors inputs and returns a word from
    /// the given handsign dictionary, if a hand sign has been detected
    pub fn update(&mut self) -> Option<String> {
        self.time_since_prev_update = self.time_since_update;
        let current_sensor_output = self
            .sensors
            .read()
            .expect("should be able to read from sensors");
        let sensor_history = &mut self.sensor_history;
        for phalanx_index in 0..9 {
            // Shift older readings back and store the newest at index 0
            sensor_history[phalanx_index].rotate_right(1);
            sensor_history[phalanx_index][0] = current_sensor_output[phalanx_index];
        }
        self.time_since_update = Instant::now();
        // ...
    }
}

Then, we calculate each finger's acceleration. If a finger segment's
reading is falling faster than a threshold (5 units/second in the
code below), which is the clenching direction, then that phalanx is
considered 'Bent'. If it is rising faster than the same threshold,
back towards the neutral position, then that phalanx is considered
'Neutral'. If it is neither, then the previously calculated phalanx
state is kept.
Additionally, if any of the fingers are considered bent, we keep
track of how long each finger has been bent for.
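To put the threshold in perspective, here is a small worked sketch. The names `rate_of_change`, `classify`, and `Change` are mine, and the threshold is passed in rather than hard-coded. With the roughly 40 ms delay between updates used in the main loop, a drop of a single unit in the smoothed reading is already about 25 units/second, which is well past a threshold of 5.

```rust
#[derive(Debug, PartialEq)]
enum Change {
    ToBent,
    ToNeutral,
}

/// Rate of change of a smoothed reading, in units per second.
fn rate_of_change(current: f32, previous: f32, dt_secs: f32) -> f32 {
    (current - previous) / dt_secs
}

/// Classify a rate with a symmetric threshold.
/// `None` means "keep whatever state the segment already had".
fn classify(rate: f32, threshold: f32) -> Option<Change> {
    if rate > threshold {
        Some(Change::ToNeutral)
    } else if rate < -threshold {
        Some(Change::ToBent)
    } else {
        None
    }
}
```

The dead zone between the two thresholds is what stops small amounts of sensor noise from flipping a segment back and forth.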

impl Hand {
    pub fn update(&mut self) -> Option<String> {
        // ...
        let finger_accelerations = self.get_finger_accelerations();
        // Rising readings mean the segment is straightening, falling readings
        // mean it is bending; small changes leave the previous state alone
        let finger_changes: [Option<PhalanxStatus>; 9] = finger_accelerations.map(|x| {
            if x > 5.0 {
                Some(PhalanxStatus::Neutral)
            } else if x < -5.0 {
                Some(PhalanxStatus::Bent)
            } else {
                None
            }
        });
        // While any segment is bent, accumulate how long each one has been bent
        if self.current_sign.as_bendtime().iter().any(|x| *x > 0.0) {
            let deltatime = (self.time_since_update - self.time_since_prev_update).as_secs_f32();
            let (mut bendtime, bendtime_start) = match self.bendtime_info {
                Some((bendtime, bendtime_start)) => (bendtime, bendtime_start),
                None => ([0.0; 9], self.time_since_update),
            };
            let current_sign = self.current_sign.as_bendtime();
            bendtime
                .iter_mut()
                .enumerate()
                .filter(|&(i, _)| current_sign[i] == 1.0)
                .for_each(|(_, x)| *x += deltatime);
            self.bendtime_info = Some((bendtime, bendtime_start));
        }
        // ...
    }
}

Now we have all the information we need to predict whether the user
is currently making a hand sign. If the user has _just_ put their
hand back into a neutral state, we look at how long each finger
segment was bent for and compare that to each entry in the
`word_to_sign` dictionary that the `Hand` struct was built with. If
any signs have more than 90% similarity, the one with the strongest
similarity is returned from this function. If none pass this
criterion, nothing is returned.
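The `similarity_to_bendtime()` method used in the code that follows isn't shown in this post, so the measure below is purely an assumption to make the idea concrete: 1.0 minus the mean absolute difference between the expected pattern and the fraction of time each joint spent bent. The selection step (filter by a threshold, then take the highest score) follows the same shape as the real code. The function names and the sign patterns in the usage note are made up.

```rust
/// HYPOTHETICAL similarity measure: 1.0 minus the mean absolute difference
/// between the expected pattern (1.0 = bent, 0.0 = neutral) and the observed
/// fraction of the gesture each joint spent bent.
fn similarity(expected: [f32; 9], observed: [f32; 9]) -> f32 {
    let total_diff: f32 = expected
        .iter()
        .zip(observed.iter())
        .map(|(e, o)| (e - o).abs())
        .sum();
    1.0 - total_diff / 9.0
}

/// Pick the best-scoring word whose score is above `threshold`, if any.
fn best_match<'a>(
    candidates: &[(&'a str, [f32; 9])],
    observed: [f32; 9],
    threshold: f32,
) -> Option<&'a str> {
    candidates
        .iter()
        .map(|(word, expected)| (*word, similarity(*expected, observed)))
        .filter(|(_, score)| *score > threshold)
        .max_by(|(_, a), (_, b)| a.total_cmp(b))
        .map(|(word, _)| word)
}
```

With a threshold of 0.9, an observation that differs from a stored pattern by a few percent on a couple of joints still matches, while one that disagrees on four whole joints does not.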

impl Hand {
    pub fn update(&mut self) -> Option<String> {
        // ...
        let mut detected_word: Option<String> = None;
        // Only look for a match once every segment is back to neutral
        if self.current_sign.as_bendtime().iter().all(|x| *x == 0.0) {
            if let Some((bendtime, bendtime_start)) = self.bendtime_info {
                let bendtime_duration = (self.time_since_update - bendtime_start).as_secs_f32();
                // Fraction of the gesture that each segment spent bent
                let adjusted_bendtime: [f32; 9] = bendtime
                    .iter()
                    .map(|x| x / bendtime_duration)
                    .collect::<Vec<f32>>()
                    .try_into()
                    .expect("should have the same length iterator");
                let word_similarities: HashMap<String, f32> = self
                    .word_to_sign
                    .iter()
                    .map(|(word, sign)| {
                        (word.clone(), sign.similarity_to_bendtime(adjusted_bendtime))
                    })
                    .collect();
                detected_word = word_similarities
                    .into_iter()
                    .filter(|&(_, similarity)| similarity > SIMILARITY_THRESHOLD)
                    .max_by(|&(_, a), &(_, b)| a.total_cmp(&b))
                    .map(|(word, _)| word);
                self.bendtime_info = None;
                println!("action finished!");
            }
        }
        self.apply_finger_changes(&finger_changes);
        detected_word
    }
}

App
The `app` crate is the entrypoint to the program, and makes sure
that any word detected by the `Hand` struct is spoken aloud. For
text to speech, we use the `piper_rs` crate to run the model,
alongside the `rodio` crate for managing audio playback.
First, we initialize the Piper TTS model, which should be found in
the `model/` directory of the project.

// Create a separate thread to process text-to-speech,
// and use the main thread to detect and process hand movements
fn main() -> Result<(), Box<dyn Error>> {
    println!("Starting sensor readings (Press Ctrl+C to quit)...");
    let config_path = Path::new("./model/voice.onnx.json");
    let onnx_path = Path::new("./model/voice.onnx");
    println!("Initializing Piper TTS...");
    let mut piper = Piper::new(onnx_path, config_path).expect("Failed to initialize Piper model");
    println!("Model loaded successfully!");
    // ...
}

After the TTS model has been loaded successfully, we create a
channel with a sender and a receiver, and give the receiver to a
new thread. This channel is used to send words to that thread,
which is responsible for playing the text-to-speech audio. We
run this on a separate thread so it doesn't block the processing of
the flex sensors. If a deaf user was using this glove, it would be
frustrating if they started a new sign but it wasn't processed
because the glove was still speaking, which the user wouldn't be
able to know.
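Stripped of the audio parts, the pattern is a plain producer/worker channel from the standard library. The sketch below uses a hypothetical `speak_all` function where the worker only records the words it receives; in the real program, synthesis and playback happen at that point instead.

```rust
use std::sync::mpsc;
use std::thread;

/// Send each word to a worker thread and return what the worker handled.
/// The worker only records the words; real playback would go in its place.
fn speak_all(words: Vec<String>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<String>();

    let worker = thread::spawn(move || {
        let mut spoken = Vec::new();
        // Iterating the receiver blocks until a word arrives, and the loop
        // ends once every sender has been dropped.
        for word in rx {
            spoken.push(format!("spoke: {word}"));
        }
        spoken
    });

    for word in words {
        // `send` returns immediately; it does not wait for earlier words
        // to be handled by the worker.
        tx.send(word).expect("worker should still be running");
    }
    // Close the channel so the worker's loop can finish.
    drop(tx);

    worker.join().expect("worker should not panic")
}
```

Because the channel is unbounded, words queue up while the worker is busy, so the sending side never stalls waiting for speech to finish.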

fn main() -> Result<(), Box<dyn Error>> {
    // ...
    let (tx, rx) = mpsc::channel::<String>();
    // Text-to-speech thread
    thread::spawn(move || {
        // Get the default device audio output stream
        let (_stream, handle) = match OutputStream::try_default() {
            Ok(v) => {
                println!("tts: audio device opened");
                v
            }
            Err(e) => {
                eprintln!("tts: failed to open audio device: {}", e);
                return;
            }
        };
        // Create a sink that queues audio for the default output
        let sink = match Sink::try_new(&handle) {
            Ok(s) => {
                println!("tts: sink created");
                s
            }
            Err(e) => {
                eprintln!("tts: failed to create sink: {}", e);
                return;
            }
        };
        // Constantly process incoming words
        for word in rx {
            println!("tts: received word: '{}'", word);
            match piper.create(&word, false, None, None, None, None) {
                Ok((samples, sample_rate)) => {
                    println!(
                        "tts: synthesized {} samples at {}Hz",
                        samples.len(),
                        sample_rate
                    );
                    let buffer = SamplesBuffer::new(1, sample_rate, samples);
                    sink.append(buffer);
                    sink.sleep_until_end();
                    println!("tts: playback finished");
                }
                Err(e) => eprintln!("tts: synthesis error: {:?}", e),
            }
        }
    });
    // ...
}

The rest of the main function is straightforward. We construct a
very basic word-to-sign dictionary with 3 phrases, and then use the
main thread to constantly process the flex sensors. Whenever a
word is detected and returned from the `Hand` struct, it is sent
to the TTS thread so that it can be spoken.

fn main() -> Result<(), Box<dyn Error>> {
    // ...
    let mut word_to_sign: HashMap<String, HandSign> = HashMap::new();
    word_to_sign.insert("hello how are you".to_string(), PEACE_SIGN);
    word_to_sign.insert("where is the bathroom".to_string(), ROCK_SIGN);
    word_to_sign.insert("i need assistance".to_string(), MIDDLE_BENT);
    let mut hand = Hand::new(word_to_sign);
    loop {
        thread::sleep(Duration::from_millis(40));
        if let Some(word) = hand.update() {
            println!("main: detected word: '{}'", word);
            if let Err(e) = tx.send(word) {
                eprintln!("main: channel send failed: {}", e);
            }
        }
    }
}

Reflection
This project started with the desire to make two high-quality gloves
that seamlessly interact with each other to understand what signs
you're making and translate them into audible speech. In
retrospect, that was a very ambitious goal given my available time
and budget. I'm still very happy with what I made, and it has
usefully revealed that flex sensors are not the approach that a
fully functional pair of sign-to-speech gloves should take if they
want to be accurate.
I've also had more time to think about the problem that these gloves
were meant to solve. I originally thought the best outcome for this
project would be a way to create high-quality, cheap sign-to-speech
gloves that are widely accessible. However, this would actually
cause various sign languages to be learnt _less_, since the gloves
can be used as a crutch. I believe a better purpose for these gloves
would be for people who aren't familiar with sign language to use
them as a self-teaching tool, allowing them to catch any slip-ups in
their signing while they are trying to communicate with others in
that language.








