Smart devices become more prevalent every year. However, many of their operations are still controlled through phones or smart speakers. We seek an easier, more intuitive way to coordinate them: gestures.
We aim to design a ring-based input system for controlling a set of devices. Operations may include raising or lowering the volume, turning devices on and off, or even controlling the movement of a mobile robot (if time permits). Each operation will be mapped to a set of hand gestures defined in our system, ranging from pinching two fingers to snapping them.
Sensing (input): an IMU attached to the fingers detects and measures the movements of the index and middle fingers.
Output (actuation): Bluetooth HID commands, so that characters are sent wirelessly to the host device.
One challenge would be adapting the software to connect to multiple devices and identify which device is currently “pointed at” by the ring. Another issue regarding the practicality of this device is the size and weight of the required components on the hand.
DEMO:
Milestone 1 summary:
At the Milestone 1 stage of our project, we were unsure which idea to pursue. As a group, we brainstormed several ideas across different hardware and software paradigms to identify the most compelling project. Our discussion ultimately led to three primary, hardware-driven interaction systems: a freehand input tracker, a posture assist device, and an AR Bluetooth remote control. We also discussed a few other notable concepts, such as haptic communication bands and a voice recognition robot, but did not pursue them.
Our first idea is a Freehand Input Tracker that utilizes gesture-control rings. The purpose of designing this system is to support intuitive spatial interactions by tracking different movements. Specifically, we aim to map these different hand gestures to core digital commands, including selection, panning, zooming, and rotation. This approach explores how minimal wearable hardware can seamlessly replace traditional input methods for manipulating digital environments.
The second idea we developed is a physical Posture Assist Device designed to detect and combat slouching. This system relies on a sensor module that continuously monitors the user's seated position. The sensor data is fed into a microcontroller, which processes the information and triggers immediate physical alerts, such as illuminating an LED and activating a vibration motor, whenever poor posture is detected.
Our final idea focuses on augmented reality: remotely controlling Bluetooth devices using Snapchat glasses. Physical hand gestures captured by the glasses are translated into signals that trigger actions on smart devices such as speakers and lighting.
Presenting these three distinct ideas in class gave us a clear understanding of what could be completed within this semester and what could become long-term goals outside of class. The feedback we received on Ideas 1 and 3 suggested combining the two into a new system that uses gesture-control rings to remotely control Bluetooth devices. It would map different hand gestures to core digital commands such as pause/continue, volume changes, prev/next, etc.
Milestone 2 summary:
Since the first milestone, we built a functioning hardware and software prototype from the initial concept:
- We designed and 3D printed the first ring prototype to fit the IMU sensor.
- We established the project repository and, building on an existing gesture recognition system, implemented basic gesture recording, detection (short swipes), and deletion using accelerometer and gyroscope data.
Future plan outline:
Once a basic gesture recognition system was working, we needed to actually use these gestures to control devices. We first needed to set up a Bluetooth proxy between the ESP32 (or perhaps an alternate, smaller device), a server (a laptop), and a Bluetooth-capable device. From there, we needed to assign gestures to various actions on the device. For a Bluetooth speaker, this might include:
- Swiping left to go back one song/restart a song
- Swiping right to skip to the next song
- Swiping up to raise the volume
- Swiping down to lower the volume
Progress since Milestone 2:
- We redesigned and 3D printed the ring to accommodate a breadboard.
- We attempted to train an LSTM model to detect gestures, but its accuracy was poor, so we reverted to our previous gesture recognition system.
- We set up our device for connecting to a laptop and mapped the following gestures to actions:
  - Tap air: media pause
  - Rotate wrist clockwise: increase device volume
  - Rotate wrist counterclockwise: decrease device volume
  - Swipe right: media next
  - Swipe left: media previous
  - Raise hand and draw circle like Dr. Strange (custom action): play YouTube video of Dr. Strange portal being opened
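As an illustration only, the gesture-to-action mapping could be expressed as a lookup table that resolves to HID consumer-control usage IDs. This is a sketch under assumptions, not the code in our repository: the gesture labels (`tap_air`, `rotate_cw`, and so on), the function name `consumer_report`, and the 2-byte little-endian payload layout are hypothetical, and the real payload depends on the HID report descriptor in use. The usage IDs themselves come from the Consumer Page (0x0C) of the USB HID Usage Tables. "Media pause" is mapped here to the Play/Pause toggle, which is also an assumption.

```python
# Usage IDs from the HID Consumer Page (0x0C).
CONSUMER_USAGES = {
    "play_pause": 0x00CD,      # Play/Pause toggle
    "next_track": 0x00B5,      # Scan Next Track
    "previous_track": 0x00B6,  # Scan Previous Track
    "volume_up": 0x00E9,       # Volume Increment
    "volume_down": 0x00EA,     # Volume Decrement
}

# Hypothetical gesture labels; the recognizer's real labels may differ.
GESTURE_TO_ACTION = {
    "tap_air": "play_pause",
    "rotate_cw": "volume_up",
    "rotate_ccw": "volume_down",
    "swipe_right": "next_track",
    "swipe_left": "previous_track",
}


def consumer_report(gesture):
    """Return an assumed 2-byte little-endian usage payload, or None.

    None means the gesture has no media-key equivalent, e.g. an
    unrecognized gesture or a custom action handled elsewhere.
    """
    action = GESTURE_TO_ACTION.get(gesture)
    if action is None:
        return None
    return CONSUMER_USAGES[action].to_bytes(2, "little")
```

The custom Dr. Strange gesture has no HID usage, so in this sketch it falls through to `None` and would need to be handled separately on the laptop side.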
The device is an enclosure mounted on two rings meant to be worn on the index and middle fingers. We increased its size to accommodate a breadboard, an IMU, an ESP32 mini, and a USB cable for power.
The device connects to a laptop and controls it by mapping gestures to media controls. We used this to control media playback on Spotify and YouTube hands-free.
In our first demo, the person who had recorded the gestures performed them, and they were recognized reliably. In our second demo, someone else performed the gestures, and they were not recognized reliably. This is because our recording method does not support integrating data from multiple users, which means the ring’s gesture recognition is specific to the person who recorded the gestures.
Another problem is that the breadboard inflates the size and weight of the enclosure prohibitively. It would have been preferable to solder the components to reduce the size and weight of the ring enclosure. We may have been able to include a battery as well without making the size too large, but it would introduce heat accumulation problems given the necessary proximity of the battery to the IMU and ESP.
We attempted to train an LSTM using 50 recorded demonstrations for each of our 6 gestures. Although the model converged, it performed poorly on the validation set. The most likely reason for this gap was overfitting caused by limited training data: 50 examples per gesture was likely not enough to train a generalizable model. We also tried other data sources, such as the Hand Gesture Classification IMU Dataset from Kaggle. However, even slight differences between their recording setup and our own IMU ring placement were likely to cause performance issues in our environment. Furthermore, the limited number of samples in this dataset (about 100 per gesture) caused similar overfitting issues even within the same data distribution.
These limitations led us to stick with our own recorded gesture readings and to opt for a rigid threshold-based approach to gesture detection. Essentially, the 50 recordings for each gesture were averaged together. A new reading measured at prediction time was assigned to a gesture if it was close enough to that gesture's average (as defined by a threshold constant) and closer to it than to any other gesture's average.
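The threshold-based matching described above can be sketched roughly as follows. This is a minimal illustration rather than our repository code: it assumes each recording has already been flattened to a fixed-length list of numbers, uses Euclidean distance as the closeness measure, and uses toy data and a made-up threshold. The names `average_template`, `distance`, and `classify` are hypothetical.

```python
import math


def average_template(recordings):
    """Element-wise mean of equal-length recordings (lists of floats)."""
    count = len(recordings)
    return [sum(values) / count for values in zip(*recordings)]


def distance(a, b):
    """Euclidean distance between two equal-length readings (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(reading, templates, threshold):
    """Return the closest gesture if it is within threshold, else None."""
    best_name, best_dist = None, float("inf")
    for name, template in templates.items():
        d = distance(reading, template)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None


# Toy data: two recordings per gesture instead of 50, three values each.
recordings = {
    "swipe_left": [[-1.0, 0.0, 0.1], [-0.8, 0.1, -0.1]],
    "swipe_right": [[1.0, 0.0, 0.0], [1.2, -0.1, 0.0]],
}
templates = {name: average_template(recs) for name, recs in recordings.items()}

print(classify([0.9, 0.0, 0.0], templates, threshold=0.5))  # near swipe_right
print(classify([0.0, 5.0, 0.0], templates, threshold=0.5))  # far from both
```

Because each template is an average over one person's recordings, a different wearer's readings can fall outside the threshold for every gesture, which is consistent with what we saw in the second demo.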
Related products/Inspirations:
The Clapper - this device turns lights on and off when the user claps twice. The gesture is simple to perform, can be done anywhere in a room, and is well understood. Double claps are also uncommon in the home, which allows for a recognition system with a low false positive rate. Link: https://www.amazon.com/Clapper-Activated-Detection-Appliances-Technology/dp/B0000CGKLR?th=1
Meta Neural Band - This device uses EMG and Meta glasses to control smart devices hands-free with gestures. Gestures include finger taps and slides. The gestures mapped to controls seem unintuitive. Link: https://www.meta.com/emerging-tech/emg-wearable-technology/?srsltid=AfmBOoozZklj4F78S2okypi5QLxmyF1BGT5ibuHCArvlUp3utlrIDDEL
Both of these products serve as inspiration. The form factor of the latter is more useful for our purposes since we want our device to connect to various devices, rather than a fixed few like the first device.
How to use:
3D print the ring using the provided STL or CAD file. Place the breadboard, IMU, and ESP inside the enclosure. Their orientation doesn't matter so long as a cable can be threaded through the enclosure ceiling and plugged into the ESP to provide power. Instructions for building and uploading the gesture detection code are provided in our repository.




