Hand Gesture Recognition That You'll Glove
WaveGlove uses five IMU sensors to recognize more complex hand gestures from a larger vocabulary.
Alternative means of interacting with computers, such as gesturing, are rising in popularity due to the low cost and high availability of sensors, in conjunction with increases in available computational resources. Now that these technologies have become mainstream, focus is shifting to fine-tuning these methods to provide more natural and usable interfaces.
Consider the WaveGlove, for instance. For low-power and mobile applications, hand gesture recognition is typically achieved with a single inertial measurement unit (IMU) attached to the hand; this allows for basic gesturing in which the movement of the whole hand can be recognized. WaveGlove, in contrast, uses five IMUs, one on each finger, to collect higher-resolution data about the movements of the hand and fingers.
The prototype device, developed by a pair of researchers at Comenius University Bratislava in Slovakia, consists of a silken glove worn on the left hand fitted with five common MPU6050 IMUs, one near the tip of each finger. The IMUs all connect to a multiplexer located on the back of the palm. Another longer cable runs from the multiplexer to external processing units and a power supply.
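To make the data path concrete, here is a minimal sketch of reading all five finger IMUs through a shared multiplexer. The paper does not name the multiplexer part, so a TCA9548A-style I2C mux (channel selected by writing a one-hot byte) is assumed, and the bus itself is stubbed out so the sketch runs without hardware; register addresses are the MPU6050 defaults.

```python
MUX_ADDR = 0x70      # common TCA9548A default address (assumption)
IMU_ADDR = 0x68      # MPU6050 default I2C address
ACCEL_XOUT_H = 0x3B  # start of 14 data registers: accel(6) + temp(2) + gyro(6)

class FakeI2CBus:
    """Stand-in for a real I2C bus (e.g. smbus2) that returns dummy bytes."""
    def write_byte(self, addr, value):
        pass
    def read_i2c_block_data(self, addr, register, length):
        return [0] * length

def to_int16(hi, lo):
    # MPU6050 readings are big-endian signed 16-bit values
    v = (hi << 8) | lo
    return v - 0x10000 if v & 0x8000 else v

def read_finger(bus, channel):
    bus.write_byte(MUX_ADDR, 1 << channel)  # route the mux to one finger's IMU
    raw = bus.read_i2c_block_data(IMU_ADDR, ACCEL_XOUT_H, 14)
    words = [to_int16(raw[i], raw[i + 1]) for i in range(0, 14, 2)]
    accel, gyro = words[0:3], words[4:7]  # skip the temperature word
    return accel + gyro                   # 6 channels per IMU

def read_frame(bus):
    # One time step of glove data: 5 fingers x 6 channels = 30 values
    return [v for ch in range(5) for v in read_finger(bus, ch)]

frame = read_frame(FakeI2CBus())
print(len(frame))  # 30
```

With a real bus object in place of the stub, the same loop would produce one 30-channel sample per time step, which is the raw material for the gesture datasets described next.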
The team used the WaveGlove to collect two datasets: WaveGlove-single, composed of eight simple gestures involving the whole hand, and WaveGlove-multi, composed of ten more complex gestures involving specific finger movements. The often-used gesture vocabulary of WaveGlove-single was selected primarily to serve as a reference point, while the vocabulary of WaveGlove-multi was chosen to highlight the advantages of a multi-sensor approach. Over 11,000 gestures were collected in total.
The researchers evaluated over a dozen different classification methods on their datasets, ranging from classical machine learning to deep learning approaches. They found that a Transformer-based architecture, which uses a self-attention mechanism, provided the most accurate results, achieving near-perfect classification accuracy on the WaveGlove datasets.
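The core of such an architecture can be illustrated with a toy, untrained self-attention pass over one gesture window. This is a sketch of the attention computation only; the dimensions, mean-pooling, and single attention layer are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (T, d) sequence of per-timestep feature vectors.
    # Each time step attends to every other, so the model can weigh
    # whichever parts of the motion are most informative for the gesture.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T) attention weights
    return softmax(scores) @ V

# One gesture sample: T time steps, 5 IMUs x 6 channels = 30 raw features
T, d_in, d = 64, 30, 16
X = rng.standard_normal((T, d_in))

W_embed = rng.standard_normal((d_in, d)) * 0.1  # random (untrained) weights
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
W_cls = rng.standard_normal((d, 10)) * 0.1      # 10 WaveGlove-multi classes

H = self_attention(X @ W_embed, Wq, Wk, Wv)  # contextualized time steps
logits = H.mean(axis=0) @ W_cls              # mean-pool over time, then classify
probs = softmax(logits)
print(probs.shape)  # (10,)
```

A trained version of this idea stacks several such layers and learns the weight matrices from the recorded gestures; here the point is only the shape of the computation from a 30-channel window to a class distribution.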
The team next wanted to understand which factors had the most impact on the final performance of the device. They found that, unsurprisingly, the placement of the IMUs on the fingers had a significant impact when finger movements were an important part of a gesture. They also looked at the impact of the number of sensors used, and discovered that the number of sensors did not affect accuracy on WaveGlove-single gestures. For WaveGlove-multi, however, increasing the number of IMUs improved performance, but only to a certain point. Adding more than three sensors only marginally improved the results.
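An ablation like the sensor-count study can be sketched by keeping only the channels from a subset of IMUs before classification. The zero-masking and the "first k fingers" selection order here are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def mask_sensors(batch, k, channels_per_imu=6):
    """batch: (N, T, 30) gesture windows from 5 IMUs; keep the first k IMUs
    and zero out the remaining sensors' channels."""
    masked = batch.copy()
    masked[..., k * channels_per_imu:] = 0.0
    return masked

rng = np.random.default_rng(1)
batch = rng.standard_normal((4, 64, 30))  # 4 dummy gesture windows

# Re-evaluating a classifier on mask_sensors(batch, k) for k = 1..5 would
# trace out the accuracy-vs-sensor-count curve described above.
for k in range(1, 6):
    kept = np.count_nonzero(mask_sensors(batch, k)[0, 0])
    print(k, kept)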
This research shows that multiple sensors clearly have a place in providing more natural, richer gesture-based interfaces. It also shows that a one-size-fits-all approach may not be optimal; rather, the number of sensors and their placements should be customized for a device's particular use cases. Continuing this work may lead to improved communication between humans and machines in the future. The team’s next step is to fine-tune their Transformer-based model to increase the vocabulary of gestures.