Large strides have been made in human–computer interaction (HCI) in recent years, with more natural interfaces being developed around voice and gesture control. But these technologies are still used sparingly, and typically for fairly simplistic interactions with machines. Keyboard and mouse are still king when real work needs to be done, leaving the HCI devices we really want in the realm of science fiction.
For gesture control, there are two primary options for capturing inputs. The first is an instrumented glove. While this provides highly accurate information about gestures, these systems tend to be expensive, and wearing an instrumented device can be unnatural and cumbersome. The other option uses cameras to capture gestures from a distance. This method requires no on-body instrumentation, and so is much more comfortable; however, it is also less accurate and requires substantial computational resources.
A team of researchers from Sun Yat-Sen University set out to improve upon camera-based gesture recognition methods, to bring them one step closer to building the type of interface that could be used for complex interactions with computers. Their contribution to gesture recognition involves the use of a few optimizations that can both improve recognition accuracy and reduce computational complexity.
To deal with poor recognition rates, the team eschewed the one-size-fits-all strategy taken by most gesture recognition algorithms. Recognizing that people have differently sized hands, they take into account the user’s palm width, palm length, and finger length. The hand is categorized into one of three types, after which gesture recognition can be conducted. The recognition algorithm that is then applied has been trained only on data samples from that specific hand type. This approach improves recognition accuracy without any appreciable additional resource utilization.
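The categorization idea can be sketched in a few lines. The features below (palm aspect ratio and relative finger length) and the decision thresholds are hypothetical stand-ins; the paper's exact boundaries aren't given here. The point is simply that each hand type routes to its own recognizer.

```python
# Bucket a user's hand into one of three types based on measured proportions,
# then dispatch to a recognizer trained only on that type's samples.
# Features and thresholds are illustrative, not the paper's actual values.

def categorize_hand(palm_width_mm, palm_length_mm, finger_length_mm):
    """Return a hand-type label (0, 1, or 2) from simple proportions."""
    # Ratios are used so the grouping doesn't depend on absolute hand size.
    aspect = palm_width_mm / palm_length_mm        # palm width vs. length
    rel_finger = finger_length_mm / palm_length_mm # finger length vs. palm

    # Hypothetical decision boundaries separating three hand types.
    if aspect > 0.95:
        return 0   # broad palm
    if rel_finger > 0.90:
        return 1   # long fingers relative to palm
    return 2       # narrow palm, shorter fingers

# One recognizer per hand type (placeholders for trained models).
recognizers = {0: "model_broad", 1: "model_long", 2: "model_narrow"}

hand_type = categorize_hand(85, 95, 80)
print(hand_type, recognizers[hand_type])
```

Because the categorization runs once per user rather than per frame, it adds essentially no runtime cost, which matches the article's claim about resource utilization.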
Another innovation by the group was to build a pre-recognition step into the pipeline. This step uses a relatively simple algorithm to first select a subset of possible matching gestures, from the full set of all known gestures, by looking at the area of the hand. Since this metric is not sensitive to most transformations, it can deal with the rotation, translation, and scaling that trips up many other techniques. After pre-recognition, the final gesture can then be determined using a more complex algorithm, and with fewer options to consider, that algorithm can be both more accurate and less computationally intensive.
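A minimal sketch of this filtering step, with one assumption flagged up front: the article doesn't spell out the exact area measure, so the code below uses solidity (contour area divided by convex-hull area) as a stand-in, since a raw area would not survive scaling while this ratio is insensitive to rotation, translation, and uniform scaling. The gesture names and reference values are made up for illustration.

```python
# Pre-recognition: cheaply narrow the candidate gestures using a
# transformation-insensitive shape measure before running the heavy classifier.

def shoelace_area(pts):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def convex_hull(pts):
    """Andrew's monotone-chain convex hull, counter-clockwise."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def solidity(contour):
    """Contour area / convex-hull area: unchanged by rotation, shift, scale."""
    return shoelace_area(contour) / shoelace_area(convex_hull(contour))

def prerecognize(contour, references, tol=0.05):
    """Keep only gestures whose stored solidity is within tol of the input's."""
    s = solidity(contour)
    return [name for name, ref in references.items() if abs(ref - s) <= tol]

# Hypothetical reference solidities for a few gestures.
refs = {"fist": 0.98, "open_palm": 0.75, "peace": 0.68, "point": 0.82}

# A square contour equals its own hull, so solidity is 1.0 and only "fist"
# survives; the heavier classifier then only has to decide among survivors.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(prerecognize(square, refs))
```

Shrinking the candidate set this way is what lets the second-stage classifier be simultaneously cheaper and more accurate: it solves a smaller discrimination problem.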
The team gave their setup a whirl with a group of forty participants with different hand types. Using an algorithm trained on nine different gestures, they were able to achieve an average classification accuracy of 94%. The recognition rate still exceeded 93% even when images were rotated, translated, or scaled. These are fairly impressive results, and considering that the algorithm was designed with resource-constrained devices in mind, we may see some of these techniques show up in everyday devices before long.