OnHW Has Useful Written All Over It

OnHW is a large and diverse online handwriting recognition dataset that is publicly available.

Nick Bild
4 years ago • Machine Learning & AI
(📷: F. Ott et al.)

Online handwriting recognition, the automatic interpretation of text as it is written, has applications in mobile device text entry, archiving written documents, and computationally processing information written with pen and paper. Despite the usefulness of handwriting recognition tools, few large, diverse datasets are publicly available on which to train them. Because high-accuracy classification requires a very large amount of training data, this is a significant problem for the field.

Recognizing this gap, a research group centered at the Fraunhofer Institute for Integrated Circuits has built a new dataset, OnHW, and released it publicly in an effort to advance the state of the art.

The OnHW dataset was collected by writing on paper with a STABILO DigiPen. The DigiPen contains a pair of accelerometers, a gyroscope, a magnetometer, and a force sensor, which together capture highly detailed information about the strokes of the pen as letters are written. A Bluetooth transceiver in the pen allows real-time data streaming to a connected device.

A total of 119 adult writers were asked to write out the English alphabet, in uppercase and lowercase characters, six times, yielding 31,275 samples in total. The large pool of writers captured a diversity of writing styles (printed and cursive characters), ways of holding the pen, amounts of pressure applied to the pen, and so on. The DigiPen recorded readings from 13 sensor channels at a rate of 100 Hz.
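To make the recording format concrete, here is a minimal sketch of how one character sample could be held in memory. Only the 13 values per reading and the 100 Hz rate come from the description above. The `Recording` class, its field names, and the `to_fixed_length` helper are hypothetical and are not the dataset's actual file format or API. Padding to a fixed length is likewise an assumption about how variable-length recordings might be prepared for a classifier.

```python
from dataclasses import dataclass
from typing import List

SAMPLE_RATE_HZ = 100  # from the article: readings are taken at 100 Hz
NUM_CHANNELS = 13     # from the article: 13 sensor values per reading


@dataclass
class Recording:
    """One handwritten character (hypothetical container, not the dataset's own format)."""

    label: str                 # the letter that was written, e.g. "A" or "q"
    frames: List[List[float]]  # one row per time step, NUM_CHANNELS values per row

    def __post_init__(self) -> None:
        # Reject rows that do not carry exactly one value per channel.
        for row in self.frames:
            if len(row) != NUM_CHANNELS:
                raise ValueError(f"expected {NUM_CHANNELS} channels, got {len(row)}")

    def duration_seconds(self) -> float:
        # Each frame covers 1 / SAMPLE_RATE_HZ seconds of pen movement.
        return len(self.frames) / SAMPLE_RATE_HZ


def to_fixed_length(rec: Recording, length: int) -> List[List[float]]:
    """Truncate or zero-pad a recording to exactly `length` time steps."""
    frames = [list(row) for row in rec.frames[:length]]
    padding = [[0.0] * NUM_CHANNELS for _ in range(length - len(frames))]
    return frames + padding
```

Under these assumptions, a letter that takes 42 readings to write lasts 0.42 seconds, and `to_fixed_length(rec, 64)` returns a 64 by 13 grid of values with zeros after the 42nd row.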

To test the validity of their dataset, the team trained several machine learning models with the data. Across all test permutations, the best classification accuracy was achieved with convolutional neural networks, at 90% correct classification for uppercase letters. While 90% accuracy is very good for many applications, missing one out of ten written characters can be quite a problem. Perhaps a larger future data release will help to improve the accuracy.
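A bit of arithmetic shows how quickly per-character errors add up. The sketch below assumes that every character is classified independently with the same accuracy. That is a simplification made for illustration, not a claim from the researchers, and the function names are made up for this example.

```python
def expected_errors(num_chars: int, accuracy: float) -> float:
    """Average number of misread characters in a text of num_chars characters."""
    return num_chars * (1.0 - accuracy)


def prob_all_correct(num_chars: int, accuracy: float) -> float:
    """Chance that every character is read correctly, assuming independent errors."""
    return accuracy ** num_chars


if __name__ == "__main__":
    accuracy = 0.90  # the uppercase figure quoted in the article
    for length in (1, 5, 10):
        chance = prob_all_correct(length, accuracy)
        print(f"{length:>2} characters: {chance:.2f} chance of no mistakes")
```

Under that assumption, a five-letter word comes through without a single mistake only about 59% of the time, and a 100-character note contains about ten misread characters on average.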

At present, only the character-based dataset is available. In the future, the researchers plan to generate other datasets, including features like words, sentences, symbols, and numbers. OnHW is freely available for download.
