This project uses a LattePanda Windows SBC to build a lightweight OCR reader: a webcam captures printed characters and the system provides audio feedback. Vibration cues guide the user's finger to the appropriate position, and once recognition finishes, the system speaks the character aloud.
Let's first look at some background on the visually impaired
Reading text is indispensable in daily life: besides reading books, we rely on text when reading the instructions on medicine bottles, operating the buttons of home appliances (such as microwave ovens), and reading information boards in stations (such as locations and floors). For the visually impaired, interpreting such text is a difficult task, and even more so for the adventitiously blind: visually impaired persons aged 15 or older who originally had normal vision and later lost it through illness or accidental injury. Having never received special education, they generally read braille slowly, with poor continuity and low efficiency. (Ref [1]-[5])
To help the visually impaired read text, researchers at home and abroad have proposed many solutions in recent years. These fall broadly into two categories of assistive devices: wearable and hand-held. (Ref [6]-[13])
- Wearable devices: the body carries the camera; after an image is captured, the system searches it for candidate text regions and recognizes them. Yi and Tian [6] mounted a small camera on eyeglasses and used image analysis and text recognition to read text written on objects. Hanif and Prevost [7] likewise mounted a small camera on eyeglasses to recognize text on signboards, and added vibration feedback to interact with the user. Mattar et al. [8] designed a head-mounted camera to recognize signboard text. Ezaki et al. [9] mounted the camera on the shoulder for the same purpose.
- Hand-held devices: the visually impaired user carries a PDA or smartphone as the image-capture device. Peters et al. [10] used a PDA camera to recognize banknotes, barcodes, and product labels. Shen et al. [11] used a smartphone to recognize house numbers and road signs, with vibration feedback for interaction.
Technical problems:
- Image quality is heavily affected by the light source, focus, and surface reflection.
- It is hard to locate text regions against a complicated background.
- High computing performance is required (processing may take several seconds to tens of seconds).
- It is hard to determine which text the visually impaired user is actually interested in.
Shilkrot et al. of the MIT Media Lab proposed a wearable finger reader in 2014 and 2015 [12][13] that helps the visually impaired read printed English text in books: once the system recognizes the English text, it reads it aloud so the user learns the content of the page.
This project focuses on recognizing Chinese characters. With the reader worn on the user's index finger, it guides the finger to the appropriate position by vibration, extracts single Chinese characters from the camera image, and finally speaks them aloud.
Hardware components include:
- a small camera
- vibration micromotors
- a capacitive touch switch
- a microsystem to control the motors
By analyzing the camera image, the microsystem estimates the relative distance between the finger and the paper, then extracts the text information from the image.
Hardware
Inside the reader is a capacitive touch switch (5); touching it with a finger toggles the reader's Chinese reading mode. The reader also houses four vibration micromotors; the control system on the PC side drives them in real time, and the vibration cues guide the index finger to the appropriate reading position. A LattePanda (6) serves as the back-end text-recognition system, processing text on the fly alongside the reader and providing voice feedback to the user at any time.
The reader has a single-character Chinese reading mode: the system only detects the character the finger is pointing at.
In this mode, the motors stay off while reading, and the other vibration hints, such as line-skip detection and line-change guidance, are disabled as well.
However, visually impaired users often cannot place their finger exactly on the text they want to read. While reading, the reader therefore guides the user's index finger to the appropriate position by vibration and runs the Chinese OCR immediately.
Four micromotors are embedded in the reader. Depending on the scenario, there are three guidance mechanisms:
(A) Guide the user to the text paragraph closest to the finger, then guide the finger to the beginning of that line (Fig. 4(a)).
(B) When the finger skips a line or drifts onto another line, the reader vibrates to guide the finger back to its previous position (Fig. 4(b)).
(C) When the finger reaches the end of a line, the reader guides it back to the beginning to read the next line (Fig. 4(c)).
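The three cues above could be chosen with logic along the following lines. This is a minimal sketch, assuming the vision pipeline supplies the fingertip position and the current line's bounding box; the function name, the box structure, and the rule ordering are all hypothetical, not the project's actual control code.

```python
def pick_guidance_cue(tip_x, tip_y, line):
    """Choose which vibration cue to emit for the fingertip at (tip_x, tip_y).

    line: the current text line's bounding box as a dict with keys
    x0, x1, y0, y1 (hypothetical structure).
    Returns 'to_line_start' (A), 'back_on_line' (B), 'next_line' (C),
    or None when the finger is tracking the line correctly.
    """
    if tip_y < line["y0"] or tip_y > line["y1"]:
        return "back_on_line"      # (B) drifted off / skipped a line
    if tip_x >= line["x1"]:
        return "next_line"         # (C) reached line end, wrap to next line
    if tip_x < line["x0"]:
        return "to_line_start"     # (A) guide to the line's beginning
    return None                    # on the line: keep reading, motors off
```

In practice the four motors would map each cue to a direction (up/down/left/right), vibrating the motor on the side the finger should move toward.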
After the finger reader sends the image in front of the fingertip to the PC, an algorithm developed by CILAB at Tamkang University recognizes the Traditional Chinese characters in the image. The flowchart figure shows the system pipeline: image preprocessing (grayscale conversion, Otsu thresholding, morphological opening and closing) locates the fingertip position and tilt angle; the system then detects the line height and related information to crop out single-character images, and finally invokes Traditional Chinese OCR and text-to-speech to speak the result.
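The Otsu step used in preprocessing picks the binarization threshold that best separates dark ink from light paper. As a self-contained sketch in plain Python (the real system presumably uses an optimized image-library implementation), Otsu's method maximizes the between-class variance over all 256 candidate thresholds:

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 8-bit gray values.

    Otsu's method tries every threshold t and keeps the one that
    maximizes the between-class variance of the background/foreground
    split: w_bg * w_fg * (mean_bg - mean_fg)^2.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_bg = 0.0          # running sum of gray values in the background class
    weight_bg = 0         # running pixel count of the background class
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Pixels at or below the returned threshold are then treated as ink and the rest as paper.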
If we binarized the whole image at once, the result would depend heavily on the light source and other factors, causing recognition errors; it would also hurt the accuracy of segmentation and of the Chinese OCR. We therefore use a local binarization method to better extract the text in the image and improve OCR accuracy.
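The project's exact local method is not specified here; the sketch below illustrates the idea with per-tile mean thresholding (a simpler stand-in for running Otsu per block). Each tile gets its own threshold, so a shadow or glare in one corner of the image only affects its own neighborhood:

```python
def local_binarize(img, block=8):
    """Binarize a 2-D grayscale image (list of rows) tile by tile.

    Each block x block tile is thresholded against its own mean gray
    value; pixels darker than the tile mean become 1 (ink), the rest 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            tile = [img[y][x] for y in ys for x in xs]
            thresh = sum(tile) / len(tile)   # per-tile threshold
            for y in ys:
                for x in xs:
                    out[y][x] = 1 if img[y][x] < thresh else 0
    return out
```

A global threshold would use one `thresh` for the whole image; here uneven lighting shifts each tile's mean, and the threshold shifts with it.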
For the horizontal text lines, we use a projection-scanning method to detect the individual rows, shown as the gray areas in Fig. 6(a).
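Projection scanning for rows can be sketched as follows: sum the ink pixels across each image row to form a horizontal projection profile, then take the non-zero runs of the profile as text lines. The function and return format are illustrative, not the project's code:

```python
def detect_rows(binary):
    """Find text rows via a horizontal projection profile.

    binary: list of rows, where 1 marks an ink pixel.
    Returns (start, end) row-index pairs (end exclusive) covering each
    contiguous run of rows that contains at least one ink pixel.
    """
    profile = [sum(row) for row in binary]   # ink pixels per image row
    rows, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                        # run of text begins
        elif count == 0 and start is not None:
            rows.append((start, y))          # blank row ends the run
            start = None
    if start is not None:
        rows.append((start, len(profile)))   # text reaches the image edge
    return rows
```

The row height used by the later cropping step is simply `end - start` of the detected run.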
Once the reader has detected the row positions and computed the row height, it crops a region two row-heights tall and four row-heights wide around the fingertip position (the red rectangle in Fig. 7(a)), then applies Otsu thresholding to binarize the characters, as in Fig. 7(b).
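The cropping step can be sketched as below. The exact placement of the window relative to the fingertip is an assumption (here: centered horizontally on the tip and extending upward above it, clamped to the image bounds); only the 2x/4x row-height dimensions come from the text:

```python
def crop_window(img, tip_x, tip_y, row_height):
    """Crop the region to binarize: 2x row height tall, 4x row height wide.

    Assumed placement: centered horizontally on the fingertip and
    sitting just above it, clamped so the window stays inside the image.
    """
    h, w = len(img), len(img[0])
    win_h, win_w = 2 * row_height, 4 * row_height
    top = max(0, tip_y - win_h)              # window ends at the fingertip row
    left = max(0, tip_x - win_w // 2)        # centered on the fingertip column
    right = min(w, left + win_w)
    bottom = min(h, top + win_h)
    return [row[left:right] for row in img[top:bottom]]
```

The cropped window is what gets passed to Otsu binarization and, afterwards, to character segmentation.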
We then segment individual Chinese characters with a vertical projection method (Fig. 8(a)). Segmenting Chinese characters is more difficult than segmenting English characters: characters such as "化" and "川" may be split into two or three pieces by vertical projection alone. However, the outline of most printed Chinese characters is close to a rectangle, so during segmentation we can merge several narrow, adjacent pieces into one character based on the line height. Fig. 8(b) shows a single segmented Chinese character.
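The vertical projection plus merging can be sketched as follows. The merge rule below, which joins neighboring pieces as long as their combined width stays within the row height, is an illustrative guess at the actual heuristic; real systems also weigh the gap width between pieces:

```python
def segment_chars(binary, row_height):
    """Segment characters by vertical projection, merging narrow pieces.

    binary: list of rows, 1 = ink pixel.  Vertical strokes of characters
    like '川' produce several narrow runs; since printed Chinese
    characters are roughly as wide as the row is tall, adjacent runs are
    merged while the combined width stays within row_height.
    Returns (start, end) column pairs (end exclusive).
    """
    width = len(binary[0])
    profile = [sum(binary[y][x] for y in range(len(binary)))
               for x in range(width)]
    # Raw runs of non-empty columns.
    segs, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            segs.append([start, x])
            start = None
    if start is not None:
        segs.append([start, width])
    # Merge adjacent narrow runs into roughly square characters.
    merged = []
    for seg in segs:
        if merged and (merged[-1][1] - merged[-1][0]) + (seg[1] - seg[0]) <= row_height:
            merged[-1][1] = seg[1]       # absorb the narrow neighbor
        else:
            merged.append(seg)
    return [(a, b) for a, b in merged]
```

With a row height of 5, three one-pixel strokes (a "川"-like shape) merge into one character, while a full-width character next to them stays separate.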
The classifier was trained on more than 100,000 printed Traditional Chinese character samples in fonts including 標楷體, 細明體, and 繁黑體, and recognizes 5,00 different Traditional Chinese characters with 97% accuracy. It can also recognize characters with some degree of loss or damage.
This project uses the Microsoft Speech Platform to implement text-to-speech (TTS) with Traditional Chinese pronunciation.
References:
[1] 柯明期, "A Study on the Adjustment and Rehabilitation of the Adventitiously Blind," Master's thesis, Graduate Institute of Special Education, National Taiwan Normal University, 2004.
[2] 李佳玲, "A Study on Computer Use Motivation of Middle-aged and Elderly Visually Impaired Users and Their Needs for Electronic Library Resources and Services," Master's thesis, Graduate Institute of Library and Information Science, National Taiwan University, 2013.
[3] W. Jeong, "Emotions in information seeking of blind people," in Diane Nahl and Dania Bilal (Eds.), Information and Emotion: The Emergent Affective Paradigm in Information Behavior Research and Theory, pp. 267-278, 2007.
[4] 陳怡佩, "The Information Needs of Visually Impaired Children and Adolescents," 臺灣圖書館管理季刊, 2(3), pp. 32-43, 2006.
[5] K. Carey, "The opportunities and challenges of the digital age: a blind user's perspective," Library Trends 55(4): 767-784, 2007.
[6] C. Yi, and Y. Tian, "Assistive text reading from complex background for blind persons," in Camera-Based Document Analysis and Recognition. Springer, 15–28, 2012.
[7] S. M. Hanif, and L. Prevost, "Texture based text detection in natural scene images-a help to blind and visually impaired persons," In CVHI, 2007.
[8] M. Mattar, A. Hanson, and E. Learned-Miller, "Sign classification using local and meta-features," in IEEE CVPR Workshops, pp. 26–26, 2005.
[9] N. Ezaki, M. Bulacu, and L. Schomaker, "Text detection from natural scene images: towards a system for visually impaired persons," in Proc. of ICPR, vol. 2, pp. 683–686, 2004.
[10] J.-P. Peters, C. Thillou, and S. Ferreira, "Embedded reading device for blind people: a user-centered design." in Proc. of IEEE ISIT, pp. 217–222, 2004.
[11] H. Shen, and J. M. Coughlan, "Towards a real-time system for finding and reading signs for visually impaired users," In Proc. of ICCHP, pp. 41–47, 2012.
[12] R. Shilkrot, J. Huber, C. Liu, P. Maes, and S. C. Nanayakkara, "Fingerreader: A wearable device to support text reading on the go," in CHI EA, ACM, pp. 2359–2364, 2014.
[13] R. Shilkrot, J. Huber, M. E. Wong, P. Maes, and S. C. Nanayakkara, "Fingerreader: A wearable device to explore printed text on the go," in ACM CHI 2015, pp. 2363–2372, 2015.