Published June 22, 2019

Optical Character Recognition (OCR) Using Odinub

Have you ever wanted to extract text from images? Let's do it using OCR.

Intermediate • Full instructions provided

Things used in this project

Hardware components

Odinub

Software apps and online services

Tesseract

Story

This post walks you through installing and using the Tesseract library for optical character recognition (OCR).

OCR is the automatic process of converting typed, handwritten, or printed text to machine-encoded text that we can access and manipulate via a string variable. It is the technology used to recognize printed or handwritten characters in digital images of physical documents, such as scanned paper documents. The basic process involves recognizing the text in the document and translating the characters into character codes that can be used for data processing.

Tesseract, originally developed by Hewlett Packard in the 1980s, was open-sourced in 2005. Later, in 2006, Google adopted the project and has been a sponsor ever since.

Tesseract works with many natural languages, from English (initially) to Punjabi to Yiddish. Since the updates in 2015, it supports over 100 written languages and has code in place so that it can be trained on other languages as well.

Originally a C program, it was ported to C++ in 1998. The software is headless and runs from the command line. It does not come with a GUI, but several other software packages wrap around Tesseract to provide one.

For OCR, follow these steps:

Step 0: Start WiFi on Odinub

Run the command below in your Odinub's terminal.

cd ..

This moves you up one directory level. If you start in /root, you end up in the root directory (/).

sudo nano /etc/network/interfaces

You will see the following text in the nano editor:

auto wlan0
iface wlan0 inet dhcp
    wpa-ssid Odinub (your WiFi name)
    wpa-psk 1234 (your password)

Replace the SSID and password with your own, then save the file. Once the WiFi name and password are updated, run the command in Step 1 to start WiFi.
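If you set up more than one board, the stanza can be generated instead of typed by hand. The following is a minimal sketch in Python, assuming the same wlan0 interface and DHCP setup shown above. The function name render_wlan_stanza is hypothetical, and the script only prints the text; it does not write to /etc/network/interfaces.

```python
def render_wlan_stanza(ssid, psk, iface="wlan0"):
    """Return an /etc/network/interfaces stanza for a DHCP WiFi interface."""
    lines = [
        f"auto {iface}",
        f"iface {iface} inet dhcp",
        f"    wpa-ssid {ssid}",
        f"    wpa-psk {psk}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder credentials from the example above; substitute your own.
    print(render_wlan_stanza("Odinub", "1234"), end="")
```

You can review the printed text and paste it into the file opened with nano.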

Step 1: Turn on the onboard WiFi of Odinub

ifup wlan0

Step 2: Install and test Tesseract

Tesseract is an optical character recognition engine for various operating systems. It is widely considered one of the most accurate open-source OCR engines.

Install Tesseract with root as the present working directory.

Run the following command to install Tesseract:

sudo apt-get install tesseract-ocr

Step 3: Validate that Tesseract has been installed properly

tesseract -v

The output shows the version of Tesseract installed on your system.

If you get an error like the following instead, Tesseract has not been installed properly on your system.

-bash: tesseract: command not found

If you get the above error, go back to Step 2 and install Tesseract again.
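The same check can be done from Python, which is handy if a script should fail early when Tesseract is missing. This is a minimal sketch using only the standard library; the function name tesseract_version is an assumption made for illustration.

```python
import shutil
import subprocess


def tesseract_version(binary="tesseract"):
    """Return the first line of the version banner, or None if not on PATH."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run([binary, "-v"], capture_output=True, text=True)
    # Some releases print the banner to stderr instead of stdout.
    banner = (result.stdout or result.stderr).strip()
    return banner.splitlines()[0] if banner else None


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract: command not found; repeat the install step")
    else:
        print(version)
```

A return value of None corresponds to the "command not found" error shown above.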

Step 4: Test the Tesseract

Run the following command to test Tesseract from the terminal.

tesseract /root/Desktop/ocr/tesseract.png stdout

As an example, we created an image in Paint to verify that Tesseract is working properly.

Tesseract prints the recognized text to the terminal.
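The same terminal command can be driven from Python with the standard subprocess module. The sketch below assumes Tesseract is on the PATH; the helper names and the lang parameter (passed through Tesseract's -l option, with English as the default) are illustrative choices, not part of the original project.

```python
import subprocess


def build_ocr_command(image_path, lang="eng"):
    """Build the argument list for: tesseract <image> stdout -l <lang>."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_with_cli(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognized text."""
    result = subprocess.run(
        build_ocr_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract exits with an error
    )
    return result.stdout


# Example (requires Tesseract and the test image from this step):
# print(ocr_with_cli("/root/Desktop/ocr/tesseract.png"))
```

Passing the arguments as a list avoids shell quoting problems when the image path contains spaces.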

Step 5: The code to convert the text inside the image into string

Install the gedit on your system using the following command:

sudo apt-get install gedit

Create a new Python file in the editor with the following command:

gedit filename.py

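To get started, import the pytesseract module, a Python wrapper for Tesseract (available from PyPI as pytesseract if it is not already installed), as follows: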

import pytesseract

With PIL, we can read and write images in most formats; its main module is Image. Import the Image module as follows:

from PIL import Image

Next, to convert the text inside the image to a string, run the following:

text = pytesseract.image_to_string(Image.open('/root/Desktop/ocr/LP1_jpg'))
print(text)

Here we use a small application that provides the vehicle owner's details once the vehicle's number plate has been read using OCR.

The output of it is as follows:

Note: To execute your code, run the following command:

python3 filename.py
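The number plate application described above could be sketched roughly as follows. This is not the author's OCR.py; the OWNERS registry, its single entry, and all function names are hypothetical placeholders. OCR output often contains stray spaces, punctuation, and newlines, so the text is normalized before the lookup.

```python
import re

# Hypothetical registry; a real system would query a database instead.
OWNERS = {
    "GJ01AB1234": {"owner": "Example Owner", "vehicle": "Example Car"},
}


def normalize_plate(raw_text):
    """Keep only letters and digits and convert them to upper case."""
    return re.sub(r"[^A-Za-z0-9]", "", raw_text).upper()


def lookup_owner(raw_text, registry=OWNERS):
    """Return the registry entry for the plate in raw_text, or None."""
    return registry.get(normalize_plate(raw_text))


def read_plate(image_path):
    """OCR the image at image_path; needs Pillow and pytesseract installed."""
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path))


# Example (requires the number plate image from this step):
# print(lookup_owner(read_plate("/root/Desktop/ocr/LP1_jpg")))
```

Keeping the lookup separate from the OCR call makes it possible to test the matching logic without an image.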

Tesseract is best suited for situations with high-resolution inputs where the foreground text is cleanly segmented from the background.
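When an input does not meet those conditions, a little preprocessing sometimes helps. The sketch below, which assumes Pillow is installed, converts the image to grayscale, enlarges it, and applies a fixed threshold so that text and background become pure black and white. The scale factor of 2 and the threshold of 128 are arbitrary starting points, not tuned values, and the function names are illustrative.

```python
def binarize_value(pixel, threshold=128):
    """Map one grayscale value (0-255) to pure white or pure black."""
    return 255 if pixel > threshold else 0


def preprocess(image_path, scale=2, threshold=128):
    """Return a grayscale, upscaled, black-and-white copy of the image."""
    from PIL import Image  # imported here so the helper above needs no Pillow

    img = Image.open(image_path).convert("L")  # "L" is 8-bit grayscale
    img = img.resize((img.width * scale, img.height * scale))
    # point() applies the function to every possible pixel value.
    return img.point(lambda p: binarize_value(p, threshold))


# Example (requires Pillow and pytesseract):
# import pytesseract
# print(pytesseract.image_to_string(preprocess("/root/Desktop/ocr/tesseract.png")))
```

Whether this improves the result depends on the image, so compare the output with and without preprocessing.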

Many security applications can be built with OCR, such as vehicle identification and tracking systems. I hope this article helps you develop a project using Tesseract.

OCR.py

Credits

Swapnil Trivedi

