Optical character recognition (OCR) is a technology that recognizes text in images, such as scanned documents and photos. Perhaps you’ve taken a photo of some text because you didn’t want to take notes, or because taking a photo is faster than typing. Fortunately, smartphones today can apply OCR to such photos, so we can copy the text in a picture without having to retype it.
We can do the same in Python with a few lines of code. One of the most widely used OCR tools is Tesseract, an optical character recognition engine for various operating systems.
Tesseract runs on Windows, macOS and Linux. It supports Unicode (UTF-8) and more than 100 languages. In this article, we’ll start with the Tesseract OCR installation process and then test extracting text from images.
The first step is to install Tesseract. To use it from Python, the Tesseract engine itself must be installed on our system. If you’re using Ubuntu, you can simply use apt-get to install Tesseract OCR: `sudo apt-get install tesseract-ocr`
For macOS users, we’ll use Homebrew to install Tesseract: `brew install tesseract`
For Windows, please see the Tesseract documentation.
Next, let’s install pytesseract, the Python wrapper for Tesseract, with pip: `pip install pytesseract`
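Because pytesseract only calls the tesseract executable, it can be worth confirming that the engine is reachable before writing any OCR code. The sketch below is our own addition, not part of the original walkthrough: it uses the standard library’s shutil.which to look for the binary on PATH and, if it is found, asks pytesseract for the engine version with get_tesseract_version(). The helper name locate_tesseract is hypothetical.

```python
import shutil
from typing import Optional


def locate_tesseract() -> Optional[str]:
    """Return the full path of the `tesseract` executable on PATH, or None."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = locate_tesseract()
    if path is None:
        # On Windows the installer often does not add tesseract to PATH; in that
        # case set pytesseract.pytesseract.tesseract_cmd to the full path of
        # tesseract.exe before calling any pytesseract function.
        print("tesseract was not found on PATH; install it first.")
    else:
        import pytesseract  # imported here so the PATH check works without it

        print(f"Found Tesseract {pytesseract.get_tesseract_version()} at {path}")
```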
Once installation is complete, let’s apply Tesseract with Python. First, we import the dependencies.
I’ll use a simple image to test Tesseract.
Let’s load this image and convert it to text.
Now, let’s see the result.
And this is the result.
Tesseract’s results are good enough for simple images. In the real world, however, images are rarely that clean, so I’ll add noise to the image to test how well Tesseract holds up.
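The article doesn’t say how the noise was added. As one assumed way to reproduce the experiment, the sketch below adds zero-mean Gaussian noise to a grayscale copy of the image with NumPy; the function name add_gaussian_noise, the sigma value and the file names are hypothetical.

```python
from typing import Optional

import numpy as np
from PIL import Image


def add_gaussian_noise(image: np.ndarray, sigma: float = 50.0,
                       seed: Optional[int] = None) -> np.ndarray:
    """Return a copy of an 8-bit image with zero-mean Gaussian noise added."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=sigma, size=image.shape)
    # Add in float to avoid uint8 wrap-around, then clip back to 0..255.
    noisy = image.astype(np.float64) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    # "sample.png" and "sample_noisy.png" are placeholder file names.
    gray = np.asarray(Image.open("sample.png").convert("L"))
    noisy = add_gaussian_noise(gray, sigma=60.0, seed=0)
    Image.fromarray(noisy).save("sample_noisy.png")
```

A larger sigma produces heavier noise; the seed only makes the result reproducible.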
We’ll do the same process as before.
This is the result.
The result is empty: Tesseract could not read any words in the noisy image.
Next, we’ll use a little image processing to remove the noise. Here, I’ll use the OpenCV library, applying normalization, thresholding and blurring.
The result will be like this:
Now that the image is clean enough, we’ll run the same process as before. This is the result.
As you can see, the result matches what we expect.
With Tesseract, we can also localize and detect text in images. First, we import the dependencies we need.
I’ll use a simple image, like the one in the example above, to test Tesseract.
Now, let’s load this image and extract the data.
This differs from the previous example, in which we converted the image directly into a string. In this example, we’ll convert the image into a dictionary instead.
The following results are the contents of the dictionary.
I won’t explain the purpose of every value in the dictionary. Instead, we’ll use the left, top, width and height values to draw a bounding box around each piece of detected text, along with the text itself. We’ll also need the conf key, which holds the confidence of each detection, to decide which detected text to keep.
Next, we’ll extract the bounding-box coordinates of each text region from the result and keep only the regions that meet the confidence value we want. Here, I’ll use conf = 70. The code looks like this:
Now that everything is set, we can display the results using this code.
And this is the result.
Ultimately, Tesseract is best suited to document-processing pipelines in which images are scanned and then processed. It works best with high-resolution input in which the foreground text is cleanly separated from the background.
For text localization and detection, there are several parameters you can change, such as the confidence threshold. If you don’t like how the output looks, you can also change the thickness or color of the bounding boxes and text.
Built In’s expert contributor network publishes thoughtful, solutions-oriented stories written by innovative tech professionals. It is the tech industry’s definitive destination for sharing compelling, first-person accounts of problem-solving on the road to innovation.