A Coding Guide to Build an Optical Character Recognition (OCR) App in Google Colab Using OpenCV and Tesseract-OCR


Optical Character Recognition (OCR) is a powerful technology that converts images of text into machine-readable content. With the growing need for automation in data extraction, OCR tools have become an essential part of many applications, from digitizing documents to extracting information from scanned images. In this tutorial, we will build an OCR app that runs effortlessly on Google Colab, leveraging tools like OpenCV for image processing, Tesseract-OCR for text recognition, NumPy for array manipulations, and Matplotlib for visualization. By the end of this guide, you will be able to upload an image, preprocess it, extract text, and download the results, all within a Colab notebook.

!apt-get install -y tesseract-ocr
!pip install pytesseract opencv-python numpy matplotlib

To set up the OCR environment in Google Colab, we first install Tesseract-OCR, an open-source text recognition engine, using apt-get. We also install the essential Python libraries: pytesseract (for interfacing with Tesseract), OpenCV (for image processing), NumPy (for numerical operations), and Matplotlib (for visualization).

import cv2
import pytesseract
import numpy as np
import matplotlib.pyplot as plt
from google.colab import files
from PIL import Image

Next, we import the required libraries for image processing and OCR tasks. OpenCV (cv2) is used for reading and preprocessing images, while pytesseract provides an interface to the Tesseract OCR engine for text extraction. NumPy (np) helps with array manipulations, and Matplotlib (plt) visualizes processed images. Google Colab's files module allows users to upload images, and PIL (Image) facilitates image conversions required for OCR processing.

uploaded = files.upload()


filename = list(uploaded.keys())[0]

To process an image for OCR, we first need to upload it to Google Colab. The files.upload() function from Google Colab's files module allows users to select and upload an image file from their local system. The uploaded file is stored in a dictionary, with the filename as the key. We extract the filename using list(uploaded.keys())[0], which allows us to access and process the uploaded image in the subsequent steps.
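Before preprocessing, it can help to confirm that the uploaded file is actually an image. A minimal sketch of such a guard (the is_image_file helper and its extension list are our own addition, not part of the original workflow):

```python
import os

# Extensions OpenCV can typically decode (a non-exhaustive list)
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}

def is_image_file(filename):
    """Return True if the filename has a common image extension."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in IMAGE_EXTENSIONS

# Usage: raise early instead of letting cv2.imread() return None later
# if not is_image_file(filename):
#     raise ValueError(f"{filename} does not look like an image file")
```

This catches a common failure mode in Colab notebooks: cv2.imread() silently returns None for unreadable files, which only surfaces as a confusing error in cv2.cvtColor().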

def preprocess_image(image_path):
    # Read the uploaded image from disk
    image = cv2.imread(image_path)

    # Convert to grayscale to simplify text/background separation
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Binarize with Otsu's method for a high-contrast black-and-white image
    _, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    return thresh


processed_image = preprocess_image(filename)


plt.imshow(processed_image, cmap='gray')
plt.axis('off')
plt.show()

To improve OCR accuracy, we apply a preprocessing function that enhances image quality for text extraction. The preprocess_image() function first reads the uploaded image with OpenCV (cv2.imread()) and converts it to grayscale using cv2.cvtColor(), since grayscale images are easier for OCR. Next, we apply binary thresholding with Otsu's method via cv2.threshold(), which helps distinguish text from the background by converting the image into a high-contrast black-and-white format. Finally, the processed image is displayed with Matplotlib (plt.imshow()).

def extract_text(image):
    # Tesseract expects a PIL image, so convert the NumPy array
    pil_image = Image.fromarray(image)

    # Run OCR and return the detected text
    text = pytesseract.image_to_string(pil_image)

    return text


extracted_text = extract_text(processed_image)


print("Extracted Text:")
print(extracted_text)

The extract_text() function performs OCR on the preprocessed image. Since Tesseract-OCR requires a PIL image format, we first convert the NumPy array (the processed image) into a PIL image using Image.fromarray(image). Then, we pass this image to pytesseract.image_to_string(), which extracts and returns the detected text. Finally, the extracted text is printed, showcasing the OCR result from the uploaded image.
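Beyond plain strings, pytesseract can report word-level confidences via pytesseract.image_to_data(pil_image, output_type=pytesseract.Output.DICT). A small helper (our own sketch, not part of the original code) that keeps only confidently recognized words from that dictionary:

```python
def filter_confident_words(data, min_conf=60):
    """Keep words whose Tesseract confidence is at least min_conf.

    `data` is the dict returned by pytesseract.image_to_data with
    output_type=Output.DICT: parallel lists under 'text' and 'conf'.
    """
    words = []
    for word, conf in zip(data["text"], data["conf"]):
        # Tesseract reports -1 for non-word boxes; skip those and blanks
        if word.strip() and float(conf) >= min_conf:
            words.append(word)
    return " ".join(words)

# Example with a mock data dict of the same shape as image_to_data output
mock = {"text": ["Hello", "", "w0rld"], "conf": ["96", "-1", "41"]}
print(filter_confident_words(mock))  # -> Hello
```

Dropping low-confidence words is a cheap way to reduce garbage output on noisy scans, at the cost of occasionally losing genuinely faint text.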

with open("extracted_text.txt", "w") as f:
    f.write(extracted_text)


files.download("extracted_text.txt")

To make the extracted text easily accessible, we save it as a text file using Python's built-in file handling. The open("extracted_text.txt", "w") call creates (or overwrites) a text file and writes the extracted OCR output into it. After saving the file, we use files.download("extracted_text.txt") to provide an automatic download link.

In conclusion, by integrating OpenCV, Tesseract-OCR, NumPy, and Matplotlib, we have successfully built an OCR application that can process images and extract text in Google Colab. This workflow provides a simple yet effective way to convert scanned documents, printed text, or handwritten content into digital text. The preprocessing steps ensure better accuracy, and the ability to save and download results makes it convenient for further analysis.


Here is the Colab Notebook. Also, don't forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don't forget to join our 80k+ ML SubReddit.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
