Multilanguage OCR

Anubhav Singh

Kolkata, West Bengal

An OCR tool that automatically extracts text in multiple languages using the Tesseract library, whose development Google sponsored, built on Intel Optimized Python. Users can add their own handwriting samples or train models that are not already available, so the tool can recognize text in new handwriting styles.

Project status: Published/In Market

Artificial Intelligence, Graphics and Media

Groups
Student Developers for AI, Artificial Intelligence India

Intel Technologies
Intel Python

Overview / Usage

This OCR tool, built on top of Tesseract, can currently extract text in English, Hindi, and Bengali with 70% accuracy. I plan to extend it to cover other Indian languages.
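
As a rough illustration of multi-language extraction with Tesseract, the sketch below passes several language codes in one call. It assumes the `pytesseract` wrapper and Pillow, plus installed `eng`, `hin`, and `ben` traineddata files; the project's own invocation is not shown on this page and may differ. The names `LANG_CODES`, `build_lang_arg`, and `extract_text` are hypothetical.

```python
from typing import Iterable

# Tesseract traineddata codes for the three languages named above.
LANG_CODES = {"english": "eng", "hindi": "hin", "bengali": "ben"}


def build_lang_arg(languages: Iterable[str]) -> str:
    """Join language names into Tesseract's 'eng+hin+ben' form."""
    try:
        return "+".join(LANG_CODES[name.lower()] for name in languages)
    except KeyError as err:
        raise ValueError(f"unsupported language: {err.args[0]}") from None


def extract_text(image_path: str, languages: Iterable[str]) -> str:
    """Run Tesseract on one image with all requested languages enabled."""
    # Imported lazily so build_lang_arg works without the OCR stack installed.
    # Assumption: pytesseract and Pillow are the chosen Python bindings.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=build_lang_arg(languages))
```

A call such as `extract_text("page.png", ["English", "Hindi", "Bengali"])` would then ask Tesseract to consider all three scripts at once; `page.png` is a placeholder file name.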

Methodology / Approach

First, OpenCV segments the text found in an image into bounding boxes. Then, for each box, a CNN predicts the matching character. A separate model is used for each language.
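
The two stages described above might look roughly like the sketch below. It is not the repository's code: Otsu thresholding and external contours are assumed as the segmentation method, the per-language CNNs are represented by plain callables, and all function names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def find_text_boxes(image_path: str, min_area: int = 20) -> List[Box]:
    """Segment dark text on a light background into bounding boxes."""
    import cv2  # imported lazily so the helpers below work without OpenCV

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    # Assumption: Otsu thresholding, inverted so that text pixels become white.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # [-2] selects the contour list in both OpenCV 3.x and 4.x return layouts.
    contours = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    boxes = [tuple(cv2.boundingRect(c)) for c in contours]
    return [b for b in boxes if b[2] * b[3] >= min_area]  # drop specks


def sort_reading_order(boxes: List[Box], line_tolerance: int = 10) -> List[Box]:
    """Group boxes into lines by top edge, then order each line left to right."""
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if lines and abs(box[1] - lines[-1][0][1]) <= line_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]


def recognize(image: Any, boxes: List[Box], language: str,
              models: Dict[str, Callable[[Any], str]]) -> str:
    """Classify each cropped box with the model registered for `language`."""
    if language not in models:
        raise KeyError(f"no model registered for language: {language}")
    classify = models[language]  # stand-in for that language's CNN
    # `image` is expected to support NumPy-style 2-D slicing.
    return "".join(
        classify(image[y:y + h, x:x + w])
        for x, y, w, h in sort_reading_order(boxes)
    )
```

Keeping the models in a dictionary keyed by language mirrors the one-model-per-language design: supporting another language would mean registering one more classifier rather than changing the segmentation step.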

Technologies Used

Intel Optimized Python
OpenCV
Tesseract

Repository

https://github.com/xprilion/optical-character-reader
