Optical Character Recognition for Printed Hindi Text in Devnagari using Soft-Computing Technique

D. Yadav, A.K. Sharma, and J.P. Gupta (India)

Keywords

OCR, Preprocessing, Segmentation, Feature Vector, Classification, Artificial Neural Network (ANN).

Abstract

In this paper, we present an optical character recognition (OCR) system for printed Hindi text in Devnagari script. Hindi is one of the most widely spoken languages in India, with about 300 million speakers. In text written in Devnagari script, there is no separation between the characters, and errors in character segmentation are one of the important reasons for poor recognition rates in OCR systems. The preprocessing tasks considered in this paper are conversion of gray-scale images to binary images, image rectification, and segmentation of text into lines, words, and basic symbols. Basic symbols are taken as the fundamental unit of segmentation and are recognized by a neural classifier. We use three feature extraction techniques: histogram of projection based on mean distance, histogram of projection based on pixel value, and vertical zero crossing. These techniques are able to extract features even from distorted characters. A back-propagation neural network with two hidden layers is used to build the character recognition system. The system is trained and evaluated on printed text and achieves approximately 90% correct recognition.
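The abstract gives no implementation details, so the following is only a minimal sketch of the kind of processing it names, under stated assumptions: a fixed binarization threshold, plain row and column ink counts as projection histograms, and a per-column count of ink/background transitions as the vertical zero-crossing feature. All function names and the threshold value are hypothetical and not taken from the paper, and the paper's "mean distance" and "pixel value" projection variants are not reproduced here. Note also that gaps in a column profile separate words rather than characters in Devnagari text, because the header line joins the characters of a word.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.

    The fixed threshold is an assumption; the paper does not state its method.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def horizontal_projection(binary):
    """Ink pixels per row; empty rows mark gaps between text lines."""
    return [sum(row) for row in binary]


def vertical_projection(binary):
    """Ink pixels per column; empty columns mark gaps between words."""
    return [sum(col) for col in zip(*binary)]


def segment_runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries, end exclusive."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def vertical_zero_crossings(binary):
    """Per column, count ink/background transitions from top to bottom."""
    return [
        sum(1 for a, b in zip(col, col[1:]) if a != b)
        for col in zip(*binary)
    ]


# Tiny hand-made 5x4 "page": two text lines separated by a blank row.
gray = [
    [255, 255, 255, 255],
    [0, 0, 255, 0],
    [0, 255, 255, 0],
    [255, 255, 255, 255],
    [255, 0, 0, 255],
]
binary = binarize(gray)
line_spans = segment_runs(horizontal_projection(binary))
column_profile = vertical_projection(binary)
crossings = vertical_zero_crossings(binary)
```

In a full system, features such as `column_profile` and `crossings` would be computed per segmented symbol, normalized to a fixed length, and concatenated into the feature vector fed to the neural classifier.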
