Manuscript Title:

A COMPLETE OCR SYSTEM USING CNN - HMM HYBRID APPROACH FOR PRINTED MEITEI/MEETEI SCRIPT DOCUMENTS

Author:

YANGLEM LOIJING KHOMBA KHUMAN, H. MAMATA DEVI

DOI Number:

DOI:10.17605/OSF.IO/EUJYV

Published : 2022-11-10

About the author(s)

1. YANGLEM LOIJING KHOMBA KHUMAN - Research Scholar, Department of Computer Science, Manipur University, Canchipur, Imphal West, Manipur.
2. H. MAMATA DEVI - Assistant Professor, Department of Computer Science, Manipur University, Canchipur, Imphal West, Manipur.


Abstract

Optical character recognition (OCR) transforms a scanned document image into an editable text document. Character recognition for Meitei/Meetei script is important because it has numerous applications, including document digitization, bank data processing, and mailing system automation. Various approaches to character recognition have been presented, but they have difficulty recognizing Meitei/Meetei script because its characters overlap. These models also handle sequential data poorly because their training process lacks a dynamic temporal model, which leads to a low recognition rate for Meitei/Meetei script characters. To solve these issues, a novel hybrid CNN-HMM recognition approach is proposed in which an HMM serves as the dynamic temporal model used to train the CNN, thereby eliminating the sequential data handling issue when recognizing Meitei/Meetei script characters. The CNN and HMM are used together to cope with fluctuations in character appearance and to provide temporal modelling during recognition, along with effective feature extraction from each scanned Meitei/Meetei script image, yielding a high recognition rate. The results show that the proposed hybrid model, implemented in Python, achieves high precision, sensitivity, recall, accuracy, and F-measure compared with other existing methodologies.
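
As a rough illustration of how an HMM can supply temporal modelling on top of per-frame CNN scores, the sketch below runs Viterbi decoding over a matrix of log scores. It is a generic example and not the authors' implementation: the function name `viterbi_decode`, the array shapes, and the toy probabilities are all assumptions, and the abstract does not specify how the CNN outputs are coupled to the HMM.

```python
import numpy as np

def viterbi_decode(log_emissions, log_transitions, log_initial):
    """Return the most likely HMM state sequence for a sequence of frames.

    log_emissions:   (T, S) per-frame log scores, e.g. log-softmax outputs
                     of a CNN (assumed here, not taken from the paper)
    log_transitions: (S, S) array, entry [i, j] = log P(state j | state i)
    log_initial:     (S,) log initial-state probabilities
    """
    T, S = log_emissions.shape
    score = log_initial + log_emissions[0]
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in state i at t-1, then moving to j
        cand = score[:, None] + log_transitions
        backptr[t] = np.argmax(cand, axis=0)
        score = cand[backptr[t], np.arange(S)] + log_emissions[t]
    # Trace the best path backwards from the best final state
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy example with two hypothetical states; the middle frame is ambiguous.
emissions = np.log(np.array([[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]))
transitions = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
initial = np.log(np.array([0.5, 0.5]))
decoded = viterbi_decode(emissions, transitions, initial)
```

In this toy case a frame-by-frame argmax would label the middle frame as state 1, while the decoded path stays in state 0 throughout because the transition model penalizes a brief switch. That smoothing effect is the kind of sequential context a per-frame classifier alone does not provide.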


Keywords

Graphics separation, Skew correction, Horizontal projection profile, Vertical projection profile, CNN, Hidden Markov Model (HMM).