Optical Character Recognition using Convolutional Neural Networks for Ashokan Brahmi Inscriptions
Yash Agrawal, Srinidhi Balasubramanian, Rahul Meena, Rohail Alam, Himanshu Malviya, Rohini P
TL;DR
This work targets OCR for Ashokan Brahmi inscriptions using transfer learning with three pre-trained CNNs (LeNet, VGG-16, MobileNet) on a dataset assembled from Indoskript. The methodology combines data augmentation, median-filter preprocessing, and projection-profile segmentation, and culminates in a UI toolchain (Abhiñāna, Akṣarāntara, Kośa) for practical epigraphic digitization. MobileNet with average pooling achieves the highest validation accuracy, 95.94%, with a validation loss of 0.129, outperforming the other two models. The study demonstrates the feasibility of applying pre-trained CNNs to historical scripts, with implications for scalable preservation and digitization of epigraphic heritage, especially in resource-constrained settings. Future work points toward more advanced deep-learning techniques, such as attention mechanisms, to broaden script coverage and improve robustness.
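The median-filter preprocessing step named above can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows of integer intensities, a 3 x 3 window, and edge replication at the borders; these choices and the helper name `median_filter` are illustrative assumptions, not the authors' implementation.

```python
from statistics import median


def median_filter(image, k=3):
    """Apply a k x k median filter to a 2D grayscale image (list of lists).

    Hypothetical helper for illustration. Border pixels are handled by
    clamping neighbour coordinates to the image (edge replication).
    """
    if k % 2 == 0:
        raise ValueError("kernel size k must be odd")
    h, w = len(image), len(image[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the k*k neighbourhood, clamping indices at the borders.
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            # k*k is odd, so the median is an actual pixel value.
            row.append(median(window))
        out.append(row)
    return out


# Usage: a single bright speck on a dark background is removed.
noisy = [[0, 0, 0, 0], [0, 255, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
clean = median_filter(noisy)
```

Because each pixel is replaced by the median of its neighbourhood, isolated specks of salt-and-pepper noise disappear while strokes wider than the speckle are largely preserved; very thin (one-pixel) lines can also be eroded by a 3 x 3 window, so the window size has to be matched to the image resolution.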
Abstract
This paper presents an Optical Character Recognition (OCR) system for Ashokan Brahmi characters based on Convolutional Neural Networks (CNNs). The models are trained on a dataset of character images, and data augmentation is used to improve training. The pipeline also applies image preprocessing to remove noise and segments each image first into lines and then into individual characters. The study focuses on three pre-trained CNNs, namely LeNet, VGG-16, and MobileNet, and compares their accuracy; transfer learning was used to adapt each pre-trained model to the Ashokan Brahmi character dataset. MobileNet outperforms the other two models, achieving a validation accuracy of 95.94% and a validation loss of 0.129. The paper analyses the MobileNet implementation in detail and discusses the implications of these findings. OCR is of significant importance in epigraphy, particularly for preserving and digitizing ancient scripts, and the results demonstrate the effectiveness of pre-trained CNNs for recognizing Ashokan Brahmi characters.
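Line and character segmentation by projection profiles can be sketched as follows: rows are summed to find text lines, then columns within each line are summed to find characters. The sketch assumes an already binarized image (1 = ink, 0 = background), treats any all-zero row as a line gap and any all-zero column within a line as a character gap, and uses hypothetical helper names (`runs`, `segment`); the paper's actual thresholds and handling of touching or broken glyphs may differ.

```python
def runs(profile):
    """Return [start, end) spans where a projection profile is non-zero."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i  # entering an ink region
        elif v == 0 and start is not None:
            spans.append((start, i))  # leaving an ink region
            start = None
    if start is not None:
        spans.append((start, len(profile)))  # region touches the last index
    return spans


def segment(binary):
    """Split a binarized image (1 = ink, 0 = background) into lines, then characters.

    Hypothetical helper for illustration. Returns a list of lines; each line
    is a list of (top, bottom, left, right) boxes with exclusive bottom/right.
    """
    lines = []
    row_profile = [sum(row) for row in binary]  # horizontal projection
    for top, bottom in runs(row_profile):
        band = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]  # vertical projection
        lines.append([(top, bottom, left, right) for left, right in runs(col_profile)])
    return lines


# Usage: two glyphs on the first line, one on the second.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
]
boxes = segment(page)
```

On real inscriptions a stricter gap rule (for example a minimum gap width or a small non-zero threshold) is usually needed, because residual noise can bridge gaps, and a character whose strokes are not connected in the binarized image can be split into several boxes.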
