Classification of Non-native Handwritten Characters Using Convolutional Neural Network
F. A. Mamun, S. A. H. Chowdhury, J. E. Giti, H. Sarker
TL;DR
This work tackles non-native handwriting recognition by building a tailored CNN trained on the Handwritten Isolated English Character (HIEC) dataset, which comprises 16,496 images from 260 writers. Through an ablation study, the authors identify an architecture with five convolutional layers, one hidden layer, and dropout as the best configuration. It achieves 97.04% test accuracy, a relative improvement of 4.38% over the second-best of the state-of-the-art models compared. The study highlights the impact of non-native handwriting diversity on handwritten character recognition (HCR) and attributes the model's robustness to targeted data augmentation and architecture design. The findings suggest practical potential for reliable non-native HCR and motivate future work on multilingual and semi-supervised extensions.
Abstract
The use of convolutional neural networks (CNNs) has accelerated progress in handwritten character classification and recognition. Handwritten character recognition (HCR) has found applications in various domains, such as traffic signal detection, language translation, and document information extraction. However, existing HCR technology has yet to see widespread use because it does not recognize characters reliably enough. One reason is that existing HCR methods do not take the handwriting styles of non-native writers into account. Further improvement is therefore needed before character recognition technologies can be deployed reliably and extensively for critical tasks. In this work, we propose a custom-tailored CNN model to classify English characters written by non-native users. We train this CNN on a new dataset, the handwritten isolated English character (HIEC) dataset, which consists of 16,496 images collected from 260 persons. The paper also includes an ablation study in which we adjust the hyperparameters of our CNN to identify the best model for the HIEC dataset. The proposed model, with five convolutional layers and one hidden layer, outperforms state-of-the-art models in character recognition accuracy, achieving an accuracy of 97.04%. Compared with the second-best model, the relative improvement of our model in classification accuracy is 4.38%.
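The two headline figures can be related with a short calculation. The sketch below assumes that "relative improvement" means (ours - baseline) / baseline, which the abstract does not state explicitly. The function names are illustrative, and the second-best accuracy it prints is back-calculated under that assumption, not a figure reported in the abstract.

```python
def relative_improvement(ours: float, baseline: float) -> float:
    """Relative gain of `ours` over `baseline`, in percent.

    Assumed definition: (ours - baseline) / baseline * 100.
    """
    return (ours - baseline) / baseline * 100.0


def implied_baseline(ours: float, rel_gain_pct: float) -> float:
    """Baseline accuracy implied by a reported relative gain (in percent)."""
    return ours / (1.0 + rel_gain_pct / 100.0)


reported_accuracy = 97.04  # test accuracy stated in the abstract (%)
reported_rel_gain = 4.38   # relative improvement stated in the abstract (%)

# Back-calculated under the assumed definition; NOT a reported result.
second_best = implied_baseline(reported_accuracy, reported_rel_gain)
round_trip = relative_improvement(reported_accuracy, second_best)

print(f"implied second-best accuracy: {second_best:.2f}%")
print(f"round-trip relative gain: {round_trip:.2f}%")
```

Under this reading, the second-best model would sit at roughly 92.97% accuracy. If the 4.38% were instead an absolute difference in percentage points, the implied baseline would be 92.66%, so the definition matters when comparing against other reported results.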
