Enhancement of Bengali OCR by Specialized Models and Advanced Techniques for Diverse Document Types
AKM Shahariar Azad Rabby, Hasmot Ali, Md. Majedul Islam, Sheikh Abujar, Fuad Rahman
TL;DR
This work presents a comprehensive Bengali OCR system that jointly reconstructs document layout and extracts text across diverse document types, including computer-composed, letterpress, typewriter, and handwritten sources. It combines a data-rich corpus with specialized word-segmentation models, a self-attentional VGG-based character recognizer, and a rule-based layout module to accurately restore paragraphs, tables, lists, images, and other elements. The deployment pipeline emphasizes scalability and speed via Apache Kafka, ONNX, and Triton, achieving fast per-page processing across document types and demonstrating strong accuracy improvements over existing Bangla OCR baselines. Collectively, these innovations advance Bengali text digitization, enabling robust, multi-domain OCR with layout preservation and multi-element extraction.
Abstract
This paper presents a Bengali OCR system that combines layout reconstruction, element detection, and text recognition across diverse document types. The system reconstructs document layouts while preserving structure, alignment, and images, and it incorporates image and signature detection for accurate extraction. Specialized word-segmentation models cater to computer-composed, letterpress, typewriter, and handwritten documents. The system handles both static and dynamic handwritten input, recognizes a variety of writing styles, and can recognize Bengali compound characters. Extensive data collection provides a diverse corpus, while the technical components are optimized for character and word recognition. Additional contributions include image, logo, signature, and table recognition, perspective correction, layout reconstruction, and a queuing module for efficient and scalable processing. The system demonstrates strong performance, extracting and analyzing text efficiently and accurately.
