Arabic Handwritten Document OCR Solution with Binarization and Adaptive Scale Fusion Detection

Alhossien Waly, Bassant Tarek, Ali Feteha, Rewan Yehia, Gasser Amr, Ahmed Fares

TL;DR

This work targets Arabic Handwritten Text Recognition (AHTR) by coupling a line-segmentation module based on Differentiable Binarization and Adaptive Scale Fusion (DBNet++) with a CNN-BiLSTM-CTC OCR engine. The pipeline is trained on the Arabic Multi-Fonts Dataset (AMFDS) and employs a synthetic line generator to create diverse, long-form sentences, enabling end-to-end training. The system reports a CRR of $99.20\%$ and a WRR of $93.75\%$ on single-word samples (7–10 characters) and a CRR of $83.76\%$ on sentences, establishing a new benchmark for AHTR while showing that longer text sequences remain harder to recognize. Overall, the work demonstrates that accurate line segmentation combined with sequence-modeling recognition can handle Arabic handwriting for document digitization and information extraction.

Abstract

The problem of converting images of text into plain text is a widely researched topic in both academia and industry. Arabic Handwritten Text Recognition (AHTR) poses additional challenges due to diverse handwriting styles and limited labeled data. In this paper, we present a complete OCR pipeline that starts with line segmentation using Differentiable Binarization and Adaptive Scale Fusion techniques to ensure accurate detection of text lines. Following segmentation, a CNN-BiLSTM-CTC architecture is applied to recognize characters. Our system, trained on the Arabic Multi-Fonts Dataset (AMFDS), achieves a Character Recognition Rate (CRR) of 99.20% and a Word Recognition Rate (WRR) of 93.75% on single-word samples containing 7 to 10 characters, along with a CRR of 83.76% for sentences. These results demonstrate the system's strong performance in handling Arabic scripts, establishing a new benchmark for AHTR systems.
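
To make the recognition stage and the reported metric more concrete, the following is a minimal sketch of greedy CTC decoding (take the best class per frame, merge consecutive repeats, drop blanks) together with one common edit-distance definition of CRR. It is not the authors' implementation: the function names, the blank index of 0, the toy alphabet, and the CRR formula used here are all assumptions made for illustration.

```python
from typing import Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy CTC decoding.

    Assumption: class 0 is the CTC blank and class k (k >= 1) maps to
    alphabet[k - 1]. Each row of frame_scores holds per-class scores
    for one time step of the recognizer output.
    """
    blank = 0
    # Best-scoring class index for every frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    chars = []
    prev = None
    for idx in best:
        # Emit a character only when the class changes and is not blank.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def character_recognition_rate(prediction: str, truth: str) -> float:
    """Assumed definition: CRR = 1 - edit_distance / len(truth)."""
    if not truth:
        raise ValueError("ground-truth string must be non-empty")
    return 1.0 - edit_distance(prediction, truth) / len(truth)


# Toy example with a two-letter alphabet; frames read: a, a, blank, a, b, b.
A, B, BLANK = [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.9, 0.05, 0.05]
decoded = ctc_greedy_decode([A, A, BLANK, A, B, B], alphabet="ab")
print(decoded)  # aab
```

The blank frame between the second and third `a` is what allows a doubled letter to survive the merge step. A real system would apply the same logic to the BiLSTM output over an Arabic character set.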

Paper Structure

This paper contains 14 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Pipeline of the proposed solution. The DBNet++ [11] architecture performs line segmentation of the Arabic handwritten text, and each detected line is then binarized. The binarized lines are fed into a CNN followed by a bidirectional LSTM, and the output is decoded via Connectionist Temporal Classification (CTC).
  • Figure 2: Raw word samples from AMFDS [28]
  • Figure 4: Visualizing Line Segmentation Results
  • Figure 5: Solid sample
  • Figure 6: Salted sample
  • ...and 1 more figure