Efficient Generative Adversarial Networks for Color Document Image Enhancement and Binarization Using Multi-scale Feature Extraction
Rui-Yang Ju, KokSheik Wong, Jen-Shiun Chiang
TL;DR
This work tackles the inefficiency of GAN-based color document enhancement and binarization by introducing a three-stage framework that leverages Haar wavelet transform and normalization for multi-scale feature extraction. It trains separate channel-specific generators while sharing a discriminator and employs a dual-scale binarization strategy, all optimized with a WGAN-GP loss augmented by BCE and Dice terms. The approach achieves substantial reductions in both training and inference times while maintaining competitive Average-Score Metric (ASM) performance compared to state-of-the-art methods. The results demonstrate practical potential for fast, reliable preprocessing in OCR and document analysis pipelines and lay the groundwork for future integration with document understanding tasks.
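To make the loss composition concrete, the following is a minimal NumPy sketch of a Wasserstein generator objective augmented with BCE and Dice terms. It is an illustration under stated assumptions, not the authors' implementation: the function names, the weights `bce_weight` and `dice_weight`, and the smoothing constant are hypothetical, and the gradient penalty (which belongs to the critic's loss in WGAN-GP) is not shown.

```python
import numpy as np


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a binary mask."""
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))))


def dice_loss(pred, target, smooth=1.0):
    """1 - Dice coefficient; `smooth` (assumed value) stabilizes empty masks."""
    intersection = np.sum(pred * target)
    dice = (2.0 * intersection + smooth) / (np.sum(pred) + np.sum(target) + smooth)
    return float(1.0 - dice)


def generator_loss(critic_scores_fake, pred, target, bce_weight=1.0, dice_weight=1.0):
    """Wasserstein generator term plus weighted BCE and Dice terms.

    The weights are hypothetical placeholders, not values from the paper.
    """
    adversarial = -float(np.mean(critic_scores_fake))  # generator maximizes critic score
    return (adversarial
            + bce_weight * bce_loss(pred, target)
            + dice_weight * dice_loss(pred, target))
```

Here `pred` is assumed to hold per-pixel probabilities in [0, 1] and `target` the binary ground-truth mask of the same shape.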
Abstract
The outcome of text recognition for degraded color documents is often unsatisfactory due to interference from various contaminants. To extract information more efficiently for text recognition, document image enhancement and binarization are often employed as preliminary steps in document analysis. Training independent generative adversarial networks (GANs) for each color channel can generate images where shadows and noise are effectively removed, which subsequently allows for efficient text information extraction. However, employing multiple GANs for different color channels requires long training and inference times. To reduce both the training and inference times of these preliminary steps, we propose an efficient method based on multi-scale feature extraction, which incorporates Haar wavelet transformation and normalization to process document images before submitting them to GANs for training. Experimental results show that our proposed method significantly reduces both the training and inference times while maintaining comparable performance when benchmarked against state-of-the-art methods. In the best-case scenario, reductions of 10% and 26% are observed for training and inference times, respectively, while the model maintains an Average-Score Metric of 73.79. The implementation of this work is available at https://github.com/RuiyangJu/Efficient_Document_Image_Binarization.
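The preprocessing idea in the abstract (Haar wavelet transformation followed by normalization, applied per color channel) can be sketched roughly as follows. This is a hedged illustration rather than the released code: it assumes that only the low-frequency (LL) subband of a single-level transform is kept and that normalization means min-max rescaling to [0, 255]. The names `haar_ll`, `normalize`, and `preprocess_rgb` are hypothetical.

```python
import numpy as np


def haar_ll(channel):
    """LL (approximation) subband of a single-level 2D orthonormal Haar transform."""
    h, w = channel.shape
    # Crop to even dimensions so the image splits cleanly into 2x2 blocks.
    x = channel[: h - h % 2, : w - w % 2].astype(np.float64)
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    # Low-pass filtering along both axes reduces to (a + b + c + d) / 2.
    return (a + b + c + d) / 2.0


def normalize(band, lo=0.0, hi=255.0):
    """Min-max rescale a subband to [lo, hi] (assumed normalization scheme)."""
    mn, mx = band.min(), band.max()
    if mx == mn:  # constant band: avoid division by zero
        return np.full_like(band, lo)
    return (band - mn) / (mx - mn) * (hi - lo) + lo


def preprocess_rgb(image):
    """Split an H x W x 3 image into channels and preprocess each independently."""
    return [normalize(haar_ll(image[..., k])) for k in range(3)]
```

Each returned array has half the height and width of the input channel, so a network trained on these arrays sees a quarter of the pixels, which is one plausible source of the reported time savings.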
