Single Document Image Highlight Removal via A Large-Scale Real-World Dataset and A Location-Aware Network
Lu Pan, Yu-Hsuan Huang, Hongxia Xie, Cheng Zhang, Hongwei Zhao, Hong-Han Shuai, Wen-Huang Cheng
TL;DR
This work tackles specular highlights in real-world document images by introducing DocHR14K, a large-scale high-resolution dataset covering nine lighting conditions and six document types, together with L2HRNet, a network that uses a Highlight Location Prior and a diffusion-based module to remove highlights while preserving fine text details. The approach uses a Laplacian pyramid to treat low- and high-frequency components separately, enabling global highlight suppression and high-frequency texture restoration, guided by residual-based priors. Empirical results on DocHR14K, RD, and SHIQ show state-of-the-art performance, including a 5.01% increase in PSNR and a 13.17% reduction in RMSE on DocHR14K, and ablations confirm the contribution of each module. The work advances practical document image enhancement, improving readability and downstream task performance, and identifies inpainting and faster latent-space diffusion as possible future directions.
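To illustrate the frequency separation mentioned above, here is a minimal NumPy sketch of a Laplacian pyramid. It is not the authors' implementation: the 2x2 average pooling, the nearest-neighbour upsampling, and the function names (`downsample`, `upsample`, `build_laplacian_pyramid`, `reconstruct`) are assumptions chosen for brevity, and a real pipeline would typically use Gaussian filtering. The sketch only shows that the image splits into high-frequency residual bands plus one low-frequency base and can be recovered exactly from them.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging each 2x2 block (assumed stand-in for blur + decimation)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(img, levels):
    """Return [high_0, ..., high_{levels-1}, low] for a 2D image.

    Height and width must be divisible by 2**levels.
    """
    pyramid = []
    current = img.astype(np.float64)
    for _ in range(levels):
        smaller = downsample(current)
        # High-frequency band: what is lost when going one level coarser.
        pyramid.append(current - upsample(smaller))
        current = smaller
    pyramid.append(current)  # low-frequency base
    return pyramid

def reconstruct(pyramid):
    """Invert build_laplacian_pyramid by adding the bands back, coarse to fine."""
    current = pyramid[-1]
    for high in reversed(pyramid[:-1]):
        current = upsample(current) + high
    return current

rng = np.random.default_rng(0)
image = rng.random((64, 64))          # synthetic grayscale image
pyr = build_laplacian_pyramid(image, levels=3)
restored = reconstruct(pyr)
```

In a setup like the one described, the low-frequency base would be the natural place to address large, smooth highlight regions, while the high-frequency bands carry text strokes and fine texture.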
Abstract
Reflective documents often suffer from specular highlights under ambient lighting, severely hindering text readability and degrading overall visual quality. Although recent deep learning methods show promise in highlight removal, they remain suboptimal for document images, primarily due to the lack of dedicated datasets and tailored architectural designs. To tackle these challenges, we present DocHR14K, a large-scale real-world dataset comprising 14,902 high-resolution image pairs across six document categories and various lighting conditions. To the best of our knowledge, this is the first high-resolution dataset for document highlight removal that captures a wide range of real-world lighting conditions. Additionally, motivated by the observation that the residual map between highlighted and clean images naturally reveals the spatial structure of highlight regions, we propose a simple yet effective Highlight Location Prior (HLP) to estimate highlight masks without human annotations. Building on this prior, we present the Location-Aware Laplacian Pyramid Highlight Removal Network (L2HRNet), which effectively removes highlights by leveraging the estimated priors and incorporates a diffusion module to restore details. Extensive experiments demonstrate that DocHR14K improves highlight removal under diverse lighting conditions. Our L2HRNet achieves state-of-the-art performance across three benchmark datasets, including a 5.01% increase in PSNR and a 13.17% reduction in RMSE on DocHR14K.
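The observation behind the Highlight Location Prior, that the residual between a highlighted image and its clean counterpart reveals where the highlights are, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify how the residual is turned into a mask, so the clipping to positive values, the channel averaging, the fixed `threshold`, and the function name `highlight_location_prior` are all hypothetical choices.

```python
import numpy as np

def highlight_location_prior(highlighted, clean, threshold=0.1):
    """Estimate a highlight map and binary mask from a paired image.

    highlighted, clean: float arrays of shape (H, W, 3) with values in [0, 1].
    threshold: hypothetical cutoff on the residual magnitude.
    """
    residual = highlighted.astype(np.float64) - clean.astype(np.float64)
    # Assumption: highlights only add brightness, so negative residuals are ignored.
    residual = np.clip(residual, 0.0, None)
    magnitude = residual.mean(axis=-1)  # average over colour channels
    mask = magnitude > threshold
    return magnitude, mask

# Synthetic pair: a uniform grey page with one bright square "highlight".
clean = np.full((32, 32, 3), 0.5)
highlighted = clean.copy()
highlighted[8:16, 8:16] += 0.4

magnitude, mask = highlight_location_prior(highlighted, clean)
```

Because the mask is derived from the image pair itself, no human annotation is needed, which matches the motivation given for HLP.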
