Single Document Image Highlight Removal via A Large-Scale Real-World Dataset and A Location-Aware Network

Lu Pan, Yu-Hsuan Huang, Hongxia Xie, Cheng Zhang, Hongwei Zhao, Hong-Han Shuai, Wen-Huang Cheng

TL;DR

This work tackles the challenge of specular highlights in real-world document images by introducing DocHR14K, a large-scale high-resolution dataset with nine lighting conditions across six document types, paired with a novel L2HRNet that uses a Highlight Location Prior and a diffusion-based module to remove highlights while preserving fine text details. The approach leverages a Laplacian Pyramid to treat low- and high-frequency components separately, enabling global highlight suppression and high-frequency texture restoration, guided by residual-based priors. Empirical results on DocHR14K, RD, and SHIQ show state-of-the-art performance with substantial gains in PSNR and reduced RMSE, and ablations confirm the contribution of each module. The work advances practical document image enhancement, improving readability and downstream task performance, with potential for future inpainting and faster, latent-space diffusion options.

Abstract

Reflective documents often suffer from specular highlights under ambient lighting, severely hindering text readability and degrading overall visual quality. Although recent deep learning methods show promise in highlight removal, they remain suboptimal for document images, primarily due to the lack of dedicated datasets and tailored architectural designs. To tackle these challenges, we present DocHR14K, a large-scale real-world dataset comprising 14,902 high-resolution image pairs across six document categories and various lighting conditions. To the best of our knowledge, this is the first high-resolution dataset for document highlight removal that captures a wide range of real-world lighting conditions. Additionally, motivated by the observation that the residual map between highlighted and clean images naturally reveals the spatial structure of highlight regions, we propose a simple yet effective Highlight Location Prior (HLP) to estimate highlight masks without human annotations. Building on this prior, we present the Location-Aware Laplacian Pyramid Highlight Removal Network (L2HRNet), which removes highlights by leveraging the estimated priors and incorporates a diffusion module to restore details. Extensive experiments demonstrate that DocHR14K improves highlight removal under diverse lighting conditions. Our L2HRNet achieves state-of-the-art performance across three benchmark datasets, including a 5.01% increase in PSNR and a 13.17% reduction in RMSE on DocHR14K.
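
The residual idea behind the Highlight Location Prior can be illustrated with a small sketch. The abstract states only that the residual between the highlighted and clean images reveals highlight regions, and the Figure 4 caption mentions an Otsu mask. Everything else here is an assumption made for illustration: the function names, the channel averaging, the clipping to positive residuals, and the 256-bin histogram. It is not the authors' implementation.

```python
import numpy as np

def residual_map(highlight, clean):
    """Channel-averaged residual between a highlighted image and its clean pair.

    Both inputs are float arrays of shape (H, W, 3) with values in [0, 1].
    Keeping only the positive part is an assumption: highlights add light.
    """
    diff = highlight.astype(np.float64) - clean.astype(np.float64)
    return np.clip(diff, 0.0, None).mean(axis=-1)

def otsu_threshold(values, bins=256):
    """Otsu's method: choose the cut that maximizes between-class variance."""
    hist, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                 # pixel count at or below each bin
    w1 = w0[-1] - w0                     # pixel count above each bin
    cum_mean = np.cumsum(hist * centers)
    mu0 = cum_mean / np.maximum(w0, 1)   # mean of the lower class
    mu1 = (cum_mean[-1] - cum_mean) / np.maximum(w1, 1)  # mean of the upper class
    between = w0 * w1 * (mu0 - mu1) ** 2
    return edges[np.argmax(between) + 1]

def highlight_mask(highlight, clean):
    """Binary mask of likely highlight pixels, with no human annotation."""
    res = residual_map(highlight, clean)
    return res > otsu_threshold(res)

# Toy example: a flat gray page with one bright 3x3 patch.
clean = np.full((8, 8, 3), 0.5)
highlight = clean.copy()
highlight[2:5, 2:5] += 0.4
mask = highlight_mask(highlight, clean)
```

In this toy case the mask marks exactly the brightened patch. Real captures would contain noise and soft highlight boundaries, so the mask would be less clean.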

Paper Structure

This paper contains 23 sections, 7 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Illustration of image capturing. (a) Vertical shooting under a color temperature of 5600K. (b) Diffuse image captured under purple lighting at 15 degrees. (c) Corresponding highlight image captured by adjusting the angle of the linear polarizer. (d) Illustration of collection in the laboratory environment. $I$ is the light from the source and $I_p$ denotes the polarized light. (e) Illustration of collection in the daily living environment. $I_r$ represents residual incident light, and $B$ denotes ambient light.
  • Figure 2: Data distribution of DocHR14K.
  • Figure 3: Framework of the proposed L2HRNet. The input image $I_0 \in \mathbb{R}^{h \times w \times 3}$ is first decomposed into high- and low-frequency bands using the Laplacian pyramid. Red arrows: For the low-frequency component $I_D \in \mathbb{R}^{\frac{h}{2^D} \times \frac{w}{2^D} \times c}$, the highlight location prior is incorporated to achieve global highlight removal. Brown arrows: For the high-frequency component $h_{D-1}\in \mathbb{R}^{\frac{h}{2^{D-1}} \times \frac{w}{2^{D-1}} \times c}$, the diffusion-based enhancement module is applied to restore fine details, such as text edges. Purple arrows: Convolution layers, residual blocks, and a texture recovery module further enhance finer details in other high-frequency components (e.g., $h_{D-2}, \dots$). The final highlight-free image $\hat{I}_0 \in \mathbb{R}^{h \times w \times 3}$ is reconstructed by merging the processed high- and low-frequency outputs. In the image example, $D$ is set to 2.
  • Figure 4: Visualization of the highlight residual prior on two document images. The top row presents a sample from the RD dataset, while the bottom row shows a sample from the SHIQ dataset. The residual map highlights regions of specular reflection, and the corresponding binary mask (Otsu Mask) shows strong structural alignment with the ground-truth masks.
  • Figure 5: Qualitative comparisons of document highlight removal on the DocHR14K dataset. From left to right: the input highlight image, the estimated results of (a) JSHDR [fu2021multi], (b) TSHRNet [fu2023towards], (c) DHAN-SHR [guo2024dual], (d) DocShadowNet [li2023high], ours, and the ground truth image, respectively. Zoom in for the best view.
  • ...and 5 more figures
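
The Laplacian pyramid decomposition described in the Figure 3 caption can be sketched as follows, using the caption's notation: high-frequency bands $h_0, \dots, h_{D-1}$ and a low-frequency residual $I_D$. This is a simplified illustration rather than the paper's code. It assumes 2x2 average pooling and nearest-neighbor upsampling in place of the Gaussian filtering a standard pyramid uses, and all function names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (stand-in for blur + subsample)."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(img):
    """Double the resolution by repeating each pixel (nearest neighbor)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def decompose(img, depth):
    """Split img into high-frequency bands h_0..h_{D-1} and a low-frequency I_D.

    Height and width must be divisible by 2**depth.
    """
    highs = []
    current = img.astype(np.float64)
    for _ in range(depth):
        low = downsample(current)
        highs.append(current - upsample(low))  # detail lost by downsampling
        current = low
    return highs, current

def reconstruct(highs, low):
    """Merge the bands back together, coarsest level first."""
    current = low
    for band in reversed(highs):
        current = upsample(current) + band
    return current

# Example with D = 2, the depth used in the figure's image example.
rng = np.random.default_rng(0)
image = rng.random((16, 16, 3))
highs, low = decompose(image, depth=2)
restored = reconstruct(highs, low)
```

Here `highs[1]` plays the role of $h_{D-1}$ at half resolution and `low` plays the role of $I_D$ at quarter resolution. Reconstruction from unmodified bands recovers the input up to floating-point error, so in a network built on this decomposition any change in the output comes from how the individual bands are processed.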