DocDeshadower: Frequency-Aware Transformer for Document Shadow Removal

Ziyang Zhou; Yingtie Lei; Xuhang Chen; Shenghong Luo; Wenjun Zhang; Chi-Man Pun; Zhen Wang

TL;DR

Shadows in smartphone-captured documents hinder readability and downstream analysis. The paper introduces DocDeshadower, a frequency-aware Transformer that uses a Laplacian Pyramid to decompose the shadowed image into low- and high-frequency bands. An Attention-Aggregation Network handles low-frequency color correction, and a Gated Multi-scale Fusion Transformer handles high-frequency edge refinement. The model is trained with a joint loss $L_{total}=L_{MSE}+\lambda L_{SSIM}$ ($\lambda=0.2$), where $L_{MSE}=\sum (x_i-y_i)^2$ penalizes pixel error and $L_{SSIM}$ penalizes loss of structural similarity. Experiments on the Jung and Kligler datasets show state-of-the-art performance in PSNR, SSIM, and RMSE, indicating improved shadow removal while preserving document content and readability. Robust shadow removal across multiple frequency bands offers practical gains for document analysis pipelines and OCR in real-world scanning scenarios.
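The joint loss above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`mse_sum`, `global_ssim`, `total_loss`) are hypothetical, the SSIM here is a single-window (global) simplification rather than the usual sliding-window version, and taking $L_{SSIM}=1-\mathrm{SSIM}$ is an assumption, since the summary does not spell out that term.

```python
def mse_sum(x, y):
    """Sum of squared differences, following L_MSE = sum (x_i - y_i)^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over two flat pixel lists (a simplification)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Standard stabilising constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def total_loss(pred, target, lam=0.2):
    """L_total = L_MSE + lam * L_SSIM, with L_SSIM assumed to be 1 - SSIM."""
    l_mse = mse_sum(pred, target)
    l_ssim = 1.0 - global_ssim(pred, target)  # assumption, see lead-in
    return l_mse + lam * l_ssim
```

With identical inputs both terms vanish, so the loss is zero; any pixel or structural deviation raises it, with $\lambda$ controlling how much the structural term counts.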

Abstract

Shadows in scanned documents pose significant challenges for document analysis and recognition tasks due to their negative impact on visual quality and readability. Current shadow removal techniques, including traditional methods and deep learning approaches, face limitations in handling varying shadow intensities and preserving document details. To address these issues, we propose DocDeshadower, a novel multi-frequency Transformer-based model built upon the Laplacian Pyramid. DocDeshadower decomposes the shadow image into multiple frequency bands and employs two critical modules: the Attention-Aggregation Network for low-frequency shadow removal and the Gated Multi-scale Fusion Transformer for global refinement. In this way, it effectively removes shadows at different scales while preserving document content. Extensive experiments demonstrate DocDeshadower's superior performance compared to state-of-the-art methods, highlighting its potential to significantly improve document shadow removal techniques. The code is available at https://github.com/leiyingtie/DocDeshadower.
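The frequency decomposition mentioned in the abstract can be sketched as follows. This is an illustrative pure-Python version, not the code in the repository: the helper names are hypothetical, a 2x2 box average stands in for the Gaussian filter of a classical Laplacian Pyramid, upsampling is nearest-neighbour, and the image sides are assumed to be divisible by 2 raised to the number of levels.

```python
def downsample(img):
    """Halve resolution by averaging each 2x2 block (box filter, not Gaussian)."""
    h, w = len(img), len(img[0])
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def upsample(img):
    """Double resolution by repeating every pixel in a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(img, levels):
    """Return (high-frequency bands, low-frequency residual)."""
    bands, current = [], img
    for _ in range(levels):
        small = downsample(current)
        up = upsample(small)
        # High-frequency band: detail lost by the down/up round trip.
        bands.append([[a - b for a, b in zip(r1, r2)]
                      for r1, r2 in zip(current, up)])
        current = small
    return bands, current


def reconstruct(bands, low):
    """Invert build_pyramid by adding each band back, coarse to fine."""
    current = low
    for band in reversed(bands):
        up = upsample(current)
        current = [[a + b for a, b in zip(r1, r2)]
                   for r1, r2 in zip(up, band)]
    return current
```

In a pipeline of the kind the abstract describes, the low-frequency residual would go to the module responsible for low-frequency shadow removal and the bands to the refinement stage; the decomposition itself is lossless, so `reconstruct` recovers the input.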

Paper Structure (22 sections, 8 equations, 4 figures, 3 tables)

INTRODUCTION
RELATED WORK
Natural Image Shadow Removal
Document Shadow Removal
Foundational Techniques for DocDeshadower
METHODOLOGY
Overview
Attention-Aggregation Network
Gated Multi-scale Fusion Transformer
Objective Functions
EXPERIMENTS
Experimental Setup
Datasets and Preprocessing
Evaluation Metrics
Implementation Details
...and 7 more sections

Figures (4)

Figure 1: Comparison between our method and several state-of-the-art methods for document shadow removal.
Figure 2: The proposed DocDeshadower architecture.
Figure 3: The proposed Gated Multi-scale Fusion Transformer architecture.
Figure 4: Visual results of different document shadow removal methodologies.
