High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net
Zinuo Li, Xuhang Chen, Chi-Man Pun, Xiaodong Cun
TL;DR
This work tackles high‑resolution document shadow removal by introducing SD7K, a large real‑world dataset with over 7k shadow/shadow‑free pairs under diverse lighting, and FSENet, a frequency‑aware network that decouples processing across frequency bands via a Laplacian Pyramid. The low‑frequency deshading path uses Dimension‑Aware Transformer blocks and a Tri‑layer Attention Alignment module to correct illumination, while a high‑frequency restoration path learns contours to recover fine details, guided by a loss combining smoothly weighted L1 and SSIM terms. The combination of a large, varied dataset and a frequency‑aware architecture yields state‑of‑the‑art results on SD7K and existing benchmarks, with ablations validating the contributions of LP depth, DAT/DFE/TAA, and the high‑frequency contour module. The work has practical value for improving readability and downstream document‑understanding tasks, particularly in real‑world capture scenarios where shadows are unavoidable, although the method requires more computation and does not run in real time on edge devices.
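The training objective above pairs a pixel-wise L1 term with a structural SSIM term. The snippet below is a minimal sketch of that kind of combination, not FSENet's actual loss: it uses plain (unweighted) L1, computes SSIM over a single global window instead of the usual sliding Gaussian window, and the function names `global_ssim` and `deshadow_loss` and the `ssim_weight=0.2` value are hypothetical choices for illustration.

```python
import numpy as np


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """SSIM computed over the whole image as one window (a simplification;
    standard SSIM averages this statistic over local sliding windows)."""
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return float(num / den)


def deshadow_loss(pred: np.ndarray, target: np.ndarray, ssim_weight: float = 0.2) -> float:
    """Hypothetical L1 + weighted (1 - SSIM) objective; ssim_weight is an assumption."""
    l1 = float(np.abs(pred - target).mean())
    return l1 + ssim_weight * (1.0 - global_ssim(pred, target))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((64, 64))  # stand-in for a shadow-free page in [0, 1]
    shadowed = clean * 0.6        # crude uniform darkening as a fake "shadow"
    print(deshadow_loss(clean, clean), deshadow_loss(shadowed, clean))
```

The L1 term drives per-pixel intensity toward the target, while the SSIM term penalises loss of local contrast and structure, which matters for thin strokes in text.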
Abstract
Shadows often appear when documents are captured with casual equipment, degrading the visual quality and readability of the digital copies. Unlike algorithms for natural-image shadow removal, document shadow removal algorithms must preserve the details of fonts and figures in high-resolution inputs. Previous works overlook this problem and remove shadows via approximate attention trained on small datasets, which may not generalize to real-world situations. We address high-resolution document shadow removal directly with a larger-scale real-world dataset and a carefully designed frequency-aware network. For the dataset, we capture over 7k pairs of high-resolution (2462 × 3699) real-world document images covering diverse samples under different lighting conditions, 10 times larger than existing datasets. For the network, we decouple the high-resolution images in the frequency domain so that low-frequency details and high-frequency boundaries can each be learned effectively by the carefully designed network structure. Powered by our network and dataset, the proposed method outperforms previous methods in both visual quality and quantitative results. The code, models, and dataset are available at: https://github.com/CXH-Research/DocShadow-SD7K
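Decoupling an image in the frequency domain with a Laplacian pyramid can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses 2x2 average pooling for downsampling and nearest-neighbour upsampling, whereas classical pyramids (and likely FSENet) use Gaussian filtering; the helper names `downsample`, `upsample`, `build_laplacian_pyramid`, and `reconstruct` are hypothetical. Each high band stores what is lost between a level and its blurred, downsampled version, and the final low band holds coarse illumination, which is the part a deshading path would operate on.

```python
import numpy as np


def downsample(img: np.ndarray) -> np.ndarray:
    """Halve resolution by averaging 2x2 blocks; odd sizes are edge-padded first."""
    h, w = img.shape[:2]
    pad = ((0, h % 2), (0, w % 2)) + ((0, 0),) * (img.ndim - 2)
    img = np.pad(img, pad, mode="edge")
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def upsample(img: np.ndarray, shape: tuple) -> np.ndarray:
    """Double resolution by nearest-neighbour repetition, cropped to `shape`."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[: shape[0], : shape[1]]


def build_laplacian_pyramid(img: np.ndarray, levels: int) -> list:
    """Return [high_0, ..., high_{levels-1}, low], finest band first."""
    bands = []
    current = img.astype(np.float64)
    for _ in range(levels):
        low = downsample(current)
        # High-frequency residual: detail the coarser level cannot represent.
        bands.append(current - upsample(low, current.shape))
        current = low
    bands.append(current)  # coarsest low-frequency band
    return bands


def reconstruct(bands: list) -> np.ndarray:
    """Invert the pyramid by upsampling and adding back each residual."""
    current = bands[-1]
    for high in reversed(bands[:-1]):
        current = upsample(current, high.shape) + high
    return current


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    page = rng.random((37, 50, 3))  # small stand-in for an RGB document photo
    bands = build_laplacian_pyramid(page, levels=3)
    print([b.shape[:2] for b in bands])  # → [(37, 50), (19, 25), (10, 13), (5, 7)]
    print(np.allclose(reconstruct(bands), page))  # → True
```

Because every residual is defined relative to the same `upsample` used in `reconstruct`, the decomposition is lossless regardless of the filter choice, so a network can edit the low band and still recover full-resolution detail from the high bands.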
