Structure-Informed Shadow Removal Networks

Yuhao Liu; Qing Guo; Lan Fu; Zhanghan Ke; Ke Xu; Wei Feng; Ivor W. Tsang; Rynson W. H. Lau

TL;DR

This work tackles shadow remnants by reframing shadow removal as a structure-level problem. It introduces StructNet, a two-stage network that first recovers shadow-free structure using MSFE and MFRA to guide subsequent image-level removal, and extends it to MStructNet to exploit multiple structure levels within a single pass. Extensive experiments on SRD, ISTD, and ISTD+ show that structure-informed guidance improves shadow-region RMSE and perceptual quality, while enabling stronger integration with existing methods. The proposed approach yields state-of-the-art results with competitive efficiency and can be plugged into existing shadow removal pipelines to boost performance. The multi-level variant further demonstrates that combining several structure levels at the feature level enhances restoration with minimal overhead.

Abstract

Existing deep learning-based shadow removal methods still produce images with shadow remnants. These shadow remnants typically exist in homogeneous regions with low-intensity values, making them untraceable in the existing image-to-image mapping paradigm. We observe that shadows mainly degrade images at the image-structure level (in which humans perceive object shapes and continuous colors). Hence, in this paper, we propose to remove shadows at the image-structure level. Based on this idea, we propose a novel structure-informed shadow removal network (StructNet) that leverages image-structure information to address the shadow remnant problem. Specifically, StructNet first reconstructs the structure information of the input image without shadows and then uses the restored shadow-free structure prior to guide the image-level shadow removal. StructNet contains two main novel modules: (1) a mask-guided shadow-free extraction (MSFE) module to extract image structural features in a non-shadow-to-shadow directional manner, and (2) a multi-scale feature & residual aggregation (MFRA) module to leverage the shadow-free structure information to regularize feature consistency. In addition, we extend StructNet to exploit multi-level structure information (MStructNet), further boosting shadow removal performance with minimal computational overhead. Extensive experiments on three shadow removal benchmarks demonstrate that our method outperforms existing shadow removal methods, and that StructNet can be integrated with existing methods to improve them further.
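As a rough illustration of the two-stage idea in the abstract, the following Python sketch shows only the data flow: extract a structure image, restore a shadow-free structure, then use it to guide image-level removal. All names here (`box_blur`, `two_stage_removal`, `structure_net`, `image_net`) are hypothetical, the box blur is a crude stand-in for the structure extraction rather than the method the paper uses, and the MSFE and MFRA modules are not modeled.

```python
import numpy as np

def box_blur(image, radius=2):
    """Stand-in structure extractor (assumption): a plain box blur over an (H, W, C) image."""
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    size = 2 * radius + 1
    height, width = image.shape[:2]
    # Sum every shifted copy of the image inside the (size x size) window.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (size * size)

def two_stage_removal(image, mask, structure_net, image_net, extract_structure=box_blur):
    """Run a two-stage, structure-guided pipeline with caller-supplied networks.

    structure_net(structure, mask) -> shadow-free structure   (stage 1, hypothetical)
    image_net(image, mask, structure) -> shadow-free image     (stage 2, hypothetical)
    """
    shadow_structure = extract_structure(image)
    free_structure = structure_net(shadow_structure, mask)
    restored = image_net(image, mask, free_structure)
    return restored, free_structure
```

Any pair of callables with these signatures can be passed in, for example two trained UNets wrapped in small functions; the sketch only fixes the order in which the stages see their inputs.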

Paper Structure (29 sections, 10 equations, 8 figures, 11 tables)

This paper contains 29 sections, 10 equations, 8 figures, 11 tables.

Introduction
Related Work
Shadow Removal
Image-structure in Vision Tasks
Structure-level Shadow Removal
Formulation of Structure-Level Shadow Removal
Empirical Studies
Shadow Removal at Different Structure Levels
Shadow Removal with Structure-level Guidance
Limitations of using the Vanilla UNet
StructNet
The MSFE Module
The MFRA Module
Configuration Details
Multi-level StructNets (MStructNet)
...and 14 more sections

Figures (8)

Figure 1: (a) State-of-the-art shadow removal methods (e.g., AEF fu2021auto) typically learn a direct shadow-to-shadow-free mapping and often produce shadow remnants with color artifacts. (b) We propose to incorporate image-structure information into the shadow removal process. We visualize the features of approaches (a) and (b) in (c) and (d), respectively, which show that the features of (d) are structured according to region homogeneity. (e) Results of the original AEF and its structure-enhanced counterpart, where red arrows indicate regions with shadow remnants, and RMSE values are shown for reference.
Figure 2: Shadow removal results at different structure levels. The $1$st row shows the original shadow image (a) and its structures (b)-(e) extracted by xu2012structure at four different structure levels (i.e., $l \in \{0.005, 0.015, 0.045, 0.1\}$). The $2$nd row shows the shadow removal results obtained by feeding the shadow structures in the $1$st row to their respective vanilla UNets. Image (f) is the result of image-level shadow removal, while images (g)-(j) are the results of structure-level shadow removal with $l>0.0$. The $3$rd row shows the restoration results of our naive two-stage shadow removal network, obtained by feeding the restored shadow-free structures (i.e., the images in the $2$nd row) into the second-stage vanilla UNets.
Figure 3: Comparison of the image-level (i.e., $l=0.0$) and four structure-level shadow removal processes with $l\in\{0.005, 0.015, 0.045, 0.1\}$ on two public datasets (i.e., ISTD+ le2021physics and SRD qu2017deshadownet). We employ the root mean square error (RMSE) in the LAB color space as the evaluation metric to assess shadow removal performance in the shadow regions, the non-shadow regions, and the whole (i.e., All) image, respectively.
Figure 4: Visualization and quantitative comparison of the vanilla UNet and StructNet for structure-level shadow removal. (a) is the input shadow structure image, which is fed to the vanilla UNet and StructNet to obtain (b) and (d), respectively. Images (c) and (e) show three randomly sampled feature channels produced by the $2$nd convolutional layer of the two networks. In addition, we extract the features from the $2$nd convolutional layer of the vanilla UNet and StructNet for all images in the ISTD+ test set. For each image, we calculate the absolute difference between the shadow and non-shadow regions in each feature channel and average the difference across all channels. Image (f) shows the average feature differences of all images using the vanilla UNet (green points) and StructNet (blue points).
Figure 5: Pipeline of the proposed StructNet. (a) shows the structure-level shadow removal. (b) shows the image-level shadow removal with the assistance of predicted shadow-free structure from (a). (c) and (d) represent the mask-guided shadow-free extraction (MSFE) and the multi-scale feature & residual aggregation (MFRA) modules, respectively, in the architecture.
...and 3 more figures
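The two measurements described in the captions of Figures 3 and 4 can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the inputs are assumed to be already converted to the LAB color space, the RMSE is pooled over all selected pixels and channels, the feature difference is taken between per-channel region means, and the mask is assumed to match the spatial size of the array it indexes.

```python
import numpy as np

def masked_rmse(pred_lab, target_lab, region):
    """RMSE over the pixels selected by a boolean (H, W) mask; inputs are (H, W, 3) LAB arrays."""
    diff = pred_lab[region] - target_lab[region]  # shape (N, 3)
    if diff.size == 0:
        return float("nan")  # empty region, nothing to measure
    return float(np.sqrt(np.mean(diff ** 2)))

def region_rmse(pred_lab, target_lab, shadow_mask):
    """Report RMSE in the shadow region, the non-shadow region, and the whole image."""
    shadow = shadow_mask.astype(bool)
    return {
        "shadow": masked_rmse(pred_lab, target_lab, shadow),
        "non_shadow": masked_rmse(pred_lab, target_lab, ~shadow),
        "all": masked_rmse(pred_lab, target_lab, np.ones_like(shadow)),
    }

def feature_gap(features, shadow_mask):
    """Average over channels of |mean(shadow) - mean(non-shadow)| for a (C, H, W) feature map.

    Assumes both regions are non-empty.
    """
    shadow = shadow_mask.astype(bool)
    inside = features[:, shadow].mean(axis=1)    # per-channel mean in the shadow region
    outside = features[:, ~shadow].mean(axis=1)  # per-channel mean outside it
    return float(np.abs(inside - outside).mean())
```

A smaller `feature_gap` value means the shadow and non-shadow regions look more alike in feature space, which is the sense in which Figure 4 (f) compares the two networks.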
