Diff-Shadow: Global-guided Diffusion Model for Shadow Removal

Jinting Luo; Ru Li; Chengzhi Jiang; Xiaoming Zhang; Mingyan Han; Ting Jiang; Haoqiang Fan; Shuaicheng Liu

TL;DR

Diff-Shadow tackles shadow removal by combining diffusion-based synthesis with global context in a parallel UNet architecture. A local patch-based diffusion branch is guided by a global low-resolution restoration branch through the Reweight Cross Attention (RCA) module, and a Global-guided Sampling Strategy (GSS) keeps illumination consistent across patches. On ISTD, ISTD+, and SRD, it reports state-of-the-art PSNR/SSIM while mitigating patch boundary artifacts and illumination discrepancies. The approach enables size-agnostic shadow removal and robust performance, at the cost of longer inference time and reliance on accurate shadow masks.

Abstract

We propose Diff-Shadow, a global-guided diffusion model for shadow removal. Previous transformer-based approaches can use global information to relate shadow and non-shadow regions, but their synthesis ability is limited and the recovered images show obvious boundaries. Diffusion-based methods generate better content but still suffer from inconsistent illumination. In this work, we combine the advantages of diffusion models and global guidance to achieve shadow-free restoration. Specifically, we propose a parallel UNets architecture: 1) the local branch performs patch-based noise estimation in the diffusion process, and 2) the global branch recovers a low-resolution shadow-free image. A Reweight Cross Attention (RCA) module integrates global contextual information from non-shadow regions into the local branch. We further design a Global-guided Sampling Strategy (GSS) that mitigates patch boundary issues and ensures consistent illumination across shaded and unshaded regions in the recovered image. Comprehensive experiments on the ISTD, ISTD+, and SRD datasets demonstrate the effectiveness of Diff-Shadow. Compared to state-of-the-art methods, our method achieves a significant PSNR improvement on the ISTD dataset, from 32.33 dB to 33.69 dB.
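
To put the reported PSNR gain in perspective, the following is a minimal sketch of the standard PSNR definition, together with the mean-squared-error reduction implied by a given gain in dB. The peak value of 255 and the flat pixel sequences are assumptions made for illustration; the paper's exact evaluation protocol (color space, resolution, masked regions) is not stated here and may differ. The function names `psnr` and `mse_ratio_for_gain` are hypothetical.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks when PSNR rises by gain_db (same peak)."""
    return 10.0 ** (-gain_db / 10.0)

# ISTD figures quoted in the abstract: 32.33 dB -> 33.69 dB.
gain = 33.69 - 32.33
ratio = mse_ratio_for_gain(gain)
```

Under these assumptions, the 1.36 dB gain corresponds to an MSE ratio of roughly 0.73, that is, about a quarter less mean squared error than the previous best result.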

Paper Structure

This paper contains 18 sections, 10 equations, 14 figures, 9 tables, 2 algorithms.

Introduction
Related Works
Method
Overall Architecture
Reweight Cross Attention
Global-guided Sampling Strategy
Optimization
Experiments
Experimental Setups
Comparison with the State-of-the-art
Ablation Studies
Conclusion
Acknowledgments
Diff-Shadow: Global-guided Diffusion Model for Shadow Removal Supplementary
Implementation Details
...and 3 more sections

Figures (14)

Figure 1: (a) shows the result of ShadowFormer [guo2023shadowformer], which suffers from residual shadow artifacts due to its limited modeling ability. (b) shows the result of ShadowDiffusion [guo2023shadowdiffusion], which exhibits obvious illumination inconsistency across the image because it cannot exploit global information. (c) shows the proposed Diff-Shadow, which generates high-quality shadow removal results that maintain illumination consistency and are free from block boundary artifacts, thanks to a parallel network structure and a novel global-guided sampling strategy.
Figure 2: An overview of Diff-Shadow. The local branch of the Parallel UNets performs patch diffusion noise estimation using patches of the intermediate variable $\mathbf{x}_t$, the shadow image $\tilde{\mathbf{x}}$, and the shadow mask $\mathbf{x}_m$. The latter two images are also down-sampled as inputs to the global UNet, which both reconstructs the low-resolution shadow-free image and feeds the global contextual information of non-shaded regions $\mathbf{x}_G$ into the local branch ${\mathbf{x}_L}^{(i)}$ through the proposed Reweight Cross Attention (RCA) module. After the noise of the local patches is estimated, the Global-guided Sampling Strategy (GSS) constructs the noise distribution for the whole image. $t_1$ and $t_2$ represent functions corresponding to the step $t$.
Figure 3: The illustration of noise merge for overlapping patches. $N_p$ represents the number of overlapping patches for each pixel. $r$ and $R$ are the step size and the patch size.
Figure 4: Qualitative comparisons on ISTD+. The values represent the 'PSNR/SSIM'.
Figure 5: Visual examples of results and error maps for the ablation study with different sampling strategies, along with the corresponding results from Table \ref{table:Sampling_ablation}.
...and 9 more figures
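
The noise merge for overlapping patches described in the Figure 3 caption can be sketched as follows. This is a minimal illustration that assumes a plain average: per-patch noise estimates are accumulated on a canvas and each pixel is divided by its overlap count $N_p$, with patches of size $R$ taken at step $r$. The callback `estimate_patch` is a hypothetical stand-in for the local branch's prediction, and the helper names are not from the paper. The Global-guided Sampling Strategy may weight or combine patches differently.

```python
def patch_origins(size, R, r):
    """Top-left coordinates along one axis for patches of size R taken with step r."""
    if R > size:
        raise ValueError("patch larger than image")
    origins = list(range(0, size - R + 1, r))
    if origins[-1] != size - R:  # make sure the last patch reaches the border
        origins.append(size - R)
    return origins

def merge_patch_noise(height, width, R, r, estimate_patch):
    """Average per-patch noise estimates over their overlaps.

    estimate_patch(top, left) is a hypothetical callback returning an
    R x R nested list of noise values for the patch at that position.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]  # N_p for each pixel
    for top in patch_origins(height, R, r):
        for left in patch_origins(width, R, r):
            patch = estimate_patch(top, left)
            for i in range(R):
                for j in range(R):
                    total[top + i][left + j] += patch[i][j]
                    count[top + i][left + j] += 1
    merged = [[total[y][x] / count[y][x] for x in range(width)]
              for y in range(height)]
    return merged, count

# Toy usage: a 6 x 6 image, R = 4, r = 2, and a constant dummy estimate.
merged, n_p = merge_patch_noise(
    6, 6, R=4, r=2,
    estimate_patch=lambda top, left: [[1.0] * 4 for _ in range(4)],
)
```

In this toy setting, corner pixels are covered by one patch and central pixels by four, so `n_p` varies across the image while the merged estimate stays constant. This is the property that lets overlapping patches be blended without visible seams.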
