Semantic-guided Adversarial Diffusion Model for Self-supervised Shadow Removal

Ziqi Zeng, Chen Zhao, Weiling Cai, Chenyu Dong

TL;DR

This work tackles unsupervised shadow removal by marrying semantic guidance with diffusion-based refinement in a two-stage framework. A coarse stage (SG-GAN) performs shadow generation and removal and constructs paired data via cycle-consistency, while a refinement stage (DBRM) employs an IR-SDE diffusion process to restore texture and reduce edge artifacts. A general-purpose Multi-modal Semantic Prompter (MSP) leverages CLIP image/text features to inject semantic priors, improving restoration quality across real and synthetic data. Across ISTD and AISTD datasets, the method achieves competitive results with state-of-the-art unsupervised approaches and shows robust gains in texture fidelity and boundary smoothness, with ablation studies confirming the importance of DBRM, MSP, and individual losses. The approach offers a practical, self-supervised solution that reduces reliance on paired data and enhances real-world shadow removal performance.
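The idea of constructing paired data through cycle-consistency can be illustrated with a toy sketch. This is not the SG-GAN itself: `add_shadow` and `remove_shadow` are hypothetical hand-written stand-ins for the learned shadow generator and shadow remover, and the `darkness` parameter is an assumption made only for illustration. The sketch shows how a shadow-free image, passed through generation and then removal, yields both a synthetic (shadow, shadow-free) pair and an L1 cycle loss that penalises an inaccurate remover.

```python
import numpy as np

def add_shadow(img, mask, darkness=0.5):
    """Toy stand-in for a shadow generator: darken pixels inside the mask."""
    return img * (1.0 - mask * (1.0 - darkness))

def remove_shadow(img, mask, darkness=0.5):
    """Toy stand-in for a shadow remover: brighten pixels inside the mask."""
    return img / (1.0 - mask * (1.0 - darkness))

def cycle_pair_and_loss(shadow_free, mask, true_darkness=0.5, est_darkness=0.5):
    """Build a synthetic pair and measure the L1 cycle-consistency error."""
    # Shadow-free -> synthetic shadow image (this pair is the "paired data").
    synthetic_shadow = add_shadow(shadow_free, mask, true_darkness)
    # Synthetic shadow -> reconstructed shadow-free image.
    reconstructed = remove_shadow(synthetic_shadow, mask, est_darkness)
    # L1 distance between the reconstruction and the original image.
    loss = float(np.mean(np.abs(reconstructed - shadow_free)))
    return (synthetic_shadow, shadow_free), loss

rng = np.random.default_rng(0)
image = rng.uniform(0.2, 1.0, size=(8, 8))   # hypothetical shadow-free image
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0                          # hypothetical shadow region

_, loss_matched = cycle_pair_and_loss(image, mask, 0.5, 0.5)
_, loss_mismatched = cycle_pair_and_loss(image, mask, 0.5, 0.7)
print(loss_matched, loss_mismatched)
```

When the remover exactly inverts the generator, the cycle loss is numerically zero; a mismatched remover leaves a residual inside the masked region, which is the signal a cycle-consistent training objective would push down.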

Abstract

Existing unsupervised methods have addressed the challenges of inconsistent paired data and tedious acquisition of ground-truth labels in shadow removal tasks. However, GAN-based training often suffers from mode collapse and unstable optimization. Furthermore, because the mapping between the shadow and shadow-free domains is complex, adversarial learning alone is not enough to capture the underlying relationship between the two domains, resulting in low-quality generated images. To address these problems, we propose a semantic-guided adversarial diffusion framework for self-supervised shadow removal, which consists of two stages. In the first stage, a semantic-guided generative adversarial network (SG-GAN) produces a coarse result and constructs paired synthetic data through a cycle-consistent structure. In the second stage, the coarse result is refined with a diffusion-based restoration module (DBRM) to enhance texture details and reduce edge artifacts. In addition, we propose a multi-modal semantic prompter (MSP) that helps extract accurate semantic information from real images and text, guiding the shadow removal network in SG-GAN toward better restoration. We conduct experiments on multiple public datasets, and the experimental results demonstrate the effectiveness of our method.
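The diffusion-based refinement can be made more concrete with a minimal sketch of the forward perturbation used by mean-reverting SDEs of the IR-SDE family, which the summary above names as the process behind DBRM. The sketch assumes the standard closed-form marginal of the SDE $dx = \theta_t(\mu - x)\,dt + \sigma_t\,dw$ with $\sigma_t^2 / \theta_t = 2\lambda^2$. The function name, the argument names, and the reading of `mu` as the coarse result and `x0` as the clean target are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def irsde_forward_sample(x0, mu, theta_bar_t, lam, rng):
    """Sample x_t from the mean-reverting marginal (illustrative sketch).

    x0          : clean target image (assumed role)
    mu          : degraded / coarse image the process reverts to (assumed role)
    theta_bar_t : integral of theta from 0 to t (non-negative scalar)
    lam         : stationary noise level lambda
    """
    decay = np.exp(-theta_bar_t)
    # The mean drifts from x0 (t = 0) toward mu (t -> infinity).
    mean = mu + (x0 - mu) * decay
    # The standard deviation grows from 0 toward lam.
    std = lam * np.sqrt(1.0 - np.exp(-2.0 * theta_bar_t))
    return mean + std * rng.standard_normal(x0.shape)

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(4, 4))   # hypothetical clean patch
coarse = clean + 0.1                          # hypothetical coarse result

x_start = irsde_forward_sample(clean, coarse, 0.0, 0.1, rng)
x_end = irsde_forward_sample(clean, coarse, 50.0, 0.0, rng)
print(np.abs(x_start - clean).max(), np.abs(x_end - coarse).max())
```

At `theta_bar_t = 0` the sample coincides with the clean image, and for large `theta_bar_t` it approaches the coarse image plus noise of scale `lam`. A restoration network would be trained to reverse this path, starting from the noisy coarse result and moving back toward the clean image.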

Paper Structure (17 sections, 25 equations, 8 figures, 5 tables)


Figures (8)

  • Figure 1: Shadow removal results of our method and two other GAN-based methods: G2R-ShadowNet [liu2021shadow] and Mask-ShadowGAN [2019Mask]. The GAN-based methods leave obvious shadow boundaries and artifacts.
  • Figure 2: Visualization of different domains and domain transitions. The red arrow represents the previous adversarial generation process, and the green arrow represents the diffusion generation process.
  • Figure 3: Overall pipeline of our method. At the coarse processing stage, the SG-GAN, which consists of S2F, F2F, and MSP, predicts the coarse shadow removal results ${\widetilde{R}}^2_s$. At the refined restoration stage, DBRM takes paired data $R_n$ and ${\widetilde{R}}^2_s$ from the previous stage as input, where the coarse result ${\widetilde{R}}^2_s$ is refined.
  • Figure 4: Real shadow images from ISTD and the corresponding synthetic shadow images obtained from our generator $G_s$. The histograms show the inconsistencies in intensity distribution between them.
  • Figure 5: Visual comparison results on five challenging real-world samples from the ISTD dataset.
  • ...and 3 more figures