SDDNet: Style-guided Dual-layer Disentanglement Network for Shadow Detection

Runmin Cong, Yuchen Guan, Jinpeng Chen, Wei Zhang, Yao Zhao, Sam Kwong

TL;DR

This work tackles background color interference in shadow detection by modeling each shadow image as a superposition of a background layer and a shadow layer. It introduces SDDNet, which uses a Feature Separation and Recombination (FSR) module to split features into shadow-related and background-related components, and a Shadow Style Filter (SSF) to enforce style-based disentanglement via Gram-matrix representations. The network is trained with a combination of shadow and reconstruction losses and a style consistency/diversity loss, and it runs in real time at 32 FPS. Across three public datasets, SDDNet sets new state-of-the-art balanced error rate (BER) results, and ablations show that both FSR and SSF are critical for robustly separating shadow from background information and for reducing background color interference in the predicted shadow maps.
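For readers unfamiliar with the metric, BER is conventionally defined in the shadow detection literature as 100 × (1 − ½ (TP/N_pos + TN/N_neg)), so lower is better and shadow and non-shadow pixels are weighted equally. The sketch below computes it for a single pair of binary masks. The function name and the flat-list mask format are assumptions made for illustration; this is not the authors' evaluation code.

```python
def balanced_error_rate(pred, gt):
    """Return BER in percent for two binary masks given as flat 0/1 sequences.

    Hypothetical helper: shadow pixels are labeled 1, non-shadow pixels 0.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")

    # Count correctly classified shadow (TP) and non-shadow (TN) pixels.
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    tn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 0)

    n_pos = sum(1 for g in gt if g == 1)  # ground-truth shadow pixels
    n_neg = len(gt) - n_pos               # ground-truth non-shadow pixels
    if n_pos == 0 or n_neg == 0:
        raise ValueError("BER needs at least one pixel of each class")

    # Average the per-class accuracies, then convert to an error percentage.
    return 100.0 * (1.0 - 0.5 * (tp / n_pos + tn / n_neg))


if __name__ == "__main__":
    ground_truth = [1, 0, 0, 0]
    prediction = [1, 1, 0, 0]  # one non-shadow pixel wrongly marked as shadow
    print(balanced_error_rate(prediction, ground_truth))
```

Because the two classes are averaged rather than pooled, a predictor that labels everything as non-shadow scores 50 rather than near 0, even on images where shadows cover only a small area.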

Abstract

Despite significant progress in shadow detection, current methods still struggle with the adverse impact of background color, which may lead to errors when shadows are present on complex backgrounds. Drawing inspiration from the human visual system, we treat the input shadow image as a composition of a background layer and a shadow layer, and design a Style-guided Dual-layer Disentanglement Network (SDDNet) to model these layers independently. To achieve this, we devise a Feature Separation and Recombination (FSR) module that decomposes multi-level features into shadow-related and background-related components by offering specialized supervision for each component, while preserving information integrity and avoiding redundancy through the reconstruction constraint. Moreover, we propose a Shadow Style Filter (SSF) module to guide the feature disentanglement by focusing on style differentiation and uniformization. With these two modules and our overall pipeline, our model effectively minimizes the detrimental effects of background color, yielding superior performance on three public datasets with a real-time inference speed of 32 FPS.

Paper Structure (21 sections, 21 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Some difficult cases in shadow detection. (a) The input images. (b) The ground truth shadow maps. (c) The predicted results of ECA [fang2021robust]. (d) The predicted results of MTMT-Net [chen2020multi]. (e) The predicted results of our SDDNet.
  • Figure 2: Architecture of the proposed SDDNet. Given an input image, SDDNet outputs the shadow map, background image, and reconstructed image in an end-to-end manner. First, the backbone extracts integrated low-level and high-level features. Then, the proposed FSR module decomposes the features and produces a shadow-related component, a background-related component, and recombined features. In addition, the SSF module extracts style attributes and guides the feature disentanglement process. Finally, the low-level and high-level features are fused through the parallel decoder to generate three outputs (i.e., background image, shadow map, and reconstructed input image).
  • Figure 3: Structure of the SSF module. The Gram matrix is used to extract style attributes of the background-related component, the shadow-related component, and the recombined features. Based on the presence or absence of shadows, we aim to bring the style of the shadow-related component closer to that of the recombined features, while differentiating it from that of the background-related component.
  • Figure 4: Qualitative comparison between our SDDNet and existing state-of-the-art methods. (a) Input images. (b) Ground-truths. (c) The prediction of BDRAR [zhu2018bidirectional]. (d) The prediction of DSDNet [zheng2019distraction]. (e) The prediction of MTMT-Net [chen2020multi]. (f) The prediction of FDRNet [zhu2021mitigating]. (g) The prediction of ECA [fang2021robust]. (h) The prediction of CM-Net [zhu2022single]. (i) The prediction of our SDDNet.
  • Figure 5: The qualitative results of the ablation study. (a) Input images. (b) Ground-truths. (c) The prediction of Baseline. (d) The prediction of Baseline+FSR. (e) The prediction of Baseline+FSR+SSF.
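
The Figure 3 caption describes style guidance through Gram matrices: pull the style of the shadow-related component toward that of the recombined features, and push it away from that of the background-related component. The sketch below illustrates that idea in plain Python. It is not the paper's loss: the normalization by C·N, the mean-squared Gram distance, the hinge with a `margin`, and all function names are assumptions made for illustration, and the caption's conditioning on the presence or absence of shadows is omitted.

```python
def gram_matrix(features):
    """Gram matrix of a feature map given as C channels of N activations each.

    Entry (i, j) is the inner product of channels i and j, divided by C * N
    (an assumed normalization).
    """
    c = len(features)
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) / (c * n)
         for j in range(c)]
        for i in range(c)
    ]


def style_distance(feat_a, feat_b):
    """Mean squared difference between the Gram matrices of two feature maps."""
    g_a, g_b = gram_matrix(feat_a), gram_matrix(feat_b)
    c = len(g_a)
    return sum((g_a[i][j] - g_b[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)


def style_guidance_loss(shadow, background, recombined, margin=1.0):
    """Hypothetical pull/push style objective (hinge form is an assumption)."""
    # Pull: shadow-related style should match the recombined-feature style.
    pull = style_distance(shadow, recombined)
    # Push: shadow-related style should differ from the background-related
    # style by at least `margin`; no penalty once they are far enough apart.
    push = max(0.0, margin - style_distance(shadow, background))
    return pull + push


if __name__ == "__main__":
    shadow = [[1.0, 0.0], [0.0, 1.0]]        # 2 channels, 2 positions
    background = [[10.0, 0.0], [0.0, 10.0]]  # same layout, larger activations
    print(style_guidance_loss(shadow, background, recombined=shadow))
```

The Gram matrix discards spatial layout and keeps only channel co-activation statistics, which is why it is commonly used as a style descriptor rather than a content descriptor.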