
ShadowMaskFormer: Mask Augmented Patch Embeddings for Shadow Removal

Zhuohao Li, Guoyang Xie, Guannan Jiang, Zhichao Lu

TL;DR

ShadowMaskFormer addresses shadow removal by integrating shadow information at the very start of processing, using Mask Augmented Patch Embedding (MAPE) to bias patch embeddings toward shadow regions. The method uses two complementary binarizations of the shadow mask to compute a shadow-enhanced embedding and applies a lightweight convolutional projection before passing to a vision transformer backbone, enabling accurate restoration with only about 2.2MB of parameters. Empirical results on ISTD, ISTD+, and SRD show state-of-the-art or competitive performance with strong efficiency and robustness to mask quality, including generalization to unseen shadow scenarios. This work highlights the practical impact of leveraging shadow information early in patch embeddings, offering a scalable and effective direction for shadow-aware vision transformers.

Abstract

Transformers have recently emerged as the de facto models for computer vision tasks and have also been successfully applied to shadow removal. However, existing methods rely heavily on intricate modifications to the attention mechanisms within the transformer blocks while using a generic patch embedding. This often leads to complex architectural designs that require additional computational resources. In this work, we explore the efficacy of incorporating shadow information at the early processing stage. Accordingly, we propose ShadowMaskFormer, a transformer-based framework with a novel patch embedding tailored for shadow removal. Specifically, we present a simple and effective mask-augmented patch embedding that integrates shadow information and promotes the model's emphasis on acquiring knowledge for shadow regions. Extensive experiments on the ISTD, ISTD+, and SRD benchmark datasets demonstrate the efficacy of our method against state-of-the-art approaches while using fewer model parameters. Our implementation is available at https://github.com/lizhh268/ShadowMaskFormer.
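
To make the idea of a mask-augmented patch embedding concrete, the following is a minimal NumPy sketch, not the authors' implementation. The TL;DR only states that two complementary binarizations of the shadow mask produce a shadow-enhanced input, which is then projected by a lightweight convolution. Everything else here is an assumption for illustration: the function names `binarize` and `mape_sketch`, the `threshold`, `gain`, and `patch` parameters, and the rule that shadow pixels are simply scaled by `gain`. The projection is written as a linear map on flattened non-overlapping patches, which is equivalent to a convolution whose kernel size and stride both equal `patch`; the paper's actual projection may differ.

```python
import numpy as np


def binarize(mask, threshold=0.5):
    """Return two complementary binary masks (assumed rule, not from the paper)."""
    shadow = (mask > threshold).astype(np.float32)  # 1 inside the shadow
    lit = 1.0 - shadow                              # 1 outside the shadow
    return shadow, lit


def mape_sketch(image, mask, weight, bias, patch=4, gain=2.0):
    """Hypothetical mask-augmented patch embedding.

    image:  (H, W, C) shadow image
    mask:   (H, W) shadow mask with values in [0, 1]
    weight: (patch * patch * C, D) projection matrix
    bias:   (D,) projection bias
    Returns (num_patches, D) tokens for a transformer backbone.
    """
    m_shadow, m_lit = binarize(mask)
    # Assumed enhancement: leave lit pixels unchanged, amplify shadow pixels.
    enhanced = image * m_lit[..., None] + gain * image * m_shadow[..., None]

    h, w, c = enhanced.shape
    gh, gw = h // patch, w // patch
    # Cut the image into non-overlapping patch x patch tiles.
    tiles = enhanced[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)
    # Linear map per tile == convolution with kernel = stride = patch.
    return tiles @ weight + bias


if __name__ == "__main__":
    img = np.ones((8, 8, 3))
    msk = np.zeros((8, 8))
    msk[:4, :4] = 1.0  # top-left quadrant is in shadow
    tokens = mape_sketch(img, msk, np.ones((48, 2)), np.zeros(2))
    print(tokens.shape)
```

The point of the sketch is only that shadow information enters before any transformer block runs, so the backbone itself can stay generic.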

Paper Structure

This paper contains 17 sections, 11 equations, 7 figures, 7 tables, and 1 algorithm.

Figures (7)

  • Figure 1: (a) Existing methods (such as CRFormer wan2022crformer and ShadowFormer guo2023shadowformer) opt for vanilla patch embedding and focus on designing sophisticated modules to incorporate shadow information within the transformer blocks. In contrast, (b) our method incorporates shadow information during the early processing stage and presents a simple yet effective patch embedding module, dubbed MAPE, tailored for shadow removal. (c) Empirically, we demonstrate that our method achieves state-of-the-art performance on the SRD dataset with significantly lower computational complexity. For the shadow removal task, MAE is the mean absolute error computed in the LAB color space; lower MAE indicates better performance.
  • Figure 2: An overview of our ShadowMaskFormer framework with the proposed mask augmented patch embedding (MAPE). First, MAPE takes the shadow image $\textbf{I}_s$ and its corresponding shadow mask $\textbf{M}$ as inputs. Then, different processing techniques are applied to generate the refined shadow masks $\textbf{M}_s$ and $\textbf{M}_p$ and to enhance the shadow region pixels. Subsequently, $N$ transformer blocks learn contextual information from the MAPE output.
  • Figure 3: Visual comparison among various approaches on examples from the ISTD dataset. We show the input image, the ground truth, and the results from DC-ShadowNet, SP+M-Net, DSC, Fu et al., ShadowFormer, and our method, respectively from left to right. Heatmaps depict the differences between results and ground truth. Zoom in for details.
  • Figure 4: Visual comparison among various approaches on examples from the SRD dataset. We show the input image, the ground truth, and the results from DC-ShadowNet, DHAN, DSC, Fu et al., ShadowDiffusion, and our method, respectively from left to right.
  • Figure 5: Our shadow removal results with inaccurate masks from the SRD dataset. From left to right: shadow image, inaccurate shadow mask, and the result of our proposed method. Even with a highly inaccurate shadow mask, our method still removes the shadows effectively, demonstrating the robustness of MAPE.
  • ...and 2 more figures
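
The caption of Figure 1 defines the evaluation metric as the mean absolute error in the LAB color space. The sketch below shows one way such a metric could be computed with NumPy; it is an illustration under stated assumptions, not the evaluation script used in the paper. The assumptions are: inputs are sRGB images scaled to [0, 1], the conversion uses the standard sRGB to CIE XYZ matrix with a D65 white point, and errors are averaged over pixels and over the three LAB channels. The function names `srgb_to_lab` and `lab_mae` and the optional `region` argument (a boolean mask for restricting the score to, say, shadow pixels) are hypothetical. Published benchmarks may resize images or aggregate channels differently, so absolute numbers from this sketch are not directly comparable to reported results.

```python
import numpy as np

# Standard linear-sRGB -> CIE XYZ matrix and D65 reference white.
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb):
    """Convert an (..., 3) sRGB array in [0, 1] to CIE LAB."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB gamma curve.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    eps, kappa = 216.0 / 24389.0, 24389.0 / 27.0
    f = np.where(xyz > eps, np.cbrt(xyz), (kappa * xyz + 16.0) / 116.0)
    lightness = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)


def lab_mae(pred, target, region=None):
    """Mean absolute LAB error, optionally restricted to a boolean (H, W) region."""
    diff = np.abs(srgb_to_lab(pred) - srgb_to_lab(target))
    if region is not None:
        diff = diff[np.asarray(region, dtype=bool)]  # keeps only selected pixels
    return float(diff.mean())


if __name__ == "__main__":
    white = np.ones((4, 4, 3))
    black = np.zeros((4, 4, 3))
    print(lab_mae(white, white), lab_mae(white, black))
```

Passing the shadow mask, its complement, or nothing as `region` gives separate scores for shadow pixels, non-shadow pixels, and the whole image.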