NTIRE 2023 Image Shadow Removal Challenge Technical Report: Team IIM_TTI

Yuki Kondo, Riku Miyata, Fuma Yasue, Taito Naruki, Norimichi Ukita

TL;DR

The paper analyzes ShadowFormer for the NTIRE 2023 Shadow Removal Challenge and introduces five enhancements to tackle misalignment and shadow-mask scarcity: homography-based image alignment, perceptual quality losses, SASMA for semi-automatic shadow mask annotation, joint detection–removal training, and CutShadow augmentation. The integrated pipeline emphasizes perceptual fidelity and contextual consistency, achieving competitive results ($0.196$ LPIPS and $7.44$ MOS) while addressing the misalignment found in real-world data. Although PSNR/SSIM scores may drop when measured against misaligned GT, the approach improves shadow removal quality and preserves scene structure. These contributions offer practical strategies for perceptual optimization and data-efficient learning on realistically misaligned data.

Abstract

In this paper, we analyze and discuss ShadowFormer in preparation for the NTIRE 2023 Shadow Removal Challenge [1], implementing five key improvements: image alignment, a perceptual quality loss function, semi-automatic annotation for shadow detection, joint learning of shadow detection and removal, and a new data augmentation technique for shadow removal, "CutShadow". Our method achieved scores of 0.196 (3rd out of 19) in LPIPS and 7.44 (4th out of 19) in the Mean Opinion Score (MOS).

Paper Structure (15 sections, 3 equations, 6 figures, 2 tables)


Figures (6)

  • Figure 1: Superimposed display of shadow and no-shadow pair images. Random changes in external parameters caused the misalignment (a), but our method was able to correct the misalignment (b).
  • Figure 2: Overall diagram of our methodology. In preprocessing, we correct image misalignment by homography estimation and create GT shadow masks with SASMA. In training, the shadow detector first predicts the shadow mask from the shadow image. Next, ShadowFormer predicts the shadow-free image from the shadow image and the shadow mask predicted by the detector. At this stage, CutShadow is used for augmentation, and the ESSIM loss and Structure Preservation Loss are used as the loss functions.
  • Figure 3: Comparison of (c) the baseline results of training ShadowFormer without alignment and (d) the results of our method including alignment. The PSNR values computed against the original unaligned GT are 26.79 for (c) and 22.67 for (d). Although (c) is superior in PSNR, (d) is perceptually superior because it is not blurred and its structure is preserved. Shadows are also accurately removed in the regions where they occur.
  • Figure 4: Semi-Automatic Shadow Mask Annotation (SASMA). The red line in the histogram indicates the peak value, and the green and blue lines indicate the lower and upper values used for shadow binarization calculated from the peak values, respectively. Note that the second process, Sub.&Abs., is done in the value channel of HSV.
  • Figure 5: Refinement of shadow detection by joint learning. At first, MTMT is not able to predict shadows at all (b), but by learning the mask annotated by SASMA (c), it is able to predict shadows to some extent (d). Furthermore, by incorporating joint learning, MTMT is able to predict shadows at the level of the GT mask, and is also able to capture the strength and weakness of the shadows (e) and (f).
  • ...and 1 more figures
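
The SASMA caption (Figure 4) describes a histogram-based binarization: a difference image in the HSV value channel, a histogram peak, and lower and upper bounds derived from that peak. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that "Sub.&Abs." means the absolute difference between the shadow and shadow-free images, and the function names, `min_diff`, `lower_ratio`, and `upper_ratio` are hypothetical choices, since the caption does not say how the bounds are computed from the peak.

```python
import numpy as np


def value_channel(rgb):
    """V channel of HSV for an RGB image in [0, 1]: per-pixel max over R, G, B."""
    return rgb.max(axis=-1)


def sasma_like_mask(shadow_rgb, shadow_free_rgb, bins=64,
                    min_diff=0.05, lower_ratio=0.5, upper_ratio=1.5):
    """Binarize a shadow mask from a shadow / shadow-free image pair.

    The ratios and min_diff are illustrative assumptions, not values
    taken from the paper.
    """
    # Sub.&Abs.: absolute difference in the value channel (assumed reading).
    diff = np.abs(value_channel(shadow_free_rgb) - value_channel(shadow_rgb))

    # Histogram of the differences, ignoring near-zero values so that the
    # unshadowed background does not dominate the peak (assumption).
    hist, edges = np.histogram(diff[diff > min_diff], bins=bins, range=(0.0, 1.0))

    # Peak value: centre of the most populated bin.
    peak_idx = int(np.argmax(hist))
    peak = 0.5 * (edges[peak_idx] + edges[peak_idx + 1])

    # Lower and upper binarization bounds derived from the peak.
    lower, upper = lower_ratio * peak, upper_ratio * peak
    mask = (diff >= lower) & (diff <= upper)
    return mask, (peak, lower, upper)
```

On a synthetic pair where a square region is darkened to half brightness, the returned mask should cover exactly that region. On real data the ratios would need tuning, and the aligned image pair (after homography correction) is the natural input, since misalignment would otherwise leak edges into the difference image.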