
FastInstShadow: A Simple Query-Based Model for Instance Shadow Detection

Takeru Inoue, Ryusuke Miyamoto

TL;DR

FastInstShadow (FIS) introduces a query-based approach to instance shadow detection. It learns shadow–object relationships during detection through an association transformer decoder built from two dual-path transformer decoders. It removes the separate pairing step by directly modeling paired shadows and objects, aided by training strategies such as shadow direction learning and a box-aware mask loss. On the SOBA dataset, FIS variants achieve state-of-the-art performance on both instance and association metrics: FIS-D3 delivers the best overall accuracy, while FIS-D1 runs in real time on moderate-resolution images. Because its inference pipeline is simpler, faster, and more accurate, the method supports practical shadow-aware editing, generation, and compositing tasks. Overall, FastInstShadow combines query-based instance detection and mutual shadow–object reasoning in a single, streamlined framework.
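The pairing-free readout mentioned in the TL;DR can be illustrated with a minimal NumPy sketch. For illustration only, it assumes that each query outputs one confidence score plus one object-mask and one shadow-mask logit map. The function name `read_out_pairs` and the 0.5 score threshold are hypothetical and do not come from the paper. Because each query already stands for a shadow–object pair, the readout is a simple filter rather than a matching step between independently detected shadows and objects.

```python
import numpy as np

def read_out_pairs(scores, object_logits, shadow_logits, score_thresh=0.5):
    """Hypothetical readout: turn per-query predictions into shadow-object pairs.

    scores:        (N,) confidence per query (assumed: one score per pair)
    object_logits: (N, H, W) object-mask logits, one map per query
    shadow_logits: (N, H, W) shadow-mask logits, one map per query
    Each query encodes one pair, so no post-hoc association is needed.
    """
    keep = np.flatnonzero(scores >= score_thresh)
    pairs = []
    for i in keep:
        pairs.append({
            "query": int(i),
            "score": float(scores[i]),
            # logit > 0 is equivalent to sigmoid probability > 0.5
            "object_mask": object_logits[i] > 0.0,
            "shadow_mask": shadow_logits[i] > 0.0,
        })
    return pairs

# Toy example: 3 queries on a 4x4 image.
scores = np.array([0.9, 0.2, 0.7])
object_logits = np.full((3, 4, 4), -1.0)
shadow_logits = np.full((3, 4, 4), -1.0)
object_logits[0, :2, :2] = 2.0   # query 0: object in the top-left corner
shadow_logits[0, 2:, :2] = 2.0   # query 0: its shadow directly below
pairs = read_out_pairs(scores, object_logits, shadow_logits)
print([p["query"] for p in pairs])  # [0, 2]
```

The pair identity is simply the query index, which is why no separate pairing stage appears at inference time.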

Abstract

Instance shadow detection is the task of detecting pairs of shadows and objects. Existing methods first detect shadows and objects independently and then associate them. This paper introduces FastInstShadow, a method that improves detection accuracy through a query-based architecture. Its association transformer decoder, composed of two dual-path transformer decoders, assesses relationships between shadows and objects during detection. Experimental results on the SOBA dataset show that the proposed method outperforms all existing methods across all criteria. The method also makes real-time processing feasible for moderate-resolution images, with better accuracy than SSISv2, the most accurate existing method. Our code is available at https://github.com/wlotkr/FastInstShadow.

Paper Structure

This paper contains 17 sections, 5 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Speed-performance trade-off on SOBA-testing. Our FastInstShadow (FIS) is a scalable framework in which the lightweight variant, FIS-D1, outperforms all existing methods in both speed and accuracy on $SOAP_{segm}$ bib:lisa. The larger variants, FIS-D2 and FIS-D3, achieve further accuracy improvements.
  • Figure 2: FastInstShadow (FIS) architecture. The left side of the figure illustrates the overall architecture, and the right side details the dual-path transformer decoder bib:fastinst. FIS consists of three components: a backbone network and a pixel decoder, both inherited from FastInst bib:fastinst, and our novel association transformer decoder. Object and shadow pixel features, obtained by flattening the feature map $E_3$, are input into the association transformer decoder together with queries designed similarly to those of FastInst. The association transformer decoder consists of two dual-path transformer decoders: the first processes object pixel features to capture object characteristics, and the second processes shadow pixel features to capture shadow characteristics. This design enables the model to detect shadows and objects as paired instances while considering their mutual relationships.
  • Figure : (a) Input images
  • Figure : (a) Input images
  • Figure : (b) LISA bib:lisa
  • ...and 3 more figures
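The Figure 2 caption describes the association transformer decoder only at the level of its components. The NumPy sketch that follows is a heavily simplified, hypothetical illustration of that data flow, not the authors' implementation. It flattens a toy $E_3$ map into pixel features and derives object and shadow pixel features with assumed linear projections (`w_obj`, `w_sha`). It then runs each set through a simplified dual-path decoder in which pixels attend to queries and queries then attend to pixels. Several pieces are omitted: learned attention projections, masked attention, query self-attention, FFNs, and mask-embedding MLPs. The sketch also assumes, without confirmation from the source, that the object decoder's output queries are fed into the shadow decoder.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(targets, sources):
    """Single-head scaled dot-product attention plus residual (no learned weights)."""
    d = targets.shape[-1]
    attn = softmax(targets @ sources.T / np.sqrt(d), axis=-1)  # (T, S)
    return targets + attn @ sources

def dual_path_layer(queries, pixels):
    """Simplified dual-path layer: update pixels from queries, then queries from pixels."""
    pixels = cross_attend(pixels, queries)   # pixel path
    queries = cross_attend(queries, pixels)  # query path
    return queries, pixels

def association_decoder(queries, e3, w_obj, w_sha, num_layers=2):
    """Hypothetical sketch of the association transformer decoder.

    queries: (N, D) instance queries
    e3:      (C, H, W) feature map, flattened into H*W pixel features
    w_obj, w_sha: (C, D) assumed projections to object / shadow pixel features
    """
    c, h, w = e3.shape
    flat = e3.reshape(c, h * w).T                 # (H*W, C)
    obj_pix, sha_pix = flat @ w_obj, flat @ w_sha # (H*W, D) each
    for _ in range(num_layers):                   # first decoder: object path
        queries, obj_pix = dual_path_layer(queries, obj_pix)
    for _ in range(num_layers):                   # second decoder: shadow path (assumed sequential)
        queries, sha_pix = dual_path_layer(queries, sha_pix)
    # Each query yields one object mask and one shadow mask, i.e. one pair.
    object_masks = (queries @ obj_pix.T).reshape(-1, h, w)
    shadow_masks = (queries @ sha_pix.T).reshape(-1, h, w)
    return object_masks, shadow_masks

# Toy usage: 10 queries of width 32 on a 16-channel 8x8 feature map.
rng = np.random.default_rng(0)
e3 = rng.standard_normal((16, 8, 8))
queries = rng.standard_normal((10, 32))
w_obj = 0.1 * rng.standard_normal((16, 32))
w_sha = 0.1 * rng.standard_normal((16, 32))
object_masks, shadow_masks = association_decoder(queries, e3, w_obj, w_sha)
print(object_masks.shape, shadow_masks.shape)  # (10, 8, 8) (10, 8, 8)
```

Because both mask sets are read out from the same query embeddings, the i-th object mask and the i-th shadow mask form a pair by construction. Running the shadow path on object-updated queries is one way to let shadow predictions depend on object context. The paper may wire the two decoders differently.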