Explore Intrinsic Geometry for Query-based Tiny and Oriented Object Detector with Momentum-based Bipartite Matching

Junpeng Zhang, Zewei Yang, Jie Feng, Yuhui Zheng, Ronghua Shang, Mengxuan Zhang

TL;DR

An Intrinsic Geometry-aware Decoder is designed, which enhances the object-related features conditioned on an object query by injecting complementary geometric embeddings extrapolated from their correlations to capture the geometric layout of the object, thereby offering a critical geometric insight into its orientation.

Abstract

Recent query-based detectors have achieved remarkable progress, yet their performance remains constrained when handling objects with arbitrary orientations, especially tiny objects that carry limited texture information. This limitation primarily stems from the underutilization of intrinsic geometry during pixel-based feature decoding and from inter-stage matching inconsistency caused by stage-wise bipartite matching. To tackle these challenges, we present IGOFormer, a novel query-based oriented object detector that explicitly integrates intrinsic geometry into feature decoding and enhances inter-stage matching stability. Specifically, we design an Intrinsic Geometry-aware Decoder, which enhances the object-related features conditioned on an object query by injecting complementary geometric embeddings extrapolated from their correlations to capture the geometric layout of the object, thereby offering a critical geometric insight into its orientation. Meanwhile, a Momentum-based Bipartite Matching scheme is developed to adaptively aggregate historical matching costs by formulating an exponential moving average with query-specific smoothing factors, effectively preventing conflicting supervisory signals arising from inter-stage matching inconsistency. Extensive experiments and ablation studies demonstrate the superiority of IGOFormer for aerial oriented object detection, achieving an AP$_{50}$ score of 78.00\% on DOTA-V1.0 using a Swin-T backbone under the single-scale setting. The code will be made publicly available.
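To make the idea of aggregating historical matching costs concrete, the following is a minimal sketch, not the authors' implementation. It assumes a per-query smoothing factor `alpha` is already given (the abstract does not say how these factors are computed), assumes that `alpha` weights the historical cost, and uses a brute-force assignment in place of a Hungarian solver so the example stays dependency-free. The function names `ema_costs` and `match` are hypothetical.

```python
from itertools import permutations


def ema_costs(prev_cost, stage_cost, alpha):
    """Blend historical and current matching costs, one smoothing factor per query.

    prev_cost, stage_cost: lists of shape [num_queries][num_gt].
    alpha: list of length num_queries with values in [0, 1]; here alpha
    weights the historical cost (an assumed convention).
    """
    return [
        [a * p + (1.0 - a) * c for p, c in zip(prev_row, cur_row)]
        for a, prev_row, cur_row in zip(alpha, prev_cost, stage_cost)
    ]


def match(cost):
    """Brute-force one-to-one matching (only suitable for tiny examples).

    Returns a tuple `qs` where qs[j] is the query assigned to ground truth j.
    """
    num_queries, num_gt = len(cost), len(cost[0])
    return min(
        permutations(range(num_queries), num_gt),
        key=lambda qs: sum(cost[q][j] for j, q in enumerate(qs)),
    )


# Toy costs for 2 queries and 2 ground-truth objects (illustrative values only).
stage1 = [[0.2, 0.9], [0.8, 0.3]]
stage2 = [[0.6, 0.5], [0.5, 0.6]]

stagewise = match(stage2)                                # (1, 0): assignment flips
momentum = match(ema_costs(stage1, stage2, [0.5, 0.5]))  # (0, 1): assignment kept
```

In this toy case, matching on the stage-2 costs alone swaps the two queries relative to stage 1, whereas matching on the smoothed costs keeps the stage-1 assignment, which is the kind of inter-stage identity shift the scheme is meant to suppress.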

Paper Structure (18 sections, 17 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: Comparison against conventional pixel-based feature decoding in query-based detectors. In (a), given an input image, an object query identifies a set of representative object-related features, which are then aggregated for query refinement. In (b), we argue that the relative relations among the identified object-related features provide stronger evidence of the object's orientation and spatial arrangement. To capitalize on these relations, we propose an Intrinsic Geometry-aware decoding module, which explicitly explores these relationships to enhance the query's representativeness during iterative query refinement.
  • Figure 2: An example on matching trajectories with stage-wise bipartite matching and our Momentum-based Bipartite Matching. Each color represents a different query, and the change of color for the same instance denotes the inter-stage identity shift across decoding stages.
  • Figure 3: Visualization of the correlation among object-related features of a helicopter. Two anchor features are marked by $\color{yellow}\bigstar$ and $\color{blue}\bigstar$, respectively. Their correlations with other object-related features for the same object query are visualized in the first and second rows, respectively. As the bounding box is rotated to reflect the misalignment with the ground-truth bounding box, the highlighted regions persistently reveal the spatial and structural alignment of the object, while their distribution within the box remains sensitive to orientation changes.
  • Figure 4: Model architecture of the proposed IGOFormer. IGOFormer follows the encoder-free architecture of sun2021sparsercnn and introduces an Intrinsic Geometry Augmentation module and a Momentum-based Bipartite Matching scheme. The Intrinsic Geometry Augmentation explores the correlation among object-related features, thus providing a critical complementary cue for tiny and oriented object detection. Meanwhile, the Momentum-based Bipartite Matching scheme adaptively aggregates historical matching costs through an exponential moving average, suppressing drastic changes in label assignment.
  • Figure 5: Qualitative comparison against top-ranking oriented object detectors on the DOTA-V1.0 test set.
  • ...and 1 more figure
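The correlation among object-related features described in the captions for Figures 3 and 4 can be illustrated with a small sketch. This is an assumption-laden illustration rather than the paper's method: it takes the features sampled for one object query as plain vectors and uses cosine similarity as the correlation measure, which the captions do not specify. How the resulting correlations are turned into geometric embeddings is not described here and is left out. The function name `cosine_correlation` is hypothetical.

```python
import math


def cosine_correlation(features):
    """Pairwise cosine similarity among the features sampled for one query.

    features: list of K feature vectors (lists of floats of equal length).
    Returns a K x K matrix; entry [i][j] is the similarity of features i and j.
    Zero vectors are given norm 1.0 so their similarities come out as 0.
    """
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    return [
        [
            sum(a * b for a, b in zip(fi, fj)) / (ni * nj)
            for fj, nj in zip(features, norms)
        ]
        for fi, ni in zip(features, norms)
    ]


# Three toy 2-D features (illustrative values only).
feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
corr = cosine_correlation(feats)
```

Each row of `corr` plays the role of one row in Figure 3: it fixes an anchor feature and records how strongly every other feature of the same query correlates with it.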