InfRS: Incremental Few-Shot Object Detection in Remote Sensing Images

Wuzhou Li, Jiawei Zhou, Xiang Li, Yi Cao, Guang Jin, Xuemin Zhang

TL;DR

InfRS tackles incremental few-shot object detection in remote sensing by learning novel classes from scarce data while preserving base-class performance without reusing old data. It introduces a Hybrid Prototypical Contrastive (HPC) encoding module to leverage base prototypes alongside novel instances for discriminative RoI representations, and a prototypical calibration strategy based on the Wasserstein distance to mitigate catastrophic forgetting during fine-tuning. Prototypes generated from base classes guide learning, and the Wasserstein-based regularization aligns the updated model with base knowledge. Extensive experiments on NWPU VHR-10 and DIOR demonstrate robust gains for novel categories across shot settings, with concurrent improvements on base-class performance, indicating practical viability for RS iFSOD.

Abstract

Recently, the field of few-shot detection within remote sensing imagery has witnessed significant advancements. Despite this progress, the capacity for continuous conceptual learning still poses a significant challenge to existing methodologies. In this paper, we explore the intricate task of incremental few-shot object detection (iFSOD) in remote sensing images. We introduce a pioneering fine-tuning-based technique, termed InfRS, designed to facilitate the incremental learning of novel classes using a restricted set of examples, while concurrently preserving the performance on established base classes without the need to revisit previous datasets. Specifically, we pretrain the model using abundant data from base classes and then generate a set of class-wise prototypes that represent the intrinsic characteristics of the data. In the incremental learning stage, we introduce a Hybrid Prototypical Contrastive (HPC) encoding module for learning discriminative representations. Furthermore, we develop a prototypical calibration strategy based on the Wasserstein distance to mitigate the catastrophic forgetting problem. Comprehensive evaluations on the NWPU VHR-10 and DIOR datasets demonstrate that our model can effectively solve the iFSOD problem in remote sensing images. Code will be released.
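
The abstract names two ingredients, class-wise prototypes and a Wasserstein-based calibration, without giving formulas here. The sketch below is one plausible reading and not the authors' implementation. It assumes each prototype is a per-dimension mean and standard deviation of a class's features (a diagonal Gaussian), and it uses the closed-form squared 2-Wasserstein distance between such Gaussians as a drift penalty. The names `class_prototypes`, `w2_squared_diag_gaussian`, and `calibration_penalty` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def class_prototypes(features, labels):
    """Return {label: (mean, std)} per class, one value per feature dimension.

    Assumption: a prototype is modeled as a diagonal Gaussian over features.
    """
    grouped = defaultdict(list)
    for vec, lab in zip(features, labels):
        grouped[lab].append(vec)
    protos = {}
    for lab, vecs in grouped.items():
        n, dim = len(vecs), len(vecs[0])
        mean = [sum(v[d] for v in vecs) / n for d in range(dim)]
        std = [sqrt(sum((v[d] - mean[d]) ** 2 for v in vecs) / n) for d in range(dim)]
        protos[lab] = (mean, std)
    return protos


def w2_squared_diag_gaussian(p, q):
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    For diagonal covariances this is ||m1 - m2||^2 + ||s1 - s2||^2,
    where s1 and s2 are the per-dimension standard deviations.
    """
    (m1, s1), (m2, s2) = p, q
    mean_term = sum((a - b) ** 2 for a, b in zip(m1, m2))
    std_term = sum((a - b) ** 2 for a, b in zip(s1, s2))
    return mean_term + std_term


def calibration_penalty(old_protos, new_protos):
    """Average prototype drift over the classes present in both dictionaries."""
    shared = [c for c in old_protos if c in new_protos]
    if not shared:
        return 0.0
    drift = sum(w2_squared_diag_gaussian(old_protos[c], new_protos[c]) for c in shared)
    return drift / len(shared)
```

In this reading, `old_protos` would be stored once after base pretraining and `new_protos` recomputed from the fine-tuned model, so the penalty grows as base-class feature statistics move away from their pretrained values. No base images need to be kept, only the stored statistics.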

Paper Structure (21 sections, 9 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Comparisons between object proposal embeddings learned through different paradigms: (a) Object proposal embeddings learned by a general softmax classifier, which are not discriminative enough. (b) Standard supervised contrastive learning, which depends on instance-wise contrast to learn a discriminative feature distribution. (c) We propose an HPC encoding module that combines the advantages of both base class prototypes and novel instances. This provides explicit supervision to facilitate iFSOD.
  • Figure 2: The overall framework of our InfRS. (a) Initially, the model is pretrained on base data, from which we extract prototypes as representatives for each base class. (b) During the fine-tuning stage, the few-shot novel data is incrementally enrolled. The HPC encoding module leverages prototypical knowledge to learn proposal embeddings with inter-class distinction and intra-class compactness. Furthermore, the prototypical calibration strategy, based on the Wasserstein distance, is introduced to ease the catastrophic forgetting effect.
  • Figure 3: Visualization of the 10-shot detection results using our InfRS on the NWPU VHR-10 dataset for split 1.
  • Figure 4: Qualitative comparison of the baseline model TFA and our InfRS.
  • Figure 5: Visualization of the proposal embeddings learned by the baseline model TFA and our InfRS using t-SNE: (a) Overlap of the novel class clusters leads to sub-optimal decision boundaries and overlaps with the base classes, notably between the basketball court and tennis court categories. (b) In contrast, the decision boundaries between classes are well-defined, showcasing clear separation among both base and novel classes.
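
The captions for Figures 1 and 2 describe the HPC encoding module as contrasting proposal embeddings against both base-class prototypes and novel instances. The exact loss is not given in this section, so the following is only a minimal sketch under stated assumptions: an InfoNCE-style objective with cosine similarity, a temperature `tau`, and a contrast set that mixes one stored prototype per base class with few-shot novel embeddings. The function name `hybrid_contrastive_loss` and all parameters are hypothetical.

```python
from math import exp, log, sqrt


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hybrid_contrastive_loss(query, query_label, base_prototypes, novel_instances, tau=0.2):
    """InfoNCE-style loss of one proposal embedding against a mixed contrast set.

    base_prototypes: {label: vector}, one stored prototype per base class.
    novel_instances: list of (label, vector) pairs from the few-shot novel data.
    Assumption: entries sharing the query's label count as positives.
    """
    contrast = list(base_prototypes.items()) + list(novel_instances)
    if not any(lab == query_label for lab, _ in contrast):
        raise ValueError("contrast set holds no entry with the query's label")
    scores = [(lab, exp(cosine(query, vec) / tau)) for lab, vec in contrast]
    positives = sum(s for lab, s in scores if lab == query_label)
    total = sum(s for _, s in scores)
    return -log(positives / total)
```

Under these assumptions the loss is small when a proposal lies close to same-class entries and far from the rest, which matches the inter-class distinction and intra-class compactness described in the Figure 2 caption. Using stored prototypes for base classes lets them act as contrast targets without revisiting base images.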