Few-shot Oriented Object Detection with Memorable Contrastive Learning in Remote Sensing Images

Jiawei Zhou, Wuzhou Li, Yi Cao, Hongtao Cai, Xiang Li

TL;DR

The paper tackles few-shot oriented object detection in remote sensing by introducing FOMC, which combines oriented bounding boxes with a Memorable Contrastive Learning (MCL) module and a shot-masking strategy. A two-stage training framework uses a memory-bank enhanced contrastive loss $L_{MCL}$ to learn discriminative, orientation-aware features, while Gaussian masking reduces label confusion during fine-tuning. Empirical results on DOTA and HRSC2016 show substantial gains for novel classes without harming base-class performance, and NWPU VHR-10 results demonstrate competitive conventional FSOD performance with horizontal boxes. The approach advances FSOD in aerial imagery by addressing orientation, data scarcity, and label noise, with practical implications for rapid adaptation in remote sensing applications.

Abstract

Few-shot object detection (FSOD) has garnered significant research attention in the field of remote sensing due to its ability to reduce the dependency on large amounts of annotated data. However, two challenges persist in this area: (1) axis-aligned proposals can be misaligned with arbitrarily oriented objects, and (2) the scarcity of annotated data still limits performance on unseen object categories. To address these issues, we propose a novel FSOD method for remote sensing images called Few-shot Oriented object detection with Memorable Contrastive learning (FOMC). Specifically, we employ oriented bounding boxes instead of traditional horizontal bounding boxes to learn a better feature representation for arbitrarily oriented aerial objects, leading to enhanced detection performance. To the best of our knowledge, we are the first to address oriented object detection in the few-shot setting for remote sensing images. To address the challenging issue of object misclassification, we introduce a supervised contrastive learning module with a dynamically updated memory bank. This module enables the use of large batches of negative samples and enhances the model's capability to learn discriminative features for unseen classes. We conduct comprehensive experiments on the DOTA and HRSC2016 datasets, and our model achieves state-of-the-art performance on the few-shot oriented object detection task. Code and pretrained models will be released.
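
To make the idea of a supervised contrastive module with a dynamically updated memory bank more concrete, the following is a minimal pure-Python sketch. It is not the authors' implementation: the FIFO update rule, the temperature value, the InfoNCE-style form of the loss, and all names (`MemoryBank`, `memory_contrastive_loss`) are assumptions made for illustration. The bank stores an IoU score next to each embedding, but this sketch does not use it in the loss, since this section does not describe how it is applied.

```python
import math
from collections import deque


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


class MemoryBank:
    """Hypothetical fixed-size FIFO store of (embedding, label, iou) entries."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry once full.
        self.entries = deque(maxlen=capacity)

    def enqueue(self, embedding, label, iou):
        self.entries.append((l2_normalize(embedding), label, iou))


def memory_contrastive_loss(queries, labels, bank, temperature=0.2):
    """Average supervised contrastive loss of query embeddings against the bank.

    Bank entries with the query's label are positives; all other entries act
    as negatives. Queries without any positive in the bank are skipped.
    """
    total, counted = 0.0, 0
    for query, label in zip(queries, labels):
        q = l2_normalize(query)
        logits = [
            sum(a * b for a, b in zip(q, key)) / temperature
            for key, _, _ in bank.entries
        ]
        positives = [
            i for i, (_, key_label, _) in enumerate(bank.entries)
            if key_label == label
        ]
        if not positives:
            continue
        # Log-sum-exp over all bank entries, shifted by the max for stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += -sum(logits[i] - log_denominator for i in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


bank = MemoryBank(capacity=4)
bank.enqueue([1.0, 0.0], label=0, iou=0.9)
bank.enqueue([0.9, 0.1], label=0, iou=0.8)
bank.enqueue([0.0, 1.0], label=1, iou=0.9)
bank.enqueue([0.1, 0.9], label=1, iou=0.7)

# The same embedding scored with its matching label and with the wrong label.
loss_matching = memory_contrastive_loss([[1.0, 0.05]], [0], bank)
loss_mismatched = memory_contrastive_loss([[1.0, 0.05]], [1], bank)
print(loss_matching < loss_mismatched)
```

Because the bank outlives a single mini-batch, each query is compared against many more negatives than the current batch alone would provide, which is the benefit the abstract attributes to the memory bank.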

Paper Structure (22 sections, 7 equations, 10 figures, 8 tables)

Figures (10)

  • Figure 1: Comparison between the few-shot object detection (FSOD) task, which predicts HBBs (top), and the newly proposed few-shot oriented object detection (FSOOD) task, which outputs OBBs (bottom). HBBs often cover background areas and adjacent objects, particularly for dense or large-scale objects. Conversely, OBBs represent objects more accurately with tighter bounding boxes.
  • Figure 2: The overall architecture of our proposed FOMC model. (a) In the base training stage, all network parameters of the model are trained using abundant data from the base categories. (b) In the fine-tuning stage, the parameters of the ResNet backbone network are frozen, while the other modules are trained at a lower learning rate using a few samples from novel categories. An MCL module is designed to store encoded proposal features for contrastive learning and to encourage the model to learn class-distinctive features.
  • Figure 3: Memorable Contrastive Learning (MCL) encoding module. For each proposal, the memory bank stores the feature embedding and the IoU score between the proposal and its matched ground-truth box.
  • Figure 4: Illustration of the shot masking strategy employed in our work. Previous FSOD methods treat unselected objects as background and therefore cause confusion for model training. Our shot masking strategy masks out all unselected objects using a Gaussian blurring operation.
  • Figure 5: Detection results on DOTA. In the 20-shot setting, our proposed model FOMC achieves satisfactory performance in detecting oriented objects against complex backgrounds, including crowded and arbitrarily oriented instances, for both novel and base classes.
  • ...and 5 more figures
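
The shot masking strategy in the Figure 4 caption (blurring every unselected object so it is not treated as background) can be sketched roughly as follows. This is an illustrative pure-Python version under several assumptions that do not come from the paper: the image is a grayscale 2D list, unselected objects are given as axis-aligned `(x0, y0, x1, y1)` rectangles rather than oriented boxes, and the blur strength `sigma` and the function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    blurred = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                source_x = min(max(x + k - radius, 0), width - 1)
                acc += weight * row[source_x]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, sigma=2.0):
    """Separable Gaussian blur: one pass along rows, one along columns."""
    kernel = gaussian_kernel(sigma, radius=max(1, int(3 * sigma)))
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def shot_mask(image, unselected_boxes, sigma=2.0):
    """Return a copy of the image with each unselected box replaced by blur.

    Boxes are (x0, y0, x1, y1) with exclusive upper bounds (an assumption).
    Pixels outside the boxes are copied unchanged.
    """
    blurred = gaussian_blur(image, sigma)
    masked = [list(row) for row in image]
    for x0, y0, x1, y1 in unselected_boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                masked[y][x] = blurred[y][x]
    return masked


# An 8x8 checkerboard stands in for an image tile; the top-left 4x4 region
# plays the role of an object that was not selected as one of the K shots.
image = [[255.0 if (x + y) % 2 == 0 else 0.0 for x in range(8)] for y in range(8)]
masked = shot_mask(image, unselected_boxes=[(0, 0, 4, 4)])
print(masked[1][1] != image[1][1], masked[6][6] == image[6][6])
```

Blurring only inside the boxes leaves the selected shots and the true background untouched, so the masked regions no longer supply sharp object evidence labeled as background during fine-tuning.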