S$^2$Teacher: Step-by-step Teacher for Sparsely Annotated Oriented Object Detection

Yu Lin, Jianghang Lin, Kai Ye, You Shen, Yan Zhang, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

TL;DR

This work tackles sparsely annotated oriented object detection (SAOOD) in dense remote sensing scenes, where labeling all instances is costly and unlabeled objects can mislead learning. It introduces S$^2$Teacher, a step-by-step teacher framework that progressively mines pseudo-labels via cluster-based pseudo-label generation, filters them with information entropy Gaussian modeling, and freezes high-confidence labels over time, complemented by a Focal Ignore Loss to down-weight misleading negatives. The method combines these components into a unified loss that leverages real GT, frozen pseudo GT, and progressively mined pseudo GT to improve detector performance under sparse supervision. Experiments on DOTA-v1.0 and DOTA-v1.5 show substantial gains across annotation budgets, with near fully-supervised performance achieved at only 10% of instances annotated, demonstrating strong annotation efficiency gains for dense remote sensing scenes.

Abstract

Although fully-supervised oriented object detection has made significant progress in multimodal remote sensing image understanding, it comes at the cost of labor-intensive annotation. Recent studies have explored weakly and semi-supervised learning to alleviate this burden. However, these methods overlook the difficulties posed by dense annotations in complex remote sensing scenes. In this paper, we introduce a novel setting called sparsely annotated oriented object detection (SAOOD), in which only a subset of instances is labeled, and propose a solution to address its challenges. Specifically, we focus on two key issues in this setting: (1) sparse labeling leads to overfitting on limited foreground representations, and (2) unlabeled objects (false negatives) confuse feature learning. To this end, we propose S$^2$Teacher, a novel method that progressively mines pseudo-labels for unlabeled objects, from easy to hard, to enhance foreground representations. Additionally, it reweights the loss of unlabeled objects to mitigate their impact during training. Extensive experiments demonstrate that S$^2$Teacher not only significantly improves detector performance across different sparse annotation levels but also achieves near-fully-supervised performance on the DOTA dataset with only 10% of instances annotated, effectively balancing detection accuracy with annotation efficiency. The code will be made public.
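
To make the SAOOD setting concrete, the following is a minimal sketch of how a sparsely annotated split could be simulated from a fully labeled dataset by randomly keeping a fraction of instances. The function name `sparsify_annotations`, the per-image sampling, and the rule of keeping at least one label per non-empty image are illustrative assumptions; the source does not specify how its sparse splits are drawn.

```python
import random


def sparsify_annotations(instances_per_image, keep_ratio=0.1, seed=0):
    """Randomly keep a fraction of the labeled instances in each image.

    instances_per_image: dict mapping an image id to a list of annotations.
    Returns (kept, dropped): two dicts with the same keys as the input.
    The dropped instances play the role of unlabeled objects.
    """
    rng = random.Random(seed)
    kept, dropped = {}, {}
    for image_id, instances in instances_per_image.items():
        n_keep = round(len(instances) * keep_ratio)
        # Assumption (not from the paper): keep at least one label
        # in every image that has any instance at all.
        if instances and n_keep == 0:
            n_keep = 1
        keep_idx = set(rng.sample(range(len(instances)), n_keep))
        kept[image_id] = [a for i, a in enumerate(instances) if i in keep_idx]
        dropped[image_id] = [a for i, a in enumerate(instances) if i not in keep_idx]
    return kept, dropped
```

With `keep_ratio=0.1`, an image containing 20 instances retains 2 labels, and the other 18 become the unlabeled objects that a detector would otherwise treat as background.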

Paper Structure

This paper contains 20 sections, 10 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Comparison of annotation methods. RBox (full supervision), HBox, and point supervision all require labeling every object and careful checking to avoid missed annotations, which is time-consuming. In remote sensing, dense small objects, blurring, and occlusion make labeling every object difficult. Sparse annotation randomly labels a subset of objects without checking, greatly reducing cost. Our S$^2$Teacher approaches fully supervised performance under this setting.
  • Figure 2: Overall framework of S$^2$Teacher. The input image is processed by the teacher model and the CBP, which prioritizes mining easy unlabeled objects. After the EGPF filters out false positives, the pseudo GT is used to train the student model. The PLF compares the pseudo GT mined in each iteration and gradually freezes high-confidence pseudo GT, prompting the CBP to keep mining harder unlabeled objects.
  • Figure 3: Numerous false negatives mislead training.
  • Figure 4: Prior methods generate false-positive (FP) pseudo-labels.
  • Figure 5: Visualization of S$^2$Teacher pseudo-label mining. Green boxes are manually annotated real GT, red boxes are pseudo GT, orange boxes are pseudo GT frozen by the PLF, and blue boxes are mined pseudo GT for objects that were missed during manual annotation and are therefore mistakenly judged as FP.
  • ...and 2 more figures
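
The TL;DR describes filtering pseudo-labels with "information entropy Gaussian modeling", which appears to correspond to the EGPF step in Figure 2. The paper's exact formulation is not reproduced on this page, so the sketch below is only one plausible reading: compute the Shannon entropy of each candidate's class distribution, model the entropies with a Gaussian (mean and standard deviation), and discard candidates above a threshold. The function names, the softmax-style probability input, and the parameter `k` are all hypothetical.

```python
import math
from statistics import mean, pstdev


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def filter_by_entropy(candidates, k=0.0):
    """Keep candidates whose entropy is at most mu + k * sigma.

    candidates: list of (box, class_probs) pairs, where class_probs sums to 1.
    mu and sigma are the mean and population std of the candidate entropies.
    Assumption (not from the paper): low entropy marks a reliable pseudo-label,
    and k is a hypothetical knob controlling how strict the cut is.
    """
    if not candidates:
        return []
    ents = [entropy(p) for _, p in candidates]
    mu, sigma = mean(ents), pstdev(ents)
    threshold = mu + k * sigma
    return [c for c, h in zip(candidates, ents) if h <= threshold]
```

Under this reading, a confident prediction such as `[0.98, 0.01, 0.01]` passes the filter, while an ambiguous one such as `[0.4, 0.3, 0.3]` is rejected as a likely false positive.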