S$^2$Teacher: Step-by-step Teacher for Sparsely Annotated Oriented Object Detection
Yu Lin, Jianghang Lin, Kai Ye, You Shen, Yan Zhang, Shengchuan Zhang, Liujuan Cao, Rongrong Ji
TL;DR
This work tackles sparsely annotated oriented object detection (SAOOD) in dense remote sensing scenes, where labeling every instance is costly and unlabeled objects can mislead learning. It introduces S$^2$Teacher, a step-by-step teacher framework that progressively mines pseudo-labels through cluster-based pseudo-label generation, filters them with information entropy Gaussian modeling, and freezes high-confidence labels over time. A Focal Ignore Loss complements this pipeline by down-weighting misleading negatives. The components are combined into a unified loss that draws on real GT, frozen pseudo GT, and progressively mined pseudo GT to improve detector performance under sparse supervision. Experiments on DOTA-v1.0 and DOTA-v1.5 show substantial gains across annotation budgets, with near fully-supervised performance when only 10% of instances are annotated.
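To make the idea of down-weighting misleading negatives concrete, the following is a minimal sketch and not the Focal Ignore Loss itself, whose exact form is not given in this summary. It assumes a hypothetical per-location weight `(1 - p) ** gamma` applied to the background cross-entropy term, so that locations labeled as background but predicted as foreground with high confidence (which may be unlabeled objects) contribute little to the loss. The function name, the weight, and `gamma` are all illustrative assumptions.

```python
import math


def downweighted_negative_loss(p, gamma=2.0, eps=1e-7):
    """Hypothetical loss for one location labeled as background.

    p     : predicted foreground probability at that location.
    gamma : assumed focusing exponent; gamma = 0 recovers plain cross-entropy.

    This is an illustrative assumption, not the paper's Focal Ignore Loss.
    """
    # Clamp p so that log(1 - p) stays finite.
    p = min(max(p, eps), 1.0 - eps)
    # Plain binary cross-entropy for a background target.
    bce = -math.log(1.0 - p)
    # Assumed weight: shrinks as the detector grows confident the
    # location is actually foreground (a likely unlabeled object).
    weight = (1.0 - p) ** gamma
    return weight * bce


# A confident "negative" (p = 0.9) is suppressed relative to plain
# cross-entropy, while gamma = 0 leaves the loss unchanged.
plain = -math.log(1.0 - 0.9)
suppressed = downweighted_negative_loss(0.9)
unchanged = downweighted_negative_loss(0.9, gamma=0.0)
```

Note that this weighting is the reverse of the usual focal term for negatives, which emphasizes confidently wrong background predictions; under sparse annotation, those are exactly the locations most likely to be missing labels.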
Abstract
Although fully-supervised oriented object detection has made significant progress in multimodal remote sensing image understanding, it comes at the cost of labor-intensive annotation. Recent studies have explored weakly and semi-supervised learning to alleviate this burden. However, these methods overlook the difficulties posed by dense annotations in complex remote sensing scenes. In this paper, we introduce a novel setting called sparsely annotated oriented object detection (SAOOD), in which only a subset of instances is labeled, and propose a solution to address its challenges. Specifically, we focus on two key issues in this setting: (1) sparse labeling leads to overfitting on limited foreground representations, and (2) unlabeled objects (false negatives) confuse feature learning. To this end, we propose S$^2$Teacher, a novel method that progressively mines pseudo-labels for unlabeled objects, from easy to hard, to enhance foreground representations. Additionally, it reweights the loss of unlabeled objects to mitigate their impact during training. Extensive experiments demonstrate that S$^2$Teacher not only significantly improves detector performance across different sparse annotation levels but also achieves near-fully-supervised performance on the DOTA dataset with only 10% of instances annotated, effectively balancing detection accuracy with annotation efficiency. The code will be made public.
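As an illustration of the SAOOD setting, the sketch below simulates sparse annotation by keeping a fixed fraction of the instances in one image and treating the rest as unlabeled. It assumes uniform random sampling per image with at least one instance kept; the actual sampling protocol is not specified in this abstract, and `sparsify_annotations`, `keep_ratio`, and `seed` are hypothetical names.

```python
import random


def sparsify_annotations(instances, keep_ratio=0.1, seed=0):
    """Split one image's instances into labeled and unlabeled subsets.

    Assumes uniform random sampling per image (an illustrative choice,
    not necessarily the protocol used in the paper).
    """
    rng = random.Random(seed)
    if not instances:
        return [], []
    # Keep at least one instance so the image retains some supervision.
    n_keep = max(1, round(len(instances) * keep_ratio))
    kept = set(rng.sample(range(len(instances)), n_keep))
    labeled = [inst for i, inst in enumerate(instances) if i in kept]
    unlabeled = [inst for i, inst in enumerate(instances) if i not in kept]
    return labeled, unlabeled


# Example: 50 placeholder instances at a 10% annotation budget.
all_instances = list(range(50))
labeled, unlabeled = sparsify_annotations(all_instances, keep_ratio=0.1)
```

In this setup the `unlabeled` instances are the false negatives a detector would see as background during training, which is the source of the two issues listed above.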
