Multi-clue Consistency Learning to Bridge Gaps Between General and Oriented Object in Semi-supervised Detection

Chenxu Wang, Chunyan Xu, Ziqi Gu, Zhen Cui

TL;DR

This work tackles semi-supervised oriented object detection (SOOD) in aerial imagery by identifying three inconsistencies between general and oriented detection: sampling, assignment, and confidence. It proposes Multi-clue Consistency Learning (MCL), which comprises Gaussian Center Assignment (GCA) for selecting positive labels on labeled data, Scale-aware Label Assignment (SLA) for selecting pixel-level pseudo-labels on unlabeled data, and Consistent Confidence Soft Label (CCSL) for aligning classification with localization. The approach extends a Mean Teacher framework to rotated bounding boxes and reports state-of-the-art performance on the DOTA-v1.5 and DOTA-v1.0 benchmarks, targeting objects with large aspect ratios and varying scales. The components are practical for remote-sensing detection and could generalize to other rotated-object tasks in semi-supervised learning.

Abstract

While existing semi-supervised object detection (SSOD) methods perform well in general scenes, they encounter challenges in handling oriented objects in aerial images. We experimentally find three gaps between general and oriented object detection in semi-supervised learning: 1) Sampling inconsistency: the common center sampling is not suitable for oriented objects with larger aspect ratios when selecting positive labels from labeled data. 2) Assignment inconsistency: balancing the precision and localization quality of oriented pseudo-boxes is more challenging, which introduces more noise when selecting positive labels from unlabeled data. 3) Confidence inconsistency: the mismatch between the predicted classification and localization qualities is larger for oriented objects, which affects the selection of pseudo-labels. Therefore, we propose a Multi-clue Consistency Learning (MCL) framework to bridge the gaps between general and oriented objects in semi-supervised detection. Specifically, considering the various shapes of rotated objects, the Gaussian Center Assignment is designed to select pixel-level positive labels from labeled data. We then introduce the Scale-aware Label Assignment to select pixel-level pseudo-labels instead of unreliable pseudo-boxes, which is a divide-and-rule strategy suited for objects with various scales. The Consistent Confidence Soft Label is adopted to further boost the detector by maintaining the alignment of the predicted results. Comprehensive experiments on the DOTA-v1.5 and DOTA-v1.0 benchmarks demonstrate that our proposed MCL achieves state-of-the-art performance in the semi-supervised oriented object detection task.
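
The sampling inconsistency can be illustrated with a small sketch. The abstract does not give the Gaussian Center Assignment formula, so the code below is only an assumption-laden illustration, not the authors' method: it models a rotated box as a 2D Gaussian with standard deviations `w/2` and `h/2` along the box axes (a common convention for rotated boxes) and compares it with an axis-aligned center-sampling square in the style of FCOS. The function names, the `0.5` score threshold, the stride, and the `radius=1.5` default are all hypothetical choices made for this example.

```python
import numpy as np


def gaussian_scores(points, box):
    """Score points with an unnormalized 2D Gaussian fitted to a rotated box.

    points: (N, 2) array of (x, y) locations.
    box: (cx, cy, w, h, theta) with theta in radians.
    Assumption (not from the paper): std devs are w/2 and h/2 along the box axes.
    """
    cx, cy, w, h, theta = box
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    # Express each offset in the box frame: every row becomes R^T d.
    local = (np.asarray(points, dtype=float) - np.array([cx, cy])) @ rot
    z = (local[:, 0] / (w / 2.0)) ** 2 + (local[:, 1] / (h / 2.0)) ** 2
    return np.exp(-0.5 * z)


def center_sampling_mask(points, box, stride, radius=1.5):
    """Axis-aligned center sampling: positives lie in a square around the center."""
    cx, cy = box[0], box[1]
    offsets = np.abs(np.asarray(points, dtype=float) - np.array([cx, cy]))
    return np.all(offsets <= radius * stride, axis=1)


# A long, thin, axis-aligned object: 80 px wide, 10 px high.
box = (50.0, 50.0, 80.0, 10.0, 0.0)
points = np.array([
    [50.0, 50.0],  # object center
    [80.0, 50.0],  # inside the object, far along its long axis
    [50.0, 60.0],  # outside the object, just past its short side
])

scores = gaussian_scores(points, box)
gaussian_pos = scores > 0.5  # hypothetical threshold
center_pos = center_sampling_mask(points, box, stride=8)
```

With these toy values, the point far along the long axis is a Gaussian positive but is missed by center sampling, while the point just outside the short side is picked up by center sampling and rejected by the Gaussian score. This is the kind of mismatch the abstract attributes to objects with large aspect ratios.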

Paper Structure (14 sections, 6 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: The pipeline of our MCL. To address the sampling inconsistency issue, the Gaussian Center Assignment is introduced to select more accurate pixel-level positive labels from labeled data. The Scale-aware Label Assignment is proposed to select pixel-level pseudo-labels for objects with various scales. The Consistent Confidence Soft Label is adopted to mitigate the mismatch between classification and localization qualities by maintaining the alignment of the predicted results.
  • Figure 2: Analysis of the sampling inconsistency on the general COCO and aerial DOTA-v1.5 datasets. (a) Statistics of the object aspect ratio distribution on both datasets. (b) The class activation mapping of an aerial object with a large aspect ratio. (c) The positive label selection for an aerial object by the common center sampling strategy, where the red points and blue points represent negatives and positives, respectively.
  • Figure 3: Investigation of the assignment and confidence inconsistency problems. (a) The pseudo-box precision on the DOTA-v1.5 and COCO datasets under different IoU thresholds. (b) The top heat-map illustrates the consistency of centerness and IoU between ground truth boxes and their corresponding true positive boxes on the DOTA-v1.5 dataset, while the bottom one is on the COCO dataset.
  • Figure 4: Analysis of different feature maps and the object centerness. (a) Score imbalance between feature maps at different levels. (b) For the centerness-based soft label, condition 1 introduces ambiguities while condition 2 is more suitable.
  • Figure 5: Some visualization results from the DOTA-v1.5 dataset. The first and the last rows are the results of Dense Teacher and MCL respectively. True Positive, False Negative, and False Positive predictions are marked in green, red, and blue respectively.
  • ...and 1 more figure