
Pixel-level Quality Assessment for Oriented Object Detection

Yunhui Zhu, Buliao Huang

TL;DR

Pixel-level Quality Assessment (PQA) replaces box-level IoU prediction with pixel-level spatial consistency, i.e. how well each pixel's position relative to the predicted box agrees with its position relative to the GT box. This avoids the bias caused by structural coupling. It uses a global position heatmap and integrates the pixel-wise alignments within each predicted box into a per-box quality score, which is combined with the classification score to rank detections. Experiments on DOTA-v1.0 and HRSC2016 show consistent improvements across detectors such as Rotated RetinaNet and STD, and a lightweight variant (PQA-Lite) offers faster inference with comparable gains. The approach is modular and applies broadly to oriented object detectors, improving both localization quality estimation and detection ranking.
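The TL;DR states that the per-box quality score is combined with the classification score for ranking, but it does not give the fusion rule. The sketch below assumes one common choice, a geometric interpolation `cls ** (1 - alpha) * q ** alpha`; the functions `fused_score` and `rank_detections` and the weight `alpha` are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def fused_score(cls_score: float, quality: float, alpha: float = 0.5) -> float:
    """Geometric interpolation of classification score and estimated quality.

    alpha is a hypothetical weight: alpha = 0 ranks by classification only,
    alpha = 1 ranks by estimated localization quality only.
    """
    if not (0.0 <= cls_score <= 1.0 and 0.0 <= quality <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    return cls_score ** (1.0 - alpha) * quality ** alpha


def rank_detections(dets: List[Tuple[str, float, float]], alpha: float = 0.5) -> List[str]:
    """Sort detections (id, cls_score, quality) by fused score, highest first."""
    ranked = sorted(dets, key=lambda d: fused_score(d[1], d[2], alpha), reverse=True)
    return [det_id for det_id, _, _ in ranked]


dets = [("a", 0.9, 0.4), ("b", 0.7, 0.9), ("c", 0.8, 0.6)]
print(rank_detections(dets))             # quality-aware ranking
print(rank_detections(dets, alpha=0.0))  # classification-only ranking
```

With `alpha = 0.5`, a detection with a lower classification score but a much higher estimated quality (here `"b"`) can outrank a confident but poorly localized one (`"a"`).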

Abstract

Modern oriented object detectors typically predict a set of bounding boxes and select the top-ranked ones based on estimated localization quality. Achieving high detection performance requires that the estimated quality closely aligns with the actual localization accuracy. To this end, existing approaches predict the Intersection over Union (IoU) between the predicted and ground-truth (GT) boxes as a proxy for localization quality. However, box-level IoU prediction suffers from a structural coupling issue: since the predicted box is derived from the detector's internal estimation of the GT box, the predicted IoU, which is based on their similarity, can be overestimated for poorly localized boxes. To overcome this limitation, we propose a novel Pixel-level Quality Assessment (PQA) framework, which replaces box-level IoU prediction with the integration of pixel-level spatial consistency. PQA measures the alignment between each pixel's relative position to the predicted box and its corresponding position to the GT box. By operating at the pixel level, PQA avoids directly comparing the predicted box with the estimated GT box, thereby eliminating the inherent similarity bias in box-level IoU prediction. Furthermore, we introduce a new integration metric that aggregates pixel-level spatial consistency into a unified quality score, yielding a more accurate approximation of the actual localization quality. Extensive experiments on HRSC2016 and DOTA demonstrate that PQA can be seamlessly integrated into various oriented object detectors, consistently improving performance (e.g., +5.96% AP$_{50:95}$ on Rotated RetinaNet and +2.32% on STD).
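To make pixel-level spatial consistency concrete, here is a minimal sketch under stated assumptions. Each oriented box `(cx, cy, w, h, theta)` is rendered as a pyramid-shaped position heatmap that is 1 at the box center and falls to 0 at the boundary (in the spirit of the heatmaps described for Figure 2), and the heatmaps of the predicted and GT boxes are integrated with a weighted Jaccard ratio, sum(min) / sum(max). Both the heatmap shape and this integration metric are assumptions for illustration; the paper defines its own integration metric, and `position_heatmap` and `consistency_score` are hypothetical names.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # (cx, cy, w, h, theta in radians)


def position_heatmap(box: Box, width: int, height: int) -> List[List[float]]:
    """Assumed encoding: 1 at the box center, decaying linearly to 0 at the edge."""
    cx, cy, w, h, theta = box
    c, s = math.cos(theta), math.sin(theta)
    heat = []
    for row in range(height):
        y = row + 0.5  # pixel-center coordinates
        line = []
        for col in range(width):
            x = col + 0.5
            dx, dy = x - cx, y - cy
            u = dx * c + dy * s    # offset along the box width axis
            v = -dx * s + dy * c   # offset along the box height axis
            d = max(abs(u) / (w / 2), abs(v) / (h / 2))  # 0 at center, 1 on boundary
            line.append(max(0.0, 1.0 - d))
        heat.append(line)
    return heat


def consistency_score(pred: Box, gt: Box, width: int = 64, height: int = 64) -> float:
    """Hypothetical integration metric: weighted Jaccard of the two heatmaps."""
    hp = position_heatmap(pred, width, height)
    hg = position_heatmap(gt, width, height)
    inter = union = 0.0
    for row_p, row_g in zip(hp, hg):
        for a, b in zip(row_p, row_g):
            inter += min(a, b)
            union += max(a, b)
    return inter / union if union > 0 else 0.0


gt = (32.0, 32.0, 30.0, 12.0, 0.0)
print(consistency_score(gt, gt))                                      # identical boxes
print(consistency_score(gt, (32.0, 32.0, 30.0, 12.0, math.radians(30))))  # rotated
print(consistency_score(gt, (40.0, 32.0, 30.0, 12.0, 0.0)))           # shifted 8 px
```

With this choice, identical boxes score exactly 1, disjoint boxes score 0, and the score falls as the predicted box rotates or shifts away from the GT box. The sketch compares against a known GT box; at inference time the GT box is unknown, so the GT-side heatmap would have to be estimated by the detector, which this example does not model.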

Paper Structure

This paper contains 21 sections, 12 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Correlation between the estimated and actual localization quality of predicted oriented boxes.
  • Figure 2: Illustrative examples of how pixel-level spatial consistency correlates with GT IoU. (a) Predicted oriented boxes. (b) Heatmap encoding pixel-wise relative positions to the GT box, where higher values indicate closer proximity to the box center. (c) Heatmap encoding pixel-wise relative positions to the predicted box, represented analogously.
  • Figure 3: Overall framework of PQA. $H_i$ and $F_i$ provide spatial encodings of pixels' relative positions to the nearest GT box and to the predicted box $b_i$, respectively.
  • Figure 4: Comparison of quality scores computed using different integration metrics for pixel-level spatial consistency, as the predicted oriented boxes vary in (a) orientation angle, (b) center point offset, and (c) aspect ratio.
  • Figure 5: Visualization of top-ranked predicted oriented boxes with their estimated quality scores (qs) and actual localization quality (GT IoU) before (row 1) and after (row 2) integrating PQA into STD. Predicted boxes are ranked based on qs. Yellow denotes GT boxes; cyan, red, and purple indicate the rank-1, rank-2, and rank-3 predicted boxes, respectively.
  • ...and 4 more figures
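Figures 1 and 5 compare estimated quality with the actual localization quality, i.e. the IoU between a predicted oriented box and its GT box. Unlike the axis-aligned case, oriented-box IoU requires a polygon intersection. The sketch below computes it with Sutherland-Hodgman clipping of the two convex rectangles and the shoelace area formula. It assumes boxes are parameterized as `(cx, cy, w, h, theta)` with `theta` in radians, and it is a plain reference implementation, not the evaluation code used in the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float, float]  # (cx, cy, w, h, theta in radians)


def box_corners(box: Box) -> List[Point]:
    """Corners ordered so that the shoelace signed area is positive."""
    cx, cy, w, h, theta = box
    c, s = math.cos(theta), math.sin(theta)
    local = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + u * c - v * s, cy + u * s + v * c) for u, v in local]


def cross(o: Point, a: Point, p: Point) -> float:
    """Positive when p lies to the left of the directed edge o -> a."""
    return (a[0] - o[0]) * (p[1] - o[1]) - (a[1] - o[1]) * (p[0] - o[0])


def edge_intersection(p1: Point, p2: Point, a: Point, b: Point) -> Point:
    """Point where segment p1-p2 crosses the infinite line through a and b."""
    d1, d2 = cross(a, b, p1), cross(a, b, p2)
    t = d1 / (d1 - d2)
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))


def clip(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: clip a polygon against a convex clipper."""
    output = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        if not inp:
            break
        prev = inp[-1]
        for cur in inp:
            cur_in, prev_in = cross(a, b, cur) >= 0, cross(a, b, prev) >= 0
            if cur_in:
                if not prev_in:
                    output.append(edge_intersection(prev, cur, a, b))
                output.append(cur)
            elif prev_in:
                output.append(edge_intersection(prev, cur, a, b))
            prev = cur
    return output


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; returns 0 for empty or degenerate polygons."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def rotated_iou(b1: Box, b2: Box) -> float:
    inter = polygon_area(clip(box_corners(b1), box_corners(b2)))
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / union if union > 0 else 0.0


print(rotated_iou((0.0, 0.0, 2.0, 2.0, 0.0), (0.0, 0.0, 2.0, 2.0, math.pi / 4)))
```

For example, two concentric 2×2 squares, one rotated by 45°, overlap in a regular octagon and have IoU 1/√2 ≈ 0.707.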