Online Data Curation for Object Detection via Marginal Contributions to Dataset-level Average Precision

Zitang Sun, Masakazu Yoshimura, Junji Otsuka, Atsushi Irie, Takeshi Ohashi

TL;DR

DetGain tackles data efficiency for object detection by shifting sample selection from loss-based signals to a metric-aligned, dataset-level utility: the image-level marginal contribution to mAP, $\delta_{\mathrm{mAP}}(x; f, \mathcal{D})$. It introduces a teacher–student gain gap, $s_{\mathrm{DG}}(x)$, and computes it efficiently via closed-form estimators under a uniform prior for TP/FP score distributions, enabling real-time online sampling. The method is architecture-agnostic and easily pluggable into existing detectors, improving average mAP by about $+2.0$ across six detectors on COCO 2017, with larger gains under noisy data and when paired with online augmentation or knowledge distillation. DetGain complements, rather than replaces, model design choices, offering a practical pipeline to enhance data efficiency in object detection with minimal changes to training code and objectives.

Abstract

High-quality data has become a primary driver of progress under scaling laws, with curated datasets often outperforming much larger unfiltered ones at lower cost. Online data curation extends this idea by dynamically selecting training samples based on the model's evolving state. While effective in classification and multimodal learning, existing online sampling strategies rarely extend to object detection because of its structural complexity and domain gaps. We introduce DetGain, an online data curation method designed specifically for object detection that estimates each image's marginal perturbation to dataset-level Average Precision (AP) from its prediction quality. By modeling global score distributions, DetGain efficiently estimates the global AP change and computes teacher-student contribution gaps to select informative samples at each iteration. The method is architecture-agnostic and minimally intrusive, enabling straightforward integration into diverse object detection architectures. Experiments on the COCO dataset with multiple representative detectors show consistent improvements in accuracy. DetGain also remains robust under low-quality data and can be combined with knowledge distillation techniques to further improve performance, highlighting its potential as a general and complementary strategy for data-efficient object detection.
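The selection step described above can be illustrated with a minimal sketch. It assumes that per-image gain estimates for the teacher and the student are already available as arrays, that the gap is taken as teacher minus student, and that the top-`k` images of a candidate batch are kept. The function name `select_by_gain_gap` and the top-`k` rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def select_by_gain_gap(teacher_gain, student_gain, k):
    """Rank candidate images by a teacher-student gain gap and keep the top k.

    teacher_gain, student_gain: per-image estimates of the marginal mAP
    contribution (hypothetical inputs; how they are computed is not shown).
    Assumption: gap = teacher - student, so a large gap marks an image the
    teacher handles well but the student does not yet.
    """
    teacher_gain = np.asarray(teacher_gain, dtype=float)
    student_gain = np.asarray(student_gain, dtype=float)
    if teacher_gain.shape != student_gain.shape:
        raise ValueError("teacher and student gains must have the same shape")

    gap = teacher_gain - student_gain
    k = min(k, gap.size)
    # Stable sort on the negated gap gives descending order with
    # deterministic tie-breaking by original index.
    order = np.argsort(-gap, kind="stable")
    return order[:k], gap

# Four candidate images, keep the two with the largest gap.
selected, gap = select_by_gain_gap(
    teacher_gain=[0.5, 0.2, 0.9, 0.1],
    student_gain=[0.4, 0.2, 0.1, 0.3],
    k=2,
)
```

In this toy batch the third image has the largest gap and the fourth has a negative gap, so a top-`k` rule would drop the fourth first.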

Paper Structure

This paper contains 24 sections, 33 equations, 7 figures, 11 tables, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of our online data curation for object detection, which selects the most informative samples at each training iteration to boost model training performance.
  • Figure 2: Overview of the proposed online data curation framework for object detection. (A) Overall pipeline: a pretrained teacher and a student model compute their respective marginal mAP contributions ($\delta_{\mathrm{AP}}$) for each image. The difference between them defines the learnability score, which guides mini-batch sampling during training. (B) Estimation of DetGain: inserting a single TP or FP perturbs the precision–recall curve. We model global TP/FP score distributions and analytically estimate each detection’s contribution to mAP under these densities. (C) Demonstration of DetGain behavior. Left: $\Delta\text{mAP}$ when inserting a class-specific TP detection—determined jointly by the confidence score and bounding-box quality (IoU). Right: $\Delta\text{mAP}$ when inserting a class-specific FP detection—determined primarily by the prediction score. The bottom row uses the true TP/FP score distributions measured from Faster R-CNN predictions; recomputing these at every iteration is computationally expensive. We adopt the top row (uniform distribution) for a model-agnostic faster computation.
  • Figure 3: Joint DetGain–augmentation sampling framework. Strong augmentation expands the augmented data space beyond the original training set. The subsampling space extends from the small overlap between the training set and informative data (green–red) to a larger overlap (blue–red), reducing overfitting and improving data diversity for the student model.
  • Figure 4: Robustness under noisy annotations. Results with Faster R-CNN–Res50 (student) and Faster R-CNN–Res152 (teacher) on COCO 2017 under controlled annotation-noise ratios. DetGain remains more stable than loss-based learnability baselines [rholoss, jest, evans2024bad] and the uniform baseline across varying noise levels.
  • Figure 5: Monte Carlo verification of the analytic $\Delta$AP formulation. Each subplot compares simulated and analytic $\Delta$AP for TP and FP insertions under different Beta priors (top: Faster R-CNN, bottom: FCOS). The excellent agreement demonstrates the correctness and numerical stability of our closed-form derivation.
  • ...and 2 more figures
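The perturbation described in the Figure 2(B) caption, where inserting a single TP or FP shifts the precision–recall curve, can be checked by brute force on a toy ranking. The sketch below is not the paper's closed-form estimator: it assumes a single class, a fixed number of ground-truth boxes, and a simple non-interpolated AP (mean of precision at each TP rank) rather than COCO's interpolated AP. The helper names `average_precision` and `delta_ap` are hypothetical.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """Non-interpolated AP for one class: sum of precision at each TP rank,
    divided by the number of ground-truth boxes (assumed fixed)."""
    scores = np.asarray(scores, dtype=float)
    tp = np.asarray(is_tp, dtype=float)
    order = np.argsort(-scores, kind="stable")  # rank by descending score
    tp = tp[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, tp.size + 1)
    return float(np.sum(precision * tp) / num_gt)

def delta_ap(scores, is_tp, num_gt, new_score, new_is_tp):
    """AP change caused by inserting one detection into the ranked list."""
    base = average_precision(scores, is_tp, num_gt)
    perturbed = average_precision(
        np.append(scores, new_score), np.append(is_tp, new_is_tp), num_gt
    )
    return perturbed - base

# Toy ranking: TP at 0.9, FP at 0.6, TP at 0.3, with three ground-truth boxes.
scores, is_tp, num_gt = [0.9, 0.6, 0.3], [1, 0, 1], 3

tp_high = delta_ap(scores, is_tp, num_gt, new_score=0.8, new_is_tp=1)
tp_low = delta_ap(scores, is_tp, num_gt, new_score=0.1, new_is_tp=1)
fp_high = delta_ap(scores, is_tp, num_gt, new_score=0.8, new_is_tp=0)
fp_low = delta_ap(scores, is_tp, num_gt, new_score=0.1, new_is_tp=0)
```

On this toy list, a TP inserted with a high score raises AP more than the same TP with a low score, an FP ranked above an existing TP lowers AP, and an FP ranked below every TP leaves it unchanged. This matches the qualitative dependence on prediction score described in the Figure 2(C) caption; the IoU dependence for TPs is not modeled here.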