Online Data Curation for Object Detection via Marginal Contributions to Dataset-level Average Precision

Zitang Sun; Masakazu Yoshimura; Junji Otsuka; Atsushi Irie; Takeshi Ohashi

TL;DR

DetGain tackles data efficiency for object detection by shifting sample selection from loss-based signals to a metric-aligned, dataset-level utility: the image-level marginal contribution to mAP, $\delta_{\mathrm{mAP}}(x; f, \mathcal{D})$. It introduces a teacher–student gain gap, $s_{\mathrm{DG}}(x)$, and computes it efficiently via closed-form estimators under a uniform prior for TP/FP score distributions, enabling real-time online sampling. The method is architecture-agnostic and easily pluggable into existing detectors, improving average mAP by about $+2.0$ across six detectors on COCO 2017, with larger gains under noisy data and when paired with online augmentation or knowledge distillation. DetGain complements, rather than replaces, model design choices, offering a practical pipeline to enhance data efficiency in object detection with minimal changes to training code and objectives.

Abstract

High-quality data has become a primary driver of progress under scaling laws, with curated datasets often outperforming much larger unfiltered ones at lower cost. Online data curation extends this idea by dynamically selecting training samples based on the model's evolving state. While effective in classification and multimodal learning, existing online sampling strategies rarely extend to object detection because of its structural complexity and domain gaps. We introduce DetGain, an online data curation method specifically for object detection that estimates the marginal perturbation of each image to dataset-level Average Precision (AP) based on its prediction quality. By modeling global score distributions, DetGain efficiently estimates the global AP change and computes teacher-student contribution gaps to select informative samples at each iteration. The method is architecture-agnostic and minimally intrusive, enabling straightforward integration into diverse object detection architectures. Experiments on the COCO dataset with multiple representative detectors show consistent improvements in accuracy. DetGain also demonstrates strong robustness under low-quality data and can be effectively combined with knowledge distillation techniques to further enhance performance, highlighting its potential as a general and complementary strategy for data-efficient object detection.
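To make the idea of an image's marginal contribution to dataset-level AP concrete, the following is a minimal brute-force sketch, not the paper's estimator. It assumes a single class, detections already matched to ground truth as true or false positives, and a leave-one-out definition of the contribution (AP with the image minus AP without it). The sign convention of the gap (teacher minus student) and all names (`average_precision`, `marginal_ap`, `gain_gap`) are hypothetical choices for illustration. DetGain itself replaces this recomputation with closed-form estimates over modeled score distributions.

```python
from typing import Dict, List, Tuple

# One detection: (confidence score, matched-to-ground-truth flag).
Detection = Tuple[float, bool]


def average_precision(dets: List[Detection], num_gt: int) -> float:
    """Non-interpolated AP for one class: sum of precision at each TP, divided by num_gt."""
    if num_gt == 0:
        return 0.0
    ranked = sorted(dets, key=lambda d: d[0], reverse=True)
    tp = 0
    ap = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            tp += 1
            ap += tp / rank  # precision at this recall point
    return ap / num_gt


def marginal_ap(image_id: str,
                dets_by_image: Dict[str, List[Detection]],
                gt_by_image: Dict[str, int]) -> float:
    """Leave-one-out contribution (an assumption): AP(all images) - AP(all but image_id)."""
    all_dets = [d for ds in dets_by_image.values() for d in ds]
    all_gt = sum(gt_by_image.values())
    rest_dets = [d for k, ds in dets_by_image.items() if k != image_id for d in ds]
    rest_gt = all_gt - gt_by_image[image_id]
    return average_precision(all_dets, all_gt) - average_precision(rest_dets, rest_gt)


def gain_gap(image_id: str,
             teacher_dets: Dict[str, List[Detection]],
             student_dets: Dict[str, List[Detection]],
             gt_by_image: Dict[str, int]) -> float:
    """Teacher contribution minus student contribution (sign convention assumed)."""
    return (marginal_ap(image_id, teacher_dets, gt_by_image)
            - marginal_ap(image_id, student_dets, gt_by_image))
```

Under these assumptions, images with a larger gap are those where the student's predictions hurt dataset-level AP relative to the teacher's, so an online sampler could rank a candidate batch by `gain_gap` and keep the top-scoring images. The leave-one-out recomputation costs a full sort per image, which is why an efficient estimator matters for sampling at every iteration.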
