Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability
Seungju Yoo, Hyuk Kwon, Joong-Won Hwang, Kibok Lee
TL;DR
The paper addresses the challenge of estimating object detector performance without ground-truth labels, especially under deployment-time distribution shifts. It introduces Prediction Consistency and Reliability (PCR), a two-score framework that leverages pre-NMS and post-NMS bounding boxes to infer localization and classification quality, and maps these signals to mAP via regression over a corruption-based meta-dataset. By adopting ImageNet-C style corruptions across varying severities, the authors provide a realistic and scalable benchmark for AutoEval in object detection. Empirical results show that PCR consistently outperforms existing AutoEval baselines for both vehicle and pedestrian detection, with robustness across corruption severities and gains when combined with Box Stability (BoS). This work enables label-free model selection and monitoring in real-world settings and establishes a practical AutoEval protocol for object detection.
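The regression step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each meta-dataset split (one corruption type at one severity) yields a small vector of label-free scores and a known mAP, and it fits an ordinary least-squares line to them. The function names `fit_linear_map` and `predict_map`, and all numbers in the usage lines, are hypothetical.

```python
import numpy as np

def fit_linear_map(features, maps):
    """Fit mAP ~ features @ w[:-1] + w[-1] by ordinary least squares.

    features: (S, D) label-free scores, one row per meta-dataset split.
    maps:     (S,) ground-truth mAP measured on each split.
    """
    X = np.asarray(features, dtype=float)
    X = np.column_stack([X, np.ones(len(X))])  # append a bias column
    w, *_ = np.linalg.lstsq(X, np.asarray(maps, dtype=float), rcond=None)
    return w

def predict_map(w, features):
    """Estimate mAP for one or more unlabeled sets from their scores."""
    X = np.atleast_2d(np.asarray(features, dtype=float))
    X = np.column_stack([X, np.ones(len(X))])
    return X @ w

# Made-up numbers for illustration only (not results from the paper):
# each row is (consistency score, reliability score) for one corrupted split.
meta_features = [[0.9, 0.8], [0.7, 0.6], [0.5, 0.5], [0.3, 0.2]]
meta_maps = [0.74, 0.58, 0.45, 0.26]

w = fit_linear_map(meta_features, meta_maps)
estimated = predict_map(w, [0.6, 0.4])  # scores from an unlabeled target set
```

A linear fit is used here only because it is the simplest choice; the regressor actually used in the paper is not specified in this summary.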
Abstract
Recent advances in computer vision have made training object detectors more efficient and effective; however, assessing their performance in real-world applications still relies on costly manual annotation. To address this limitation, we develop an automated model evaluation (AutoEval) framework for object detection. We propose Prediction Consistency and Reliability (PCR), which leverages the multiple candidate bounding boxes that conventional detectors generate before non-maximum suppression (NMS). PCR estimates detection performance without ground-truth labels by jointly measuring 1) the spatial consistency between boxes before and after NMS, and 2) the reliability of the retained boxes via the confidence scores of overlapping boxes. For a more realistic and scalable evaluation, we construct a meta-dataset by applying image corruptions of varying severity. Experimental results demonstrate that PCR yields more accurate performance estimates than existing AutoEval methods, and the proposed meta-dataset covers a wider range of detection performance. The code is available at https://github.com/YonseiML/autoeval-det.
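To make the two signals concrete, here is a rough sketch of how a consistency score and a reliability score could be computed from pre-NMS and post-NMS boxes. The exact PCR formulas are not given in this abstract, so everything below is an assumption: consistency is taken as the mean IoU between each retained box and the pre-NMS candidates that overlap it, and reliability as the mean confidence of those same candidates. The function names and the `iou_thr` parameter are hypothetical.

```python
import numpy as np

def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4), format [x1, y1, x2, y2]."""
    a = np.asarray(a, dtype=float).reshape(-1, 4)
    b = np.asarray(b, dtype=float).reshape(-1, 4)
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-12)

def consistency_and_reliability(pre_boxes, pre_scores, post_boxes, iou_thr=0.5):
    """Illustrative (assumed) scores for one image.

    consistency: mean IoU between each post-NMS box and the pre-NMS
                 candidates overlapping it by at least iou_thr.
    reliability: mean confidence of those overlapping candidates.
    Both are averaged over the post-NMS boxes that have overlapping candidates.
    """
    pre_scores = np.asarray(pre_scores, dtype=float)
    ious = iou_matrix(post_boxes, pre_boxes)  # shape (K, N)
    cons, rel = [], []
    for row in ious:
        mask = row >= iou_thr
        if not mask.any():
            continue  # no overlapping candidates for this retained box
        cons.append(row[mask].mean())
        rel.append(pre_scores[mask].mean())
    if not cons:
        return 0.0, 0.0
    return float(np.mean(cons)), float(np.mean(rel))

# Toy example: two candidates cluster around one retained box, one is far away.
pre_boxes = [[0, 0, 10, 10], [0, 0, 10, 8], [50, 50, 60, 60]]
pre_scores = [0.9, 0.7, 0.2]
post_boxes = [[0, 0, 10, 10]]
consistency, reliability = consistency_and_reliability(pre_boxes, pre_scores, post_boxes)
```

In this toy case the retained box also appears among the pre-NMS candidates, as it would after standard NMS, so it contributes an IoU of 1 to its own cluster. Whether the method includes or excludes that self-match is not stated in the abstract.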
