From Label Error Detection to Correction: A Modular Framework and Benchmark for Object Detection Datasets
Sarina Penquitt, Jonathan Klees, Rinor Cakaj, Daniel Kondermann, Matthias Rottmann, Lars Schmarje
TL;DR
This work tackles label quality in object detection by introducing REC✓D (Rechecked), a semi-automated framework that combines automated error proposals with crowd-sourced microtasks to detect and correct bounding-box label errors. The authors demonstrate the approach on the KITTI pedestrian class, finding that at least 18% of pedestrian labels were missing or inaccurate in the original ground truth. Combined with the framework, current error-detection methods can correct hundreds of errors with less effort than full reannotation, although up to 66% of errors may still go undetected. A validated ground truth (VGT) for KITTI pedestrians is constructed via a two-stage microtask pipeline that produces soft labels, enabling a real-world benchmark for error-detection and correction methods. The work highlights the benefits and limitations of detector-based proposals, emphasizes the importance of high-quality soft labels, and provides a publicly available framework and dataset to spur further research on label correction for object detection.
Abstract
Object detection has advanced rapidly in recent years, driven by increasingly large and diverse datasets. However, label errors often compromise the quality of these datasets and affect the outcomes of training and benchmark evaluations. Although label error detection methods for object detection datasets now exist, they are typically validated only on synthetic benchmarks or via limited manual inspection. How to correct such errors systematically and at scale remains an open problem. We introduce a semi-automated framework for label error correction called Rechecked (REC✓D). It builds on existing label error detection methods, whose error proposals are reviewed with lightweight, crowd-sourced microtasks. We apply Rechecked to the class pedestrian in the KITTI dataset, for which we crowdsourced high-quality corrected annotations. We find that 18% of labels are missing or inaccurate in the original ground truth. We show that current label error detection methods, when combined with our correction framework, can recover hundreds of errors with little human effort compared to annotation from scratch. However, even the best methods still miss up to 66% of the label errors, which motivates further research, now enabled by our released benchmark.
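
To make the review step concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each error proposal carries a score from some detection method and a list of yes/no crowd answers, that a soft label is simply the fraction of yes answers, and that a fixed review budget is spent on the highest-scoring proposals. The names `Proposal`, `soft_label`, `review`, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Proposal:
    """Hypothetical error proposal: a box flagged by a label error detection method."""
    box_id: str
    score: float        # assumed error score; higher means more suspicious
    votes: List[bool]   # assumed crowd answers to a yes/no microtask (True = yes)


def soft_label(votes: List[bool]) -> float:
    """Fraction of annotators who answered yes; 0.5 when there are no votes."""
    if not votes:
        return 0.5
    return sum(votes) / len(votes)


def review(proposals: List[Proposal], budget: int, threshold: float = 0.5) -> List[str]:
    """Review the `budget` highest-scoring proposals and return the ids
    whose soft label exceeds `threshold`, i.e. those the crowd confirmed."""
    ranked = sorted(proposals, key=lambda p: p.score, reverse=True)
    return [p.box_id for p in ranked[:budget] if soft_label(p.votes) > threshold]


# Toy data, invented for illustration only.
proposals = [
    Proposal("a", 0.9, [True, True, False]),
    Proposal("b", 0.7, [False, False, True]),
    Proposal("c", 0.4, [True, True, True]),
]
confirmed = review(proposals, budget=2)
```

With a budget of two, only proposals `a` and `b` are shown to annotators; `c` is never reviewed regardless of its votes, which mirrors how errors outside the proposal ranking can go undetected.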
