From Label Error Detection to Correction: A Modular Framework and Benchmark for Object Detection Datasets

Sarina Penquitt, Jonathan Klees, Rinor Cakaj, Daniel Kondermann, Matthias Rottmann, Lars Schmarje

TL;DR

This work tackles label quality in object detection by introducing REC✓D, a semi-automated framework that merges automated error proposals with crowd-sourced microtasks to detect and correct bounding-box label errors. The authors demonstrate the approach on the KITTI pedestrian class, revealing at least 18% previously missing or inaccurate annotations and showing that current error-detection methods can correct hundreds of errors with less effort than full reannotation, while also noting that up to 66% of errors may still go undetected. A validated ground-truth (VGT) for KITTI pedestrians is constructed via a two-stage microtask pipeline producing soft labels, enabling a real-world benchmark for error-detection and correction methods. The work highlights the benefits and limitations of detector-based proposals, emphasizes the importance of high-quality soft labels, and provides a publicly available framework and dataset to spur further research in label correction for object detection.

Abstract

Object detection has advanced rapidly in recent years, driven by increasingly large and diverse datasets. However, label errors often compromise the quality of these datasets and affect the outcomes of training and benchmark evaluations. Although label error detection methods for object detection datasets now exist, they are typically validated only on synthetic benchmarks or via limited manual inspection. How to correct such errors systematically and at scale remains an open problem. We introduce REC✓D (Rechecked), a semi-automated framework for label error correction. It builds on existing label error detection methods, whose error proposals are reviewed with lightweight, crowd-sourced microtasks. We apply Rechecked to the class pedestrian in the KITTI dataset, for which we crowdsourced high-quality corrected annotations. We detect 18% of missing and inaccurate labels in the original ground truth. We show that current label error detection methods, when combined with our correction framework, can recover hundreds of errors with little human effort compared to annotation from scratch. However, even the best methods still miss up to 66% of the label errors, which motivates further research, now enabled by our released benchmark.

Paper Structure

This paper contains 29 sections, 103 figures, 4 tables.

Figures (103)

  • Figure 1: Illustration of our framework REC✓D. (a) Original ground truth (GT) annotations in object detection datasets often contain errors such as missing bounding boxes (false negatives, FN) or incorrect extra boxes (false positives, FP). (b) Pretrained object detectors yield bounding box predictions. (c) These boxes are scored by a label error detection method estimating the probability for a label error. (d) Human microtasks are used to validate each box by multiple annotators, resulting in a soft label that can be used to correct label errors.
  • Figure 2: Overview of our suggested workflow REC✓D to detect and correct label errors in object detection datasets.
  • Figure 3: Microtask interface for verifying whether the object in the bounding box is a real human being. The interface shows only the relevant region and minimal surrounding context. A clear highlight guides the annotator’s attention, and no unnecessary elements are shown that could distract from the decision.
  • Figure 4: Annotator interface for microtask 6: Activity classification.
  • Figure 5: Comparison of VGT soft label probability and human perception of being a pedestrian. Each dot represents an expert annotator annotation and its corresponding soft label probability. The diamond represents the mean, the dashed lines the standard deviation and the star the median.
  • ...and 98 more figures
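
The validation step described in the Figure 1 caption (several annotators answer a microtask per box, and the answers are combined into a soft label) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `Proposal`, `soft_label`, and `classify`, the mean-of-votes aggregation, and the 0.5 threshold are hypothetical and are not taken from the paper or its released framework.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proposal:
    """One candidate box sent to annotators (hypothetical structure)."""
    box_id: str
    in_original_gt: bool  # True if the box is part of the original annotations
    votes: List[bool]     # one yes/no answer per annotator ("is this a pedestrian?")


def soft_label(votes: List[bool]) -> Optional[float]:
    """Fraction of annotators answering yes; None if nobody reviewed the box."""
    if not votes:
        return None
    return sum(votes) / len(votes)


def classify(p: Proposal, threshold: float = 0.5) -> str:
    """Compare the aggregated vote with the original ground truth.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    s = soft_label(p.votes)
    if s is None:
        return "unreviewed"
    is_object = s >= threshold
    if is_object and not p.in_original_gt:
        return "missing label (FN in original GT)"
    if not is_object and p.in_original_gt:
        return "spurious label (FP in original GT)"
    return "consistent"


if __name__ == "__main__":
    proposals = [
        Proposal("det_001", in_original_gt=False, votes=[True, True, False, True]),
        Proposal("gt_042", in_original_gt=True, votes=[False, False, True, False]),
        Proposal("gt_043", in_original_gt=True, votes=[True, True, True, True]),
    ]
    for p in proposals:
        print(p.box_id, soft_label(p.votes), classify(p))
```

Keeping the soft label as a fraction rather than collapsing it immediately to a hard decision preserves annotator disagreement, which is the property the TL;DR highlights for the validated ground truth.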