Bayesian Detector Combination for Object Detection with Crowdsourced Annotations

Zhi Qin Tan, Olga Isupova, Gustavo Carneiro, Xiatian Zhu, Yunpeng Li

TL;DR

This work tackles learning fine-grained object detectors from noisy crowdsourced annotations by introducing Bayesian Detector Combination (BDC), a model-agnostic framework that jointly infers annotator reliability and aggregates bounding boxes and class labels. BDC comprises four interacting components—the Object Detector Module, Annotations-Predictions Matcher, Bounding Box Aggregator, and Class Label Aggregator—trained iteratively to converge on robust predictions and soft label distributions. The approach is validated on real crowdsourced data (VinDr-CXR and a disaster dataset) and on four large synthetic settings, consistently outperforming baselines such as MV, WBF-EARL, Crowd R-CNN, and NA in both detection accuracy and robustness to annotator variability. The results demonstrate that BDC can scale to many annotators and effectively utilize soft labels, enabling practical crowdsourced object detection without ground-truth annotations and with broad applicability across detectors. The work provides substantial empirical evidence and publicly available code/data to promote adoption in real-world crowdsourcing scenarios.
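
The TL;DR names an Annotations-Predictions Matcher and a Bounding Box Aggregator. As a rough illustration only, the sketch below matches each annotator's boxes to detector predictions greedily by IoU, then fuses the boxes matched to each prediction with per-annotator weights. This is not BDC's probabilistic formulation: the greedy matching, the 0.5 IoU threshold, the hand-set reliability weights and the helper names (`iou`, `greedy_match`, `fuse_boxes`) are all hypothetical stand-ins chosen for clarity.

```python
from typing import Dict, List, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(preds: List[Box], annots: List[Box], thr: float = 0.5) -> Dict[int, int]:
    """Hypothetical matcher: pair each annotation with at most one prediction,
    taking the highest-IoU pairs first and ignoring pairs below `thr`."""
    pairs = sorted(
        ((iou(p, a), pi, ai) for pi, p in enumerate(preds) for ai, a in enumerate(annots)),
        reverse=True,
    )
    used_preds = set()
    match: Dict[int, int] = {}  # annotation index -> prediction index
    for score, pi, ai in pairs:
        if score < thr:
            break  # all remaining pairs have even lower IoU
        if pi in used_preds or ai in match:
            continue
        used_preds.add(pi)
        match[ai] = pi
    return match


def fuse_boxes(boxes: List[Box], weights: List[float]) -> Box:
    """Weighted average of box coordinates (weights = assumed annotator reliability)."""
    total = sum(weights)
    x1, y1, x2, y2 = (sum(w * b[i] for b, w in zip(boxes, weights)) / total for i in range(4))
    return (x1, y1, x2, y2)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
    annotations = {"A": [(1, 1, 11, 11)], "B": [(0, 0, 9, 9), (21, 19, 31, 29)]}
    reliability = {"A": 0.9, "B": 0.5}  # hypothetical, fixed by hand here
    grouped: Dict[int, List[Tuple[Box, float]]] = {}
    for name, boxes in annotations.items():
        for ai, pi in greedy_match(preds, boxes).items():
            grouped.setdefault(pi, []).append((boxes[ai], reliability[name]))
    for pi in sorted(grouped):
        items = grouped[pi]
        print(pi, fuse_boxes([b for b, _ in items], [w for _, w in items]))
```

Running the script prints one fused box per matched prediction. In BDC the weights are not set by hand; as the TL;DR describes, the components are trained iteratively together with the detector.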

Abstract

Acquiring fine-grained object detection annotations in unconstrained images is time-consuming, expensive, and prone to noise, especially in crowdsourcing scenarios. Most prior object detection methods assume accurate annotations; a few recent works have studied object detection with noisy crowdsourced annotations, but evaluated them on distinct synthetic crowdsourced datasets with varying setups under artificial assumptions. To address these algorithmic limitations and evaluation inconsistency, we first propose a novel Bayesian Detector Combination (BDC) framework to more effectively train object detectors with noisy crowdsourced annotations, with the unique ability to automatically infer the annotators' label qualities. Unlike previous approaches, BDC is model-agnostic, requires no prior knowledge of the annotators' skill level, and seamlessly integrates with existing object detection models. Due to the scarcity of real-world crowdsourced datasets, we introduce large synthetic datasets by simulating varying crowdsourcing scenarios. This allows consistent evaluation of different models at scale. Extensive experiments on both real and synthetic crowdsourced datasets show that BDC outperforms existing state-of-the-art methods, demonstrating its superiority in leveraging crowdsourced data for object detection. Our code and data are available at https://github.com/zhiqin1998/bdc.
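
The idea of automatically inferring annotators' label qualities without any ground truth can be illustrated with the classic Dawid-Skene EM algorithm, which learns one confusion matrix per annotator together with a posterior over each object's true class. This is a swapped-in technique for illustration, not BDC's own Bayesian inference. The function names `dawid_skene` and `annotator_accuracy`, the `-1` missing-label convention and the `smooth` pseudo-count are assumptions of this sketch, which also assumes boxes have already been matched so that each row of `labels` refers to one object.

```python
import numpy as np


def dawid_skene(labels, num_classes, iters=50, smooth=0.01):
    """Dawid-Skene EM on an (N objects x K annotators) integer label matrix.

    Entries equal to -1 mean the annotator did not label that object.
    Returns (posterior over true classes [N, C], confusion matrices [K, C, C], class prior [C]).
    """
    labels = np.asarray(labels)
    n, k = labels.shape
    # One-hot observations [N, K, C]; a missing label leaves an all-zero row.
    obs = np.zeros((n, k, num_classes))
    for c in range(num_classes):
        obs[:, :, c] = labels == c
    # Initialise the posterior with a smoothed majority vote.
    post = obs.sum(axis=1) + smooth
    post /= post.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # M-step: class prior and per-annotator confusion matrices conf[k, true, given].
        prior = post.mean(axis=0)
        conf = np.einsum("nc,nkl->kcl", post, obs) + smooth
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: log posterior of each true class given all annotators' labels.
        log_post = np.log(prior)[None, :] + np.einsum("nkl,kcl->nc", obs, np.log(conf))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
    return post, conf, prior


def annotator_accuracy(conf, prior):
    """Prior-weighted diagonal of each confusion matrix: one reliability score per annotator."""
    return np.diagonal(conf, axis1=1, axis2=2) @ prior


if __name__ == "__main__":
    # Annotator 2 systematically swaps the two classes; object 4 is a 1-1 tie under majority vote.
    labels = [
        [0, 0, 1],
        [1, 1, 0],
        [0, 0, 1],
        [1, 1, 0],
        [0, -1, 1],
        [1, 1, -1],
    ]
    post, conf, prior = dawid_skene(labels, num_classes=2)
    print(post.argmax(axis=1))
    print(annotator_accuracy(conf, prior).round(2))
```

Object 4 cannot be resolved by majority vote, but once annotator 2 is learned to swap classes systematically, its label 1 becomes evidence for class 0, so the tie is broken in favour of class 0.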

Paper Structure (49 sections, 39 equations, 11 figures, 7 tables, 2 algorithms)

Figures (11)

  • Figure 1: Examples of ambiguous cases with noisy or incorrect annotations on (a-e) MS COCO [cocodataset], (f-j) VinDr-CXR [vindr2022] and (k-o) the disaster response dataset [bccnet2018], in which damaged and undamaged buildings are labelled with the class names 'd' and 'u', respectively. (a) A single object mislabelled with two bounding boxes. (b-d) Duplicate annotations of visually similar objects (e.g., car/truck, donut/cake and fork/spoon). (d, e) Incorrect object class annotations (e.g., a cake labelled as a bowl; only one of the cake decorations annotated as 'person'). (f-o) Annotations from different annotators (shown in different colours) on the VinDr-CXR and disaster response datasets show significant disagreement. (Details of the datasets are given in the section on real datasets.)
  • Figure 2: Overall architecture of the proposed BDC. The aggregator's parameters and the object detector's parameters are updated iteratively until convergence. Best viewed in colour.
  • Figure 3: Comparison of aggregated labels from different methods on the VinDr-CXR dataset [vindr2022]. For WBF-EARL [Le2023], the number beside the class label is the annotators' level of agreement, while for Crowd R-CNN [crowdrcnn2020] and BDC, the number indicates the class probability. For NA, the colours represent the different annotators.
  • Figure 4: Comparison of aggregated labels from different methods on the VOC-MIX synthetic dataset. For WBF-EARL [Le2023], the number beside the class label is the annotators' level of agreement, while for Crowd R-CNN [crowdrcnn2020] and BDC, the number indicates the class probability. For NA, the colours represent the different annotators.
  • Figure 5: (a) Effect of the number of annotators, $K$, and (b) effect of varying the percentage of noisy annotators (with $K=25$) on the test AP$^{.5:.95}$ of YOLOv7. Results for the 'NA' method are unavailable when $K \ge 100$ due to GPU out-of-memory errors.
  • ...and 6 more figures
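
Figure 5 varies the number of annotators $K$ and the percentage of noisy annotators. This section does not spell out how the synthetic crowdsourced datasets were generated, so the sketch below is only a hypothetical simulator in the same spirit: each simulated annotator copies the ground-truth boxes but jitters coordinates, flips classes and drops objects, with noisy annotators doing so more often. The function names `simulate_annotator` and `simulate_crowd` and all noise rates are assumptions, not the paper's settings.

```python
import random
from typing import List, Tuple

# (box as (x1, y1, x2, y2), class id)
Annotation = Tuple[Tuple[float, float, float, float], int]


def simulate_annotator(gt: List[Annotation], num_classes: int, noisy: bool,
                       rng: random.Random) -> List[Annotation]:
    """Corrupt ground-truth annotations; the noise rates below are hypothetical."""
    jitter = 0.2 if noisy else 0.05  # max shift of each edge, as a fraction of box size
    flip_p = 0.4 if noisy else 0.05  # probability of assigning a wrong class
    drop_p = 0.3 if noisy else 0.05  # probability of missing the object entirely
    out: List[Annotation] = []
    for (x1, y1, x2, y2), cls in gt:
        if rng.random() < drop_p:
            continue
        w, h = x2 - x1, y2 - y1
        # Each edge moves by at most jitter * size (jitter < 0.5), so boxes stay non-degenerate.
        box = (x1 + rng.uniform(-jitter, jitter) * w,
               y1 + rng.uniform(-jitter, jitter) * h,
               x2 + rng.uniform(-jitter, jitter) * w,
               y2 + rng.uniform(-jitter, jitter) * h)
        if num_classes > 1 and rng.random() < flip_p:
            cls = rng.choice([c for c in range(num_classes) if c != cls])
        out.append((box, cls))
    return out


def simulate_crowd(gt: List[Annotation], num_classes: int, num_annotators: int,
                   noisy_frac: float, seed: int = 0):
    """Return (per-annotator annotations, per-annotator noisy flags)."""
    rng = random.Random(seed)
    num_noisy = round(noisy_frac * num_annotators)
    flags = [i < num_noisy for i in range(num_annotators)]
    rng.shuffle(flags)  # noisy annotators are placed at random positions
    crowd = [simulate_annotator(gt, num_classes, f, rng) for f in flags]
    return crowd, flags


if __name__ == "__main__":
    gt = [((10.0, 10.0, 50.0, 60.0), 0), ((70.0, 20.0, 120.0, 90.0), 2)]
    crowd, flags = simulate_crowd(gt, num_classes=3, num_annotators=25, noisy_frac=0.4)
    print(sum(flags), "noisy annotators;", sum(len(a) for a in crowd), "boxes in total")
```

Sweeping `num_annotators` and `noisy_frac` mirrors the two axes of Figure 5, but the paper's own synthetic datasets (such as VOC-MIX) should be taken from the released code and data rather than regenerated with this sketch.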