Bayesian Detector Combination for Object Detection with Crowdsourced Annotations
Zhi Qin Tan, Olga Isupova, Gustavo Carneiro, Xiatian Zhu, Yunpeng Li
TL;DR
This work tackles learning fine-grained object detectors from noisy crowdsourced annotations by introducing Bayesian Detector Combination (BDC), a model-agnostic framework that jointly infers annotator reliability and aggregates bounding boxes and class labels. BDC comprises four interacting components (the Object Detector Module, Annotations-Predictions Matcher, Bounding Box Aggregator, and Class Label Aggregator) that are trained iteratively to converge on robust predictions and soft label distributions. The approach is validated on real crowdsourced data (VinDr-CXR and a disaster dataset) and on four large synthetic settings, consistently outperforming baselines such as MV, WBF-EARL, Crowd R-CNN, and NA in both detection accuracy and robustness to annotator variability. The results indicate that BDC scales to many annotators and makes effective use of soft labels, enabling practical crowdsourced object detection without ground-truth annotations across a broad range of detectors. The code and data are publicly available to support adoption in real-world crowdsourcing scenarios.
Abstract
Acquiring fine-grained object detection annotations in unconstrained images is time-consuming, expensive, and prone to noise, especially in crowdsourcing scenarios. Most prior object detection methods assume accurate annotations. A few recent works have studied object detection with noisy crowdsourced annotations, but they are evaluated on distinct synthetic crowdsourced datasets with varying setups and artificial assumptions. To address these algorithmic limitations and evaluation inconsistency, we first propose a novel Bayesian Detector Combination (BDC) framework to more effectively train object detectors with noisy crowdsourced annotations, with the unique ability of automatically inferring the annotators' label qualities. Unlike previous approaches, BDC is model-agnostic, requires no prior knowledge of the annotators' skill level, and seamlessly integrates with existing object detection models. Due to the scarcity of real-world crowdsourced datasets, we introduce large synthetic datasets by simulating varying crowdsourcing scenarios. This allows consistent evaluation of different models at scale. Extensive experiments on both real and synthetic crowdsourced datasets show that BDC outperforms existing state-of-the-art methods, demonstrating its superiority in leveraging crowdsourced data for object detection. Our code and data are available at https://github.com/zhiqin1998/bdc.
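
To make the idea of inferring annotators' label qualities concrete, the following is a minimal sketch of class-label aggregation with per-annotator confusion matrices. It is not the BDC implementation from the repository above. It is a simplified EM-style illustration under stated assumptions: the function name `aggregate_labels`, the Dirichlet pseudo-counts `prior_diag` and `prior_off`, the uniform class prior, and the toy votes are all hypothetical, and bounding boxes and the detector itself are left out entirely.

```python
import numpy as np

def aggregate_labels(votes, n_classes, n_iters=20, prior_diag=2.0, prior_off=1.0):
    """Aggregate crowdsourced class labels into soft labels.

    votes: list of (item, annotator, label) integer triples.
    Returns (soft, conf): soft[i] is a class distribution for item i, and
    conf[a, t, l] estimates P(annotator a reports l | true class is t).
    """
    n_items = 1 + max(i for i, _, _ in votes)
    n_annot = 1 + max(a for _, a, _ in votes)

    # Assumed Dirichlet pseudo-counts: annotators are mildly trusted a priori.
    prior = np.full((n_classes, n_classes), prior_off)
    prior = prior + np.eye(n_classes) * (prior_diag - prior_off)

    # Initialise soft labels from vote fractions (i.e. soft majority voting).
    soft = np.full((n_items, n_classes), 1e-6)
    for i, _, l in votes:
        soft[i, l] += 1.0
    soft /= soft.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # Step 1: re-estimate each annotator's confusion matrix from soft labels.
        counts = np.tile(prior, (n_annot, 1, 1))
        for i, a, l in votes:
            counts[a, :, l] += soft[i]
        conf = counts / counts.sum(axis=2, keepdims=True)

        # Step 2: re-estimate soft labels, weighting votes by annotator reliability
        # (a uniform prior over classes is assumed).
        log_post = np.zeros((n_items, n_classes))
        for i, a, l in votes:
            log_post[i] += np.log(conf[a, :, l])
        log_post -= log_post.max(axis=1, keepdims=True)
        soft = np.exp(log_post)
        soft /= soft.sum(axis=1, keepdims=True)

    return soft, conf

# Toy data: annotators 0 and 1 are always right, annotator 2 always flips the label.
truth = [0, 1, 0, 1, 0, 1]
votes = []
for item, t in enumerate(truth):
    votes += [(item, 0, t), (item, 1, t), (item, 2, 1 - t)]

soft, conf = aggregate_labels(votes, n_classes=2)
print(soft.argmax(axis=1))
print(conf[0, 0, 0], conf[2, 0, 0])
```

In this toy setting the estimated confusion matrix of the unreliable annotator drifts away from the diagonal, so its votes are progressively discounted compared with plain majority voting. The resulting `soft` array is the kind of soft label distribution that could then be passed to a detector's classification loss.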
