Utilizing dataset affinity prediction in object detection to assess training data

Stefan Becker, Jens Bayer, Ronny Hug, Wolfgang Hübner, Michael Arens

TL;DR

This work tackles the principled evaluation of data pooling for object detection by introducing a dataset affinity prediction head that assigns each detection to one of the pooled training datasets. Implemented within YOLOv7-X, the affinity head uses multinomial logistic regression and an affinity loss to provide per-detection dataset attributions with minimal inference overhead, giving direct feedback on how much each training dataset contributes. In experiments on MODISSA and MSOD with multiple aligned vehicle datasets, detectors reach similar accuracy with a significantly sparser, affinity-informed training subset, although the full pooled data still yields the best performance. The dataset affinity scores offer ante-hoc explanations and a practical mechanism to identify dataset biases and optimize pooling strategies across heterogeneous, multi-sensor domains.
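
As a rough illustration of what multinomial logistic regression over training datasets means for a single detection, the sketch below applies a softmax to per-dataset logits and scores the result with a cross-entropy loss. It is only a minimal sketch under stated assumptions: the dataset names are taken from the figure captions, while the logit values, the function names `softmax` and `affinity_loss`, and the plain cross-entropy form are illustrative and not taken from the paper or from the YOLOv7-X code base.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def affinity_loss(logits, source_index):
    """Cross-entropy of the true source dataset (assumed loss form)."""
    probs = softmax(logits)
    return -math.log(probs[source_index])

# Hypothetical affinity logits for one detection, one value per dataset.
DATASETS = ["MS COCO", "FLIR IR", "FLIR VIS"]
logits = [2.0, 0.5, -1.0]

probs = softmax(logits)  # dataset affinity scores, summing to 1
predicted = DATASETS[max(range(len(probs)), key=probs.__getitem__)]
loss = affinity_loss(logits, source_index=0)  # detection came from MS COCO
```

In this toy case the detection is attributed to the dataset with the largest score, and the loss is smaller when the true source dataset receives a higher probability.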

Abstract

Data pooling offers various advantages, such as increasing the sample size, improving generalization, reducing sampling bias, and addressing data sparsity and quality, but it is not straightforward and may even be counterproductive. Assessing the effectiveness of pooling datasets in a principled manner is challenging due to the difficulty in estimating the overall information content of individual datasets. Towards this end, we propose incorporating a data source prediction module into standard object detection pipelines. The module runs with minimal overhead during inference time, providing additional information about the data source assigned to individual detections. We show the benefits of the so-called dataset affinity score by automatically selecting samples from a heterogeneous pool of vehicle datasets. The results show that object detectors can be trained on a significantly sparser set of training samples without losing detection accuracy.
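
One way to picture how per-detection affinities could drive dataset selection is sketched below: the datasets assigned to detections on target data are counted, and only those whose share exceeds a threshold are kept. This is a hypothetical selection rule for illustration only. The abstract does not specify the actual procedure, and the names `affinity_shares` and `select_datasets`, the `min_share` parameter, and the example assignments are all assumptions.

```python
from collections import Counter

def affinity_shares(assigned):
    """Fraction of detections attributed to each training dataset."""
    counts = Counter(assigned)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def select_datasets(assigned, min_share=0.1):
    """Keep datasets whose share of detections reaches min_share (assumed rule)."""
    shares = affinity_shares(assigned)
    return sorted(name for name, s in shares.items() if s >= min_share)

# Hypothetical per-detection dataset assignments on some target data.
assigned = ["MS COCO"] * 6 + ["FLIR IR"] * 3 + ["UAVDT"] * 1
selected = select_datasets(assigned, min_share=0.2)
```

Here `UAVDT` accounts for only a tenth of the detections and falls below the assumed threshold, so it would be dropped from the sparser training subset.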

Paper Structure (5 sections, 2 equations, 6 figures, 5 tables)

This paper contains 5 sections, 2 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Schematic visualization of an object detection pipeline with an additional inference head that predicts the dataset affinity. The dataset affinity is inferred at the object level, i.e., for every detected object, rather than at the image level.
  • Figure 2: Example images from selected datasets included in the overall training set. The top images show samples from MS COCO (left), DETRAC (middle), and FLIR IR (right). The bottom images depict samples from VisDrone, UAVDT, and FLIR VIS. The gray areas are masked-out regions that are not annotated but labeled as 'ignore regions'. Of the original categories, only vehicle categories are considered and mapped to the super-category 'vehicle'.
  • Figure 3: The MODISSA measurement vehicle with the sensors used to record the test datasets.
  • Figure 4: Example detection results of the universal 'vehicle' detector, trained with the aligned dataset, on the unseen MODISSA Vogelsang dataset. The color of each bounding box encodes the assigned dataset. Detections assigned to MS COCO are highlighted in red, detections assigned to FLIR IR are shown in aqua, and detections assigned to FLIR VIS are shown in lime.
  • Figure 5: Sample detection results of the universal 'vehicle' detector, trained with the aligned dataset, on the unseen MSOD dataset Karasawa_ACM_2017. The color of each bounding box encodes the assigned origin dataset. Detections assigned to MS COCO are highlighted in red, detections assigned to FLIR IR are shown in aqua, and detections assigned to FLIR VIS are shown in lime.
  • ...and 1 more figure