Table of Contents
Fetching ...

Simple multi-dataset detection

Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl

TL;DR

The paper tackles fragmentation in object detection across large, diverse datasets by training a partitioned detector with dataset-specific outputs and losses while using a fully automatic ILP-based method to unify label spaces into a single taxonomy. This unified detector, evaluated on COCO, Objects365, and OpenImages, matches dataset-specific models on training domains and generalizes to unseen domains without fine-tuning, outperforming expert-designed taxonomies in many cases. Key contributions include the partitioned training framework, automatic taxonomy learning, and extensive cross-dataset and scale-up experiments demonstrating strong generalization and practicality. The approach enables a single, deployable detector across multiple domains, with potential for easy expansion to new datasets and tighter integration with language-aware cues in future work.

Abstract

How do we build a general and broad object detection system? We use all labels of all concepts ever annotated. These labels span diverse datasets with potentially inconsistent taxonomies. In this paper, we present a simple method for training a unified detector on multiple large-scale datasets. We use dataset-specific training protocols and losses, but share a common detection architecture with dataset-specific outputs. We show how to automatically integrate these dataset-specific outputs into a common semantic taxonomy. In contrast to prior work, our approach does not require manual taxonomy reconciliation. Experiments show our learned taxonomy outperforms a expert-designed taxonomy in all datasets. Our multi-dataset detector performs as well as dataset-specific models on each training domain, and can generalize to new unseen dataset without fine-tuning on them. Code is available at https://github.com/xingyizhou/UniDet.

Simple multi-dataset detection

TL;DR

The paper tackles fragmentation in object detection across large, diverse datasets by training a partitioned detector with dataset-specific outputs and losses while using a fully automatic ILP-based method to unify label spaces into a single taxonomy. This unified detector, evaluated on COCO, Objects365, and OpenImages, matches dataset-specific models on training domains and generalizes to unseen domains without fine-tuning, outperforming expert-designed taxonomies in many cases. Key contributions include the partitioned training framework, automatic taxonomy learning, and extensive cross-dataset and scale-up experiments demonstrating strong generalization and practicality. The approach enables a single, deployable detector across multiple domains, with potential for easy expansion to new datasets and tighter integration with language-aware cues in future work.

Abstract

How do we build a general and broad object detection system? We use all labels of all concepts ever annotated. These labels span diverse datasets with potentially inconsistent taxonomies. In this paper, we present a simple method for training a unified detector on multiple large-scale datasets. We use dataset-specific training protocols and losses, but share a common detection architecture with dataset-specific outputs. We show how to automatically integrate these dataset-specific outputs into a common semantic taxonomy. In contrast to prior work, our approach does not require manual taxonomy reconciliation. Experiments show our learned taxonomy outperforms a expert-designed taxonomy in all datasets. Our multi-dataset detector performs as well as dataset-specific models on each training domain, and can generalize to new unseen dataset without fine-tuning on them. Code is available at https://github.com/xingyizhou/UniDet.

Paper Structure

This paper contains 19 sections, 11 equations, 3 figures, 9 tables, 1 algorithm.

Figures (3)

  • Figure 1: Different datasets span diverse semantic and visual domains. We learn to unify the label spaces of multiple datasets and train a single object detector that generalizes across datasets.
  • Figure 2: Standard detectors (a) are trained on one dataset with a dataset-specific loss. We train a single partitioned detector (b) on multiple datasets with shared backbone, dataset-specific outputs and loss. Finally, we unify the outputs of the partitioned detector in a common taxonomy completely automatically (c).
  • Figure 3: Sampled results of the learned unified label space. We show example differences between an expert-designed label space provided as part of the Robust Vision Challenge (top of each row, blue) and our learned label space (bottom of each row, pink). Our learned label space captures detailed visual differences. Zoom in for details.