Anno-incomplete Multi-dataset Detection

Yiran Xu; Haoxiang Zhong; Kai Wu; Jialin Li; Yong Liu; Chengjie Wang; Shu-Tao Xia; Hongen Liao

TL;DR

This work tackles the practical problem of detecting all object categories across multiple datasets with incomplete annotations and heterogeneous features. It introduces a branch-interactive detector built on FCOS, enhanced by an Attention-based Feature Interactor (AFI) and a Knowledge Amalgamation (KA) training strategy that leverages teacher models, feature alignment, distillation, and pseudo-label supervision. Empirical results on COCO and VOC show consistent improvements over strong baselines, demonstrating the method's ability to fuse information across datasets with limited cross-dataset labels and imbalanced data. The approach offers a scalable path toward unified detection across diverse data sources, with potential to extend to additional datasets and more complex multi-domain settings without requiring exhaustive re-annotation.
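The Figure 3 caption describes the Attention-based Feature Interactor (AFI) in two steps: fuse the features of the per-dataset heads, then let each head select useful information through an attention mechanism. The sketch below illustrates that idea on plain feature vectors. It is not the authors' implementation. The mean-pooling fusion, the scaled dot-product scoring, the residual addition, and the name `afi_sketch` are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def afi_sketch(head_feats: List[Vector]) -> List[Vector]:
    """Hypothetical attention-based interaction across dataset heads.

    Step 1 (fusion, assumed form): build a shared candidate set from all
    head features plus their element-wise mean.
    Step 2 (selection, assumed form): each head uses its own feature as a
    query, weights every candidate by scaled dot-product attention, and adds
    the weighted mix back to its own feature (residual connection).
    """
    dim = len(head_feats[0])
    n = len(head_feats)
    mean = [sum(f[i] for f in head_feats) / n for i in range(dim)]
    candidates = head_feats + [mean]
    outputs = []
    for query in head_feats:
        scores = [dot(query, c) / math.sqrt(dim) for c in candidates]
        weights = softmax(scores)
        selected = [sum(w * c[i] for w, c in zip(weights, candidates))
                    for i in range(dim)]
        outputs.append([q + s for q, s in zip(query, selected)])
    return outputs
```

With two identical heads, every candidate receives the same weight, so each head simply gets its own feature added back: `afi_sketch([[1.0, 2.0], [1.0, 2.0]])` returns `[[2.0, 4.0], [2.0, 4.0]]`. With differing heads, a head attends more strongly to candidates that align with its own feature.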

Abstract

Object detectors have shown outstanding performance on various public datasets. However, annotating a new dataset for a new task is usually unavoidable in practice, since 1) a single existing dataset usually does not contain all the object categories needed, and 2) combining multiple datasets suffers from incomplete annotations and heterogeneous features. We formulate a novel problem, "Annotation-incomplete Multi-dataset Detection", and develop an end-to-end multi-task learning architecture that can accurately detect all the object categories using multiple partially annotated datasets. Specifically, we propose an attention feature extractor that helps mine the relations among different datasets. In addition, a knowledge amalgamation training strategy is incorporated to accommodate heterogeneous features from different sources. Extensive experiments on different object detection datasets demonstrate the effectiveness of our method, with mAP improvements of 2.17% and 2.10% on COCO and VOC, respectively.
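The Figure 4 caption states that the amalgamation training strategy uses four losses, and the TL;DR names teacher models, feature alignment, distillation, and pseudo-label supervision alongside ground-truth ("GT") supervision. The following is a minimal sketch of how four such terms could be combined for one image, assuming per-class sigmoid scores (as in FCOS-style heads), a mean-squared-error feature loss, a binary cross-entropy distillation loss against the frozen teacher's soft scores, and pseudo-labels obtained by thresholding the teacher at `tau`. The loss forms, the weights `lam_feat`, `lam_dist`, `lam_pseudo`, the threshold, and all function names are hypothetical; the paper's exact formulation is not reproduced here.

```python
import math
from typing import Sequence

EPS = 1e-7  # clamp for probabilities so log() never sees 0


def bce(probs: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy; targets may be hard (0/1) or soft in [0, 1]."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def amalgamation_loss(own_probs, own_labels, other_probs, teacher_probs,
                      student_feat, teacher_feat,
                      lam_feat=1.0, lam_dist=1.0, lam_pseudo=1.0, tau=0.5):
    """Hypothetical four-term loss for one image from dataset B.

    own_probs / own_labels: student scores and GT for categories B annotates.
    other_probs: student scores for dataset A's categories (unlabeled in B).
    teacher_probs / teacher_feat: outputs of a frozen teacher trained on A.
    """
    l_gt = bce(own_probs, own_labels)            # 1) ground-truth loss
    l_feat = mse(student_feat, teacher_feat)     # 2) feature alignment
    l_dist = bce(other_probs, teacher_probs)     # 3) soft distillation
    pseudo = [1.0 if p >= tau else 0.0 for p in teacher_probs]
    l_pseudo = bce(other_probs, pseudo)          # 4) pseudo-label loss
    return l_gt + lam_feat * l_feat + lam_dist * l_dist + lam_pseudo * l_pseudo
```

Splitting the student scores into `own_probs` and `other_probs` reflects the anno-incomplete setting: ground truth can only supervise the categories that the image's source dataset annotates, so the teacher supplies the signal for the remaining categories.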

Paper Structure (48 sections, 4 equations, 9 figures, 11 tables)

Introduction
Related Works
Sparse Annotated Object Detection
Detection with Multiple Datasets
Direct Multi-dataset Detection
Semi-Supervised Object Detection
Class Incremental Object Detection
Knowledge Distillation and Amalgamation
Proposed Methods
Problem Definition
Network Architecture
Attention-based Feature Interactor
Amalgamation Training Strategy
Feature Loss
Distillation Loss
...and 33 more sections

Figures (9)

Figure 1: The intuition behind Anno-incomplete Multi-dataset Detection. The goal is to learn a model that can detect all the objects in multiple datasets with incomplete annotations and heterogeneous features. Dataset A (upper-left) focuses on persons and has no annotations for dogs; dataset B (lower-left) is the opposite. The target of Anno-incomplete Multi-dataset Detection (right) is a single model that detects both persons and dogs in images from both datasets.
Figure 2: Framework of the branch-interactive multi-task detector. The AFI modules in each BIA head mine the potential relations among datasets to improve unified detection performance. TeacherA and TeacherB come from the amalgamation training strategy, which incorporates heterogeneous features from different sources. "Cls", "Ctr" and "Reg" refer to "classification", "centerness" and "regression", respectively.
Figure 3: The Attention-based Feature Interactor module serves as a bridge for dataset-level interaction. The module first fuses the features of the different heads (one per dataset). Each head then selects helpful information through an attention-based selection mechanism.
Figure 4: Framework of the Amalgamation Training Strategy. Only one teacher is shown for visual simplicity. The four losses are boxed in red and linked to the corresponding terms by red dashed lines. "GT" means "ground truth". The lock icon on the teacher means that the network is frozen.
Figure 5: Pipeline for constructing anno-incomplete datasets from a given public dataset, as described in the Dataset subsection. Given a grouping of the public dataset's categories, the process consists of three steps: divide, assign and erase, and finally merge.
...and 4 more figures
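Figure 5 summarizes the dataset construction as divide, assign and erase, and merge. A minimal sketch of one plausible reading follows: images are shuffled and split into one disjoint part per category group, each part keeps only the boxes of its assigned group, and the parts are merged into a single training set in which every image carries its source-dataset index. The dict layout, the seeded round-robin split, and the name `build_anno_incomplete` are hypothetical and not taken from the paper.

```python
import random
from typing import Dict, List


def build_anno_incomplete(images: List[Dict], groups: List[List[str]],
                          seed: int = 0) -> List[Dict]:
    """Hypothetical sketch of divide -> assign and erase -> merge.

    images: list of {"id": ..., "boxes": [{"category": str, ...}, ...]}
    groups: category groups; group k becomes the label space of sub-dataset k.
    """
    rng = random.Random(seed)
    shuffled = images[:]          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    # 1) divide: split the images into len(groups) disjoint, near-equal parts
    parts = [shuffled[k::len(groups)] for k in range(len(groups))]
    merged = []
    for k, (part, group) in enumerate(zip(parts, groups)):
        keep = set(group)
        for img in part:
            # 2) assign and erase: keep only boxes whose category is in group k
            boxes = [b for b in img["boxes"] if b["category"] in keep]
            merged.append({"id": img["id"], "dataset": k, "boxes": boxes})
    # 3) merge: one training set whose images carry their source-dataset index
    return merged
```

Using the persons-and-dogs example from Figure 1, `build_anno_incomplete(images, [["person"], ["dog"]])` yields one half of the images annotated only with persons and the other half annotated only with dogs, which reproduces the situation of datasets A and B.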