Dynamic Object Queries for Transformer-based Incremental Object Detection
Jichuan Zhang, Wei Li, Shuang Cheng, Ya-Li Li, Shengjin Wang
TL;DR
This work addresses catastrophic forgetting in incremental object detection by introducing the Dynamic object Query-based DEtection TRansformer (DyQ-DETR). The method incrementally adds class-specific object queries in each phase, uses isolated bipartite matching and disentangled self-attention to decouple old and new knowledge, and applies risk-balanced partial calibration during exemplar replay to handle incomplete labels. Key contributions include dynamic query integration with phase-wise losses, an efficient decoder design, and a risk-aware exemplar strategy. Together these yield strong improvements over state-of-the-art IOD methods on COCO 2017 with limited parameter overhead. The approach offers a scalable path to balancing stability and plasticity in continual visual learning, with practical implications for robotics, autonomous systems, and open-world detection tasks.
Abstract
Incremental object detection (IOD) aims to learn new classes sequentially while retaining the ability to locate and identify old ones. Because the training data arrives with annotations only for the new classes, IOD suffers from catastrophic forgetting. Prior methods mainly tackle forgetting through knowledge distillation and exemplar replay, ignoring the conflict between limited model capacity and growing knowledge. In this paper, we explore dynamic object queries for incremental object detection built on the Transformer architecture. We propose the Dynamic object Query-based DEtection TRansformer (DyQ-DETR), which incrementally expands the model's representation ability to achieve a stability-plasticity tradeoff. First, a new set of learnable object queries is fed into the decoder to represent new classes. These new object queries are aggregated with those from previous phases so that the model accommodates both old and new knowledge. Second, we propose isolated bipartite matching for object queries in different phases, based on disentangled self-attention. Interaction among object queries from different phases is eliminated to reduce inter-class confusion. Thanks to the separate supervision and computation over object queries, we further present risk-balanced partial calibration for effective exemplar replay. Extensive experiments demonstrate that DyQ-DETR significantly surpasses state-of-the-art methods with limited parameter overhead. Code will be made publicly available.
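
The abstract states that interaction among object queries from different phases is eliminated in self-attention. One plausible way to realize this is a block-diagonal attention mask, sketched below. This is an illustrative assumption, not the authors' released implementation: the function names `phase_ids` and `disentangled_attention_mask` and the per-phase query counts are hypothetical.

```python
from typing import List


def phase_ids(queries_per_phase: List[int]) -> List[int]:
    """Label each query with the index of the phase that introduced it."""
    ids: List[int] = []
    for phase, count in enumerate(queries_per_phase):
        ids.extend([phase] * count)
    return ids


def disentangled_attention_mask(queries_per_phase: List[int]) -> List[List[bool]]:
    """Build a self-attention mask over all object queries.

    mask[i][j] is True when query i is BLOCKED from attending to query j,
    i.e. when the two queries were introduced in different phases.
    Queries from the same phase (including a query and itself) stay visible.
    """
    ids = phase_ids(queries_per_phase)
    return [[a != b for b in ids] for a in ids]


# Hypothetical setup: 2 queries from phase 0, then 3 added in phase 1.
mask = disentangled_attention_mask([2, 3])
for row in mask:
    # 'x' marks a blocked pair, '.' marks an allowed pair.
    print("".join("x" if blocked else "." for blocked in row))
```

The mask uses the convention that True means "blocked", which matches boolean attention masks in PyTorch's `nn.MultiheadAttention`. Because each phase's queries attend only to one another, their outputs can be supervised separately, which is the property that isolated bipartite matching relies on in the description above.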
