DCA: Dividing and Conquering Amnesia in Incremental Object Detection

Aoting Zhang, Dongbao Yang, Chang Liu, Xiaopeng Hong, Miao Shang, Yu Zhou

TL;DR

This work addresses catastrophic forgetting in incremental object detection by identifying a forgetting imbalance: localization is relatively stable and class-agnostic, while recognition forgets severely as new classes are added. It introduces Divide-and-Conquer Amnesia (DCA), a localization-then-recognition framework that decouples the detector into two branches, preserving localization while guiding recognition with semantic knowledge from pre-trained language models. Key innovations include a semantic-guided recognition decoder, duplex classifier fusion, and Hybrid Knowledge Distillation to curb feature drift without storing old exemplars. Experiments on VOC and COCO show state-of-the-art performance, especially in long-term incremental scenarios, with no exemplar memory required and with robustness to the choice of language model. Overall, DCA provides a scalable, semantics-driven path to robust continual object detection.

Abstract

Incremental object detection (IOD) aims to cultivate an object detector that can continuously localize and recognize novel classes while preserving its performance on previous classes. Existing methods have achieved some success by improving knowledge distillation and exemplar replay for transformer-based detection frameworks, but the intrinsic forgetting mechanisms remain underexplored. In this paper, we investigate the cause of forgetting and discover a forgetting imbalance between localization and recognition in transformer-based IOD: localization forgets less and can generalize to future classes, whereas catastrophic forgetting occurs primarily in recognition. Based on these insights, we propose a Divide-and-Conquer Amnesia (DCA) strategy, which redesigns transformer-based IOD into a localization-then-recognition process. DCA maintains and transfers localization ability well, leaving the decoupled, fragile recognition to be addressed separately. To reduce feature drift in recognition, we leverage semantic knowledge encoded in pre-trained language models to anchor class representations within a unified feature space across incremental tasks. This involves designing a duplex classifier fusion and embedding class semantic features into the recognition decoding process in the form of queries. Extensive experiments validate that our approach achieves state-of-the-art performance, especially for long-term incremental scenarios. For example, under the four-step setting on MS-COCO, our DCA strategy significantly improves the final AP by 6.9%.

Paper Structure

This paper contains 14 sections, 10 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Forgetting imbalance within DETR-based IOD. (a) Localization is class-agnostic and generalizes to future classes, with predicted boxes reaching a high recall of 94%. (b) After fine-tuning on new data, localization forgets little, with recall on old classes dropping slightly from 94% to 87%, while the average recognition accuracy drops from 86% to 12%. (c) In the original DETR, localization features are clearly influenced by recognition features and become category-specific. (d) After decoupling the features, localization is clearly class-agnostic, so the effort can focus on solving catastrophic forgetting in recognition.
  • Figure 2: (a) Feature ambiguity arises between new and old tasks due to feature drift. (b) With semantic guidance that steers the optimization direction of each class, clear boundaries are effectively maintained.
  • Figure 3: Pipeline of DCA. The extracted feature sequences $\mathcal{V}_{e}$ are first fed to the decoupled localization decoder to produce object location embeddings $\mathcal{E}_{local}$. These are then sent to the decoupled Semantic-guided Recognition Decoder, which probes the features to obtain class embeddings $\mathcal{E}_{cls}$. To integrate inter-class relationships, semantic features $\mathcal{Q}_{se}$ from pre-trained language models (PLMs) are embedded into the recognition decoder in the form of queries and undergo self-attention (SA) with the location embeddings. To promote unified optimization across tasks, Duplex Classifier Fusion adds a semantic head that computes similarities between $\mathcal{E}_{cls}$ and $\mathcal{Q}_{se}$; these similarities are combined with the output of the standard linear head to generate the final recognition scores.
  • Figure 4: Impact analysis of the balance weight $\beta$, which controls the importance of the probabilities from the linear classifier.
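
The Duplex Classifier Fusion described in the captions of Figures 3 and 4 can be illustrated with a minimal NumPy sketch. The captions only state that a semantic head computes similarities between $\mathcal{E}_{cls}$ and $\mathcal{Q}_{se}$, that these are combined with a standard linear head, and that $\beta$ weights the linear classifier's probabilities. Everything else here is an assumption made for illustration and may differ from the paper's formulation: the function name `duplex_fusion`, cosine similarity as the similarity measure, the sigmoid activations, the hypothetical `temperature` parameter, and the convex combination $\beta \, p_{linear} + (1 - \beta) \, p_{semantic}$.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit L2 norm (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def duplex_fusion(e_cls, q_se, w_linear, b_linear, beta=0.5, temperature=0.1):
    """Fuse a linear head and a semantic head (illustrative assumption).

    e_cls:    (num_queries, d) class embeddings from the recognition decoder
    q_se:     (num_classes, d) semantic features from a language model
    w_linear: (d, num_classes) weights of the standard linear head
    b_linear: (num_classes,)   bias of the standard linear head
    beta:     weight on the linear head's probabilities
    """
    # Standard linear head: per-class probabilities.
    p_linear = sigmoid(e_cls @ w_linear + b_linear)

    # Semantic head: cosine similarity between class embeddings and
    # semantic features, sharpened by an assumed temperature.
    sim = l2_normalize(e_cls) @ l2_normalize(q_se).T
    p_semantic = sigmoid(sim / temperature)

    # Assumed fusion rule: convex combination controlled by beta.
    return beta * p_linear + (1.0 - beta) * p_semantic


# Toy inputs: 5 object queries, 3 classes, 8-dimensional embeddings.
rng = np.random.default_rng(0)
e_cls = rng.normal(size=(5, 8))
q_se = rng.normal(size=(3, 8))
w = rng.normal(size=(8, 3))
b = np.zeros(3)

scores = duplex_fusion(e_cls, q_se, w, b, beta=0.5)
print(scores.shape)
```

Under this assumed rule, $\beta = 1$ reduces to the linear head alone and $\beta = 0$ to the semantic head alone, which is one way to read the trade-off analyzed in Figure 4.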