Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection

Gaurav Bhatt; James Ross; Leonid Sigal

TL;DR

This work tackles catastrophic forgetting in continual object detection by introducing MD-DETR, a memory-augmented transformer that adapts a pre-trained Deformable DETR detector to new tasks while preserving past knowledge through a dedicated memory module and a localized query mechanism for retrieving from it. It adds continual optimization strategies, including memory chunk freezing, gradient masking, and background thresholding (the last to counter background relegation), and trains with the joint objective $\mathcal{L} = \mathcal{L}_{detr} + \lambda_Q \mathcal{L}_Q$. Empirically, MD-DETR achieves state-of-the-art results on MS-COCO and PASCAL-VOC in a replay-free setting, improving on prior methods by about 5 to 7% overall and by up to about 10% on challenging tasks, and it outperforms replay-based baselines. The work also provides detailed ablations and qualitative analyses that highlight the effectiveness of memory-based retrieval for continual detection, and it outlines remaining challenges such as bounding-box deformation and confidence drift for past classes.
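
The update rule behind memory chunk freezing and gradient masking, together with the joint objective, can be illustrated with a minimal pure-Python sketch. It assumes, for illustration only, that the memory units are split into equal per-task chunks and that only the current task's chunk receives gradient; the names `joint_loss`, `chunk_mask` and `masked_sgd_step` are hypothetical, and the paper's actual chunking and masking rules may differ.

```python
from typing import List

Unit = List[float]  # one flattened memory unit (hypothetical layout)


def joint_loss(loss_detr: float, loss_q: float, lambda_q: float) -> float:
    """Joint objective L = L_detr + lambda_Q * L_Q."""
    return loss_detr + lambda_q * loss_q


def chunk_mask(num_units: int, num_tasks: int, task_id: int) -> List[bool]:
    """Assumed layout: equal chunks per task; only chunk `task_id` is trainable."""
    per_task = num_units // num_tasks
    start, end = task_id * per_task, (task_id + 1) * per_task
    return [start <= i < end for i in range(num_units)]


def masked_sgd_step(
    memory: List[Unit],
    grads: List[Unit],
    trainable: List[bool],
    lr: float,
) -> List[Unit]:
    """One SGD step with gradient masking.

    The gradient of a frozen unit is multiplied by 0, so memory learned
    on earlier tasks stays unchanged; trainable units get a normal update.
    """
    updated = []
    for unit, grad, is_trainable in zip(memory, grads, trainable):
        mask = 1.0 if is_trainable else 0.0
        updated.append([w - lr * mask * g for w, g in zip(unit, grad)])
    return updated


if __name__ == "__main__":
    # Task 1 of 2 (0-indexed): units 2 and 3 train, units 0 and 1 are frozen.
    mask = chunk_mask(num_units=4, num_tasks=2, task_id=1)
    print(mask)  # [False, False, True, True]
    new_memory = masked_sgd_step([[1.0, 1.0]] * 4, [[0.5, 0.5]] * 4, mask, lr=0.1)
    print(new_memory)  # first two units unchanged, last two moved by -0.05
    print(joint_loss(2.0, 0.5, lambda_q=0.1))
```

In a real framework the same effect is usually obtained by zeroing or not computing gradients for frozen parameters; the explicit mask here only makes the mechanism visible.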

Abstract

Modern pre-trained architectures struggle to retain previous information while undergoing continuous fine-tuning on new tasks. Despite notable progress in continual classification, systems designed for complex vision tasks such as detection or segmentation still struggle to attain satisfactory performance. In this work, we introduce a memory-based detection transformer architecture that adapts a pre-trained DETR-style detector to new tasks while preserving knowledge from previous tasks. We propose a novel localized query function for efficient information retrieval from memory units, aiming to minimize forgetting. Furthermore, we identify a fundamental challenge in continual detection, which we call background relegation. It arises when object categories from earlier tasks reappear in future tasks, potentially without labels, so that they are implicitly treated as background. This issue is unavoidable in continual detection and segmentation, and the continual optimization technique we introduce effectively tackles it. Finally, we evaluate the proposed system on continual detection benchmarks and show that it surpasses existing state-of-the-art methods, with 5-7% improvements on MS-COCO and PASCAL-VOC.
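
Background relegation can be made concrete with a small sketch of one plausible form of background thresholding: confident past-class predictions made on a new-task image are kept as extra targets, so that unlabelled old objects are not supervised as background. This is an assumption for illustration, not the paper's exact rule; `Box`, `add_past_class_pseudo_labels`, the score threshold of 0.5 and the duplicate-IoU cut-off of 0.7 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Box:
    label: str
    score: float  # confidence; 1.0 for ground-truth annotations
    xyxy: tuple   # (x1, y1, x2, y2)


def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def add_past_class_pseudo_labels(
    gt: List[Box],
    past_preds: List[Box],
    past_classes: Set[str],
    threshold: float = 0.5,
    iou_dup: float = 0.7,
) -> List[Box]:
    """Add confident past-class predictions as targets (hypothetical rule)."""
    targets = list(gt)
    for p in past_preds:
        if p.label not in past_classes or p.score < threshold:
            continue  # not an old class, or not confident: stays background
        if any(iou(p.xyxy, t.xyxy) >= iou_dup for t in targets):
            continue  # already covered by an existing target
        targets.append(Box(p.label, 1.0, p.xyxy))
    return targets


if __name__ == "__main__":
    # A task-2 image: only "car" is annotated, but an old-task "person" is visible.
    gt = [Box("car", 1.0, (0, 0, 10, 10))]
    preds = [Box("person", 0.9, (20, 20, 30, 40)), Box("person", 0.3, (50, 50, 60, 60))]
    targets = add_past_class_pseudo_labels(gt, preds, past_classes={"person"})
    print([t.label for t in targets])  # ['car', 'person']
```

Without the added target, the confident "person" region would be matched to the background class during training, which is exactly the relegation the abstract describes.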

Paper Structure (20 sections, 9 equations, 7 figures, 7 tables)

Introduction
Related Work
Problem Formulation and Preliminaries
Proposed Method
Memory modules for Deformable-DETR
Query function for localized memory retrieval
Continual optimization for $\theta^{{\mathcal{T}_t}^*}$
Continual optimization for solving background relegation
Training loss
Experiments
Results
Ablations
Discussion
Conclusion
Implementation Details
...and 5 more sections

Figures (7)

Figure 1: Class-incremental continual object detection. Each task is characterized by a distinct set of classes, meaning $C^{\mathcal{T}_i} \cap C^{\mathcal{T}_j} = \emptyset$ for $i \neq j$. The category person in $\mathcal{T}_1$ remains unannotated in all future tasks ($\mathcal{T}_2$ and $\mathcal{T}_3$ in the illustration), giving rise to the problem of background relegation for the category person.
Figure 2: Architecture of MD-DETR at a given time-step $t$. Given an input image $x$, the proposed query function $Q(x,\theta_\nabla,\alpha)$ retrieves relevant memory units and combines them linearly. The decoder uses the retrieved memory across its decoding layers. Most of the architecture, including the encoder and decoder, remains frozen; the trainable modules are the memory units $\mathbf{M}$, the class embedding, the bounding-box embedding, and the ranking function $g_\psi$.
Figure 3: Effect of varying the number of memory units $N_m$ and the memory length $L_m$, ablated on MS-COCO.
Figure 4: Progression of MD-DETR performance from $\mathcal{T}_1 \rightarrow \mathcal{T}_4$ when trained in a multi-step class-incremental setting on MS-COCO. To illustrate how MD-DETR addresses background relegation of previously encountered classes, we compare two designs: with (MD-DETR + BT) and without (MD-DETR - BT) background thresholding. In the first column (1$^{st}$ image of each block), the class {person} is relegated by MD-DETR - BT; in the second column, the category {car} is relegated; the third column shows relegation of the classes {person, chair}; and the fourth column shows relegation of the category {dog}. The ground-truth block shows each image with all annotations across the four tasks $\{\mathcal{T}_1 \cdots \mathcal{T}_4\}$.
Figure 5: Time and space comparison between MD-DETR and OW-DETR.
...and 2 more figures
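
The retrieval in Figure 2, where memory units are combined linearly, can be sketched in pure Python. The sketch assumes one learnable key per memory unit, uses the cosine similarity between a query vector and each key as the weight $w_i$, keeps only the `top_k` best-matching units, and returns $\sum_i w_i \mathbf{M}_i$, where each $\mathbf{M}_i$ has $L_m$ rows. How the paper forms its localized query, and the roles of $\theta_\nabla$ and $\alpha$, are not reproduced; `retrieve`, `cosine`, the per-unit keys and `top_k` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # L_m rows, each of width d


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def retrieve(query: Vector, keys: List[Vector], memory: List[Matrix], top_k: int) -> Matrix:
    """Return sum_i w_i * M_i over the top_k units, with w_i = cos(query, keys[i])."""
    scores = [cosine(query, k) for k in keys]
    chosen = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]
    length, width = len(memory[0]), len(memory[0][0])
    out = [[0.0] * width for _ in range(length)]
    for i in chosen:
        for r in range(length):
            for c in range(width):
                out[r][c] += scores[i] * memory[i][r][c]
    return out


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N_m = 3 memory units
    memory = [
        [[1.0, 1.0], [1.0, 1.0]],  # each unit: L_m = 2 rows, d = 2
        [[5.0, 5.0], [5.0, 5.0]],
        [[2.0, 0.0], [0.0, 2.0]],
    ]
    out = retrieve([1.0, 0.0], keys, memory, top_k=2)
    print(out)  # combines units 0 and 2; unit 1 (similarity 0) is not selected
```

Restricting the sum to the top-k units keeps retrieval sparse; a softmax over all similarities would be an equally reasonable choice under these assumptions.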
