Replay Consolidation with Label Propagation for Continual Object Detection

Riccardo De Monte, Davide Dalle Pezze, Marina Ceccon, Francesco Pasti, Francesco Paissan, Elisabetta Farella, Gian Antonio Susto, Nicola Bellotto

TL;DR

This work tackles continual learning for object detection, where missing annotations cause task interference in replay-based methods. The proposed Replay Consolidation with Label Propagation for Object Detection (RCLPOD) combines four components: Label Propagation, which attaches pseudo-labels for old classes to new samples and propagates new knowledge into the replay memory; OCDM, which balances the class distribution of the memory; a masking loss $L_{cls-mask}$, which reduces task interference; and a feature distillation loss $L_{feat-dist}$, which stabilizes learning across YOLOv8's backbone and neck features. The approach delivers state-of-the-art results on VOC and COCO CL benchmarks, demonstrating strong performance in long task sequences and favorable stability-plasticity trade-offs without increasing memory footprint. The solution is architecture-agnostic and suitable for real-world systems such as autonomous driving and robotics, offering a practical, memory-efficient path for continual object detection with modern detectors like YOLOv8.

Abstract

Continual Learning (CL) aims to learn new data while remembering previously acquired knowledge. In contrast to CL for image classification, CL for Object Detection faces additional challenges such as the missing annotations problem. In this scenario, images from previous tasks may contain instances of unknown classes that could reappear as labeled in future tasks, leading to task interference in replay-based approaches. Consequently, most approaches in the literature have focused on distillation-based techniques, which are effective when there is a significant class overlap between tasks. In our work, we propose an alternative to distillation-based approaches: Replay Consolidation with Label Propagation for Object Detection (RCLPOD). RCLPOD enhances the replay memory in two ways: it improves the quality of the stored samples through a technique that promotes class balance, and it improves the quality of the ground truth associated with these samples through a technique called label propagation. RCLPOD outperforms existing techniques on well-established benchmarks such as VOC and COCO. Moreover, our approach is developed to work with modern architectures like YOLOv8, making it suitable for dynamic, real-world applications such as autonomous driving and robotics, where continuous learning and resource efficiency are essential.
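
To make the idea of label propagation more concrete, the following is a minimal sketch of how the annotations of a sample could be enriched with pseudo-labels from a model. It is not the authors' implementation: the `Box` type, the `propagate_labels` helper, the `min_score` confidence threshold, and the example detections are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    cls: str             # class name
    xyxy: tuple          # (x1, y1, x2, y2) in pixels
    score: float = 1.0   # 1.0 for ground truth, model confidence for predictions


def propagate_labels(ground_truth, predictions, allowed_classes, min_score=0.5):
    """Return the ground truth plus confident predictions as pseudo-labels.

    Hypothetical helper: only predictions whose class is in `allowed_classes`
    and whose confidence is at least `min_score` (an assumed threshold) are kept.
    """
    pseudo = [
        p for p in predictions
        if p.cls in allowed_classes and p.score >= min_score
    ]
    return list(ground_truth) + pseudo


# A new-task image is annotated only with the new class "tennis racket".
ground_truth = [Box("tennis racket", (120, 80, 200, 260))]

# Made-up detections from the old model on the same image.
old_model_predictions = [
    Box("person", (60, 20, 300, 400), score=0.9),         # confident old class
    Box("dog", (10, 300, 90, 380), score=0.2),            # old class, low confidence
    Box("tennis racket", (118, 82, 198, 258), score=0.8), # not an old class
]

old_classes = {"person", "dog"}
enriched = propagate_labels(ground_truth, old_model_predictions, old_classes)
print([b.cls for b in enriched])
```

In this sketch the same helper covers both directions: passing the old model's predictions with the set of old classes adds old knowledge to new-task samples, while passing the new model's predictions with the set of new classes would add new knowledge to samples already in memory.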

Paper Structure

This paper contains 25 sections, 6 equations, 12 figures, and 8 tables.

Figures (12)

  • Figure 1: Continual Learning for Object Detection pipeline. The model learns to detect new classes at each incremental training stage (task). However, the scenario is challenging due to the missing annotations problem. The example in the current task might have been learned as background in previous tasks (e.g., the class toy is shown in Task 1 but not in Task 2 or 3).
  • Figure 2: Scheme of the RCLPOD method: (a) During the training of the new task, the new samples are first processed through the old model to generate pseudo-labels. Then they are combined with the samples from the replay memory to update the model, considering also additional losses like the Masking Loss and the one associated with feature distillation to reduce the drift of old intermediate representations. (b) Post-training procedure to update the replay memory. This procedure enhances the replay memory via label propagation. The backward step updates the old samples with new knowledge, while the forward step integrates old knowledge into new task samples. Some of the new enriched samples are stored in the memory buffer.
  • Figure 3: Comparison of the memory storage for Replay and RCLPOD. The vertical axis indicates the stored samples of each task, while the horizontal axis represents the objects associated with each class $c_i$. (a) Each task of the Replay memory has information only on the classes seen during its iteration. (b) Using the Label Propagation mechanism, the saved samples are more informative, containing knowledge of new and old classes.
  • Figure 4: An example comparing the class distributions in the memory buffer of Replay and RCLPOD. The vertical axis indicates the label frequency, while the horizontal axis represents the different labels. (a) On the left, Replay shows an unbalanced distribution. (b) On the right, the distribution of RCLPOD is more balanced because of the selection mechanism.
  • Figure 5: Example of task interference for YOLO architecture due to overlapping objects. Image from replay memory: class "tennis racket" is a new class, while class "person" is an old one. Since the only ground truth available is the one for the "person" class, any model prediction for the tennis racket would be penalized in the classification loss computation.
  • ...and 7 more figures
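
The task interference described in the Figure 5 caption can be illustrated with a small sketch of a masked classification loss. The exact form of the paper's masking loss is not given in this excerpt, so this is only one plausible reading, assuming a per-class binary cross-entropy. The function names, the two-class setup, and the probabilities below are hypothetical.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and target y."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def masked_cls_loss(probs, targets, class_mask):
    """Mean BCE over the classes whose mask entry is 1.

    Classes with mask 0 (no annotation available for this sample)
    contribute nothing, so predictions for them are not penalized.
    """
    terms = [bce(p, y) for p, y, m in zip(probs, targets, class_mask) if m]
    return sum(terms) / len(terms) if terms else 0.0


# Class order: ["person", "tennis racket"].
# Replay image from an old task: only "person" was annotated.
probs = [0.9, 0.8]    # the model correctly sees both a person and a racket
targets = [1, 0]      # the racket has no ground truth, so its target is 0
class_mask = [1, 0]   # assumed mask: ignore the class that was never annotated

unmasked = masked_cls_loss(probs, targets, [1, 1])
masked = masked_cls_loss(probs, targets, class_mask)
print(f"unmasked={unmasked:.3f} masked={masked:.3f}")
```

Without the mask, the confident and correct "tennis racket" prediction is treated as a false positive and dominates the loss. With the mask, only the class that actually has ground truth for this image contributes.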