Teach YOLO to Remember: A Self-Distillation Approach for Continual Object Detection

Riccardo De Monte, Davide Dalle Pezze, Gian Antonio Susto

TL;DR

This work tackles catastrophic forgetting in continual object detection by adapting Learning without Forgetting (LwF) to one-stage, anchor-free detectors, specifically YOLOv8. It introduces a tailored self-distillation framework (YOLO LwF) that distills both regression and classification signals using a temperature-scaled cross-entropy for regression and a prediction-wise weighted cross-entropy guided by teacher confidence and spatial overlap, augmented with a masking strategy for overlapping objects. The method is combined with an experience replay memory balanced by OCDM, showing state-of-the-art gains on VOC (+2.1% mAP) and COCO (+2.9% mAP) benchmarks and robust stability-plasticity trade-offs across diverse CL scenarios. Practically, this yields a viable, real-time continual learning approach for anchor-free detectors, with the memory-based variant providing the strongest performance, and establishes a strong baseline for future CLOD research.
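
The two loss ingredients named above can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes that each box side is predicted as logits over a fixed set of discrete bins (as in YOLOv8's distribution-based regression head), and it uses placeholder per-prediction weights, because this summary does not give the exact formula combining teacher confidence and spatial overlap. The function names `log_softmax`, `distill_cross_entropy`, and `weighted_mean` are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_norm for x in scaled]


def distill_cross_entropy(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between temperature-softened teacher and student distributions.

    Assumption: both inputs are logits over the same discrete bins
    (for example, the bins describing one side of a bounding box).
    """
    teacher_log_probs = log_softmax(teacher_logits, temperature)
    student_log_probs = log_softmax(student_logits, temperature)
    return -sum(
        math.exp(t) * s for t, s in zip(teacher_log_probs, student_log_probs)
    )


def weighted_mean(losses, weights):
    """Weighted average of per-prediction losses.

    Assumption: the weights are placeholders. In the paper they are guided by
    teacher confidence and spatial overlap; that formula is not shown here.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total


# Hypothetical teacher and student logits for two predictions.
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
student = [[1.5, 0.7, -0.5], [0.3, 0.2, 0.1]]
per_prediction = [distill_cross_entropy(s, t) for s, t in zip(student, teacher)]

# Placeholder weights: the first prediction is trusted more than the second.
loss = weighted_mean(per_prediction, weights=[0.9, 0.2])
```

A higher temperature flattens both distributions, so the student is also pushed to match the teacher's relative preferences among less likely bins. The weighting lets predictions the teacher is unsure about contribute less to the total, which is the general idea behind a prediction-wise weighted loss.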

Abstract

Real-time object detectors like YOLO achieve exceptional performance when trained on large datasets for multiple epochs. However, in real-world scenarios where data arrives incrementally, neural networks suffer from catastrophic forgetting, leading to a loss of previously learned knowledge. To address this, prior research has explored strategies for Class Incremental Learning (CIL) in Continual Learning for Object Detection (CLOD), with most approaches focusing on two-stage object detectors. However, existing work suggests that Learning without Forgetting (LwF) may be ineffective for one-stage anchor-free detectors like YOLO due to noisy regression outputs, which risk transferring corrupted knowledge. In this work, we introduce YOLO LwF, a self-distillation approach tailored for YOLO-based continual object detection. We demonstrate that when coupled with a replay memory, YOLO LwF significantly mitigates forgetting. Compared to previous approaches, it achieves state-of-the-art performance, improving mAP by +2.1% and +2.9% on the VOC and COCO benchmarks, respectively.

Paper Structure

This paper contains 22 sections, 4 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Artificial example of the missing annotation problem in Continual Learning for Object Detection. Objects belonging to classes from previous tasks might be treated as background in later tasks (e.g., class "dog" in Task 2 and Task 3). In a real scenario, the three images might be different, but the same issue can arise.
  • Figure 2: Relevant YOLO features for CLOD.
  • Figure 3: For the same anchor point, in red is the predicted bounding box from the student, while in green is the teacher one. In the first subfigure, the student should match the teacher's output. In the second subfigure, the teacher might assign the label "dog", while the student should assign both "bike" and "dog" labels, with less confidence.
  • Figure 4: Results for the two long scenarios: VOC 15p1 and COCO 40p10.