Teach YOLO to Remember: A Self-Distillation Approach for Continual Object Detection
Riccardo De Monte, Davide Dalle Pezze, Gian Antonio Susto
TL;DR
This work tackles catastrophic forgetting in continual object detection by adapting Learning without Forgetting (LwF) to one-stage, anchor-free detectors, specifically YOLOv8. It introduces a tailored self-distillation framework, YOLO LwF, that distills both regression and classification outputs. Regression is distilled with a temperature-scaled cross-entropy. Classification is distilled with a prediction-wise weighted cross-entropy guided by teacher confidence and spatial overlap. Both are augmented with a masking strategy for overlapping objects. Combined with an experience replay memory balanced by OCDM, the method achieves state-of-the-art results on the VOC (+2.1% mAP) and COCO (+2.9% mAP) benchmarks and maintains a favorable stability-plasticity trade-off across diverse continual learning (CL) scenarios. In practice, this yields a viable real-time continual learning approach for anchor-free detectors, with the memory-based variant performing best, and establishes a strong baseline for future research in continual learning for object detection (CLOD).
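The two distillation terms summarized above can be illustrated with a short NumPy sketch. It is not the authors' implementation, and several details are assumptions. The regression head is taken to output, for each prediction, four discrete distributions over `B` bins (one per box side), as in YOLOv8's Distribution Focal Loss head. Classification scores are treated as per-class sigmoid logits. Each prediction's classification term is weighted by the teacher's maximum class confidence only, so the spatial-overlap factor is omitted. The optional `mask` is a hypothetical stand-in for the masking of overlapping objects. All function names and the weights `lam_reg` and `lam_cls` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def regression_distill_loss(student_reg, teacher_reg, temperature=2.0):
    """Temperature-scaled cross-entropy between teacher and student box distributions.

    Assumed shapes: (N, 4, B) logits, i.e. one distribution over B bins for each
    of the 4 box sides of N predictions (DFL-style head, as in YOLOv8).
    """
    p_teacher = softmax(teacher_reg / temperature)          # soft targets
    log_p_student = log_softmax(student_reg / temperature)
    ce = -(p_teacher * log_p_student).sum(axis=-1)          # (N, 4)
    return float(ce.mean())


def classification_distill_loss(student_cls, teacher_cls, mask=None, eps=1e-7):
    """Prediction-wise weighted binary cross-entropy on (N, C) class logits.

    Hypothetical weighting: each prediction is weighted by the teacher's maximum
    class confidence. `mask` is an optional (N,) bool array where False excludes
    a prediction; it only stands in for the paper's overlap masking rule.
    """
    t = sigmoid(teacher_cls)                                 # teacher soft labels
    s = np.clip(sigmoid(student_cls), eps, 1.0 - eps)
    bce = -(t * np.log(s) + (1.0 - t) * np.log(1.0 - s)).mean(axis=-1)  # (N,)
    w = t.max(axis=-1)                                       # teacher confidence
    if mask is not None:
        w = w * mask
    total_w = w.sum()
    return float((w * bce).sum() / total_w) if total_w > 0 else 0.0


def lwf_distill_loss(student, teacher, temperature=2.0, lam_reg=1.0, lam_cls=1.0, mask=None):
    """Weighted sum of both terms; `student`/`teacher` are (reg_logits, cls_logits)."""
    reg = regression_distill_loss(student[0], teacher[0], temperature)
    cls = classification_distill_loss(student[1], teacher[1], mask)
    return lam_reg * reg + lam_cls * cls


# Usage: a student whose outputs are a small perturbation of the teacher's.
rng = np.random.default_rng(0)
teacher = (rng.normal(size=(8, 4, 16)), rng.normal(size=(8, 20)))
student = (teacher[0] + 0.1 * rng.normal(size=(8, 4, 16)),
           teacher[1] + 0.1 * rng.normal(size=(8, 20)))
print(lwf_distill_loss(student, teacher))
```

Because the teacher's outputs serve as soft targets, neither term reaches zero when the student matches the teacher; each is instead minimized at that point, which is what the distillation objective requires. The T² rescaling often used in knowledge distillation is left out for simplicity.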
Abstract
Real-time object detectors like YOLO achieve exceptional performance when trained on large datasets for multiple epochs. However, in real-world scenarios where data arrives incrementally, neural networks suffer from catastrophic forgetting and lose previously learned knowledge. To address this, prior research has explored Class Incremental Learning (CIL) strategies for Continual Learning for Object Detection (CLOD), with most approaches focusing on two-stage object detectors. For one-stage anchor-free detectors like YOLO, existing work suggests that Learning without Forgetting (LwF) may be ineffective, because their noisy regression outputs risk transferring corrupted knowledge. In this work, we introduce YOLO LwF, a self-distillation approach tailored for YOLO-based continual object detection. We show that, when coupled with a replay memory, YOLO LwF substantially mitigates forgetting. It achieves state-of-the-art performance, improving mAP over previous approaches by 2.1% and 2.9% on the VOC and COCO benchmarks, respectively.
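According to the TL;DR, the replay memory that YOLO LwF is coupled with is balanced by OCDM. The sketch below illustrates the underlying idea with a simplified greedy rule and is not the OCDM reference algorithm. When the memory overflows, it repeatedly drops the sample whose removal brings the memory's label distribution closest to uniform, measured here as KL(p || uniform). The following are all assumptions made for illustration: representing each image by its set of object class labels, the choice of divergence, and the names `update_memory` and `kl_to_uniform`.

```python
import math
from collections import Counter


def kl_to_uniform(counts, classes):
    """KL(p || uniform) for the label distribution p given by `counts` over `classes`."""
    total = sum(counts[c] for c in classes)
    if total == 0:
        return math.inf
    u = 1.0 / len(classes)
    kl = 0.0
    for c in classes:
        p = counts[c] / total
        if p > 0:
            kl += p * math.log(p / u)
    return kl


def update_memory(memory, batch, capacity):
    """Greedy class-balancing memory update (simplified, OCDM-inspired).

    Each sample is a list of class labels (assumed: one per annotated object);
    repeated labels within one image are counted once, as in multi-label settings.
    """
    pool = list(memory) + list(batch)
    while len(pool) > capacity:
        counts = Counter(c for labels in pool for c in set(labels))
        classes = set(counts)
        best_idx, best_kl = 0, math.inf
        for i, labels in enumerate(pool):
            trial = counts.copy()
            trial.subtract(set(labels))  # label counts if sample i were removed
            kl = kl_to_uniform(trial, classes)
            if kl < best_kl:
                best_idx, best_kl = i, kl
        pool.pop(best_idx)
    return pool


# Usage: "car" is over-represented, so one "car" image is evicted.
memory = [["car"], ["car"], ["car"], ["person"]]
memory = update_memory(memory, [["dog"]], capacity=4)
print(memory)  # -> [['car'], ['car'], ['person'], ['dog']]
```

Each eviction scans the whole pool, so the cost grows quadratically with memory size; that is acceptable for a small illustrative buffer but would need a more efficient scheme at scale.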
