MultIOD: Rehearsal-free Multihead Incremental Object Detector
Eden Belouadah, Arnaud Dapogny, Kevin Bailly
TL;DR
MultIOD tackles class-incremental object detection (CIOD) under rehearsal-free, resource-constrained conditions by embedding a multihead feature pyramid and per-class prediction heads within CenterNet. The approach freezes previously learned components while training new class heads, uses transfer learning between the classes learned initially and those learned incrementally to limit catastrophic forgetting, and applies class-wise NMS to remove duplicate boxes without requiring past data. Empirical results on two Pascal VOC datasets show that MultIOD outperforms distillation-based CenterNet methods while storing only the current model, and that it is efficient because the fixed-representation strategy reduces the number of parameters. The result is a practical, fast, anchor-free CIOD solution, with room for better scalability and transfer in real-world streaming environments.
Abstract
Class-incremental learning (CIL) refers to the ability of artificial agents to integrate new classes as they appear in a stream. It is particularly interesting in evolving environments where agents have limited access to memory and computational resources. The main challenge of incremental learning is catastrophic forgetting, the inability of neural networks to retain past knowledge when learning new knowledge. Unfortunately, most existing class-incremental methods for object detection are applied to two-stage algorithms such as Faster R-CNN, and rely on rehearsal memory to retain past knowledge. We argue that these methods are not suitable in resource-limited environments, and that more effort should be dedicated to anchor-free and rehearsal-free object detection. In this paper, we propose MultIOD, a class-incremental object detector based on CenterNet. Our contributions are: (1) we propose a multihead feature pyramid and multihead detection architecture to efficiently separate class representations, (2) we employ transfer learning between classes learned initially and those learned incrementally to tackle catastrophic forgetting, and (3) we use class-wise non-maximum suppression (NMS) as a post-processing technique to remove redundant boxes. Results show that our method outperforms state-of-the-art methods on two Pascal VOC datasets while saving only the model in its current state, unlike distillation-based counterparts.
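
To make the class-wise NMS post-processing step concrete, the following is a minimal NumPy sketch of greedy NMS applied independently per class. It is an illustration under stated assumptions, not the authors' implementation: the function names, the (x1, y1, x2, y2) box format, and the IoU threshold of 0.5 are all hypothetical choices that the text above does not specify.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def class_wise_nms(boxes, scores, labels, iou_threshold=0.5):
    """Run greedy NMS separately for each class; return sorted kept indices.

    iou_threshold=0.5 is an assumed default, not a value taken from the paper.
    """
    keep = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)        # detections of this class only
        order = idx[np.argsort(-scores[idx])]      # highest score first
        while order.size > 0:
            best = order[0]
            keep.append(int(best))
            rest = order[1:]
            ious = iou_one_to_many(boxes[best], boxes[rest])
            order = rest[ious <= iou_threshold]    # drop same-class overlaps
    return sorted(keep)

# Toy detections: boxes 0 and 1 overlap heavily and share class 0;
# box 2 coincides with box 0 but belongs to class 1, so it is not suppressed.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11],
                  [0, 0, 10, 10], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.6])
labels = np.array([0, 0, 1, 1])
print(class_wise_nms(boxes, scores, labels))  # prints [0, 2, 3]
```

Because suppression only happens among boxes with the same label, overlapping objects of different classes survive, which is the property that distinguishes class-wise NMS from a class-agnostic variant.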
