
RICO: Two Realistic Benchmarks and an In-Depth Analysis for Incremental Learning in Object Detection

Matthias Neuwirth-Trapp, Maarten Bieshaar, Danda Pani Paudel, Luc Van Gool

TL;DR

RICO introduces two realistic incremental learning (IL) benchmarks for object detection, D-RICO (domain shifts with a fixed class set) and EC-RICO (expanding classes and domains), constructed from 14 diverse datasets to reflect practical variations in sensors, conditions, and labeling policies. Using a ViTDet-based detector and multiple baselines, the study shows that existing IL methods struggle to balance stability and plasticity across long task sequences: simple replay-based approaches mitigate forgetting well but still fall short of training on each task individually. Distillation-based methods perform poorly because their teachers are weak across heterogeneous tasks, and single-model IL strategies such as LDB achieve stability at the cost of plasticity, underscoring the need for more expressive, task-aware architectures. The results highlight the critical role of plasticity in real-world IL for object detection and establish D-RICO and EC-RICO as challenging benchmarks for future research on robust, scalable continual perception in varied real-world environments.

Abstract

Incremental Learning (IL) trains models sequentially on new data without full retraining, offering privacy, efficiency, and scalability. IL must balance adaptability to new data with retention of old knowledge. However, evaluations often rely on synthetic, simplified benchmarks, obscuring real-world IL performance. To address this, we introduce two Realistic Incremental Object Detection Benchmarks (RICO): Domain RICO (D-RICO) features domain shifts with a fixed class set, and Expanding-Classes RICO (EC-RICO) integrates new domains and classes per IL step. Built from 14 diverse datasets covering real and synthetic domains, varying conditions (e.g., weather, time of day), camera sensors, perspectives, and labeling policies, both benchmarks capture challenges absent in existing evaluations. Our experiments show that all IL methods underperform in adaptability and retention, while replaying a small amount of previous data already outperforms all of them. However, individual training on each task's data remains superior. We heuristically attribute this gap to weak teachers in distillation, single models' inability to manage diverse tasks, and insufficient plasticity. Our code will be made publicly available.
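The finding that replaying a small amount of previous data is already a strong baseline can be illustrated with a minimal sketch. It assumes a generic replay scheme that is not taken from the paper: a fixed-capacity buffer filled by reservoir sampling, plus a fixed number of replayed samples mixed into every training batch of the current task. The names `ReservoirReplayBuffer`, `mixed_batches`, and `train_step` are hypothetical, and the paper's actual replay setup (buffer size, sampling policy, detector training loop) may differ.

```python
import random
from typing import Generic, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class ReservoirReplayBuffer(Generic[T]):
    """Hypothetical fixed-capacity buffer that keeps a uniform random sample
    of every item added so far (reservoir sampling, Algorithm R)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[T] = []
        self.num_seen = 0
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.num_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return
        # Keep the new item with probability capacity / num_seen,
        # overwriting a uniformly chosen slot.
        j = self.rng.randrange(self.num_seen)
        if j < self.capacity:
            self.items[j] = item

    def sample(self, k: int) -> List[T]:
        # Up to k stored items without replacement (empty list if buffer is empty).
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batches(new_task_data: Sequence[T], buffer: ReservoirReplayBuffer[T],
                  batch_size: int, replay_per_batch: int,
                  seed: int = 0) -> Iterator[List[T]]:
    """Hypothetical helper: yields batches of current-task samples, each topped
    up with replay_per_batch samples drawn from earlier tasks."""
    if not 0 <= replay_per_batch < batch_size:
        raise ValueError("need 0 <= replay_per_batch < batch_size")
    data = list(new_task_data)
    random.Random(seed).shuffle(data)
    step = batch_size - replay_per_batch  # new-task samples per batch
    for start in range(0, len(data), step):
        yield data[start:start + step] + buffer.sample(replay_per_batch)


# Usage: three toy "tasks" of image IDs; train_step is a hypothetical detector update.
tasks = [[f"task{t}_img{i}" for i in range(100)] for t in range(3)]
buffer: ReservoirReplayBuffer[str] = ReservoirReplayBuffer(capacity=20)
for task in tasks:
    for batch in mixed_batches(task, buffer, batch_size=8, replay_per_batch=2):
        pass  # train_step(model, batch)
    for sample in task:  # add the finished task's data to the buffer
        buffer.add(sample)
```

Reservoir sampling keeps the buffer uniform over all data seen so far, so older tasks gradually lose share as new ones arrive; a per-task quota is a common alternative when each domain should stay equally represented.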

Paper Structure

This paper contains 61 sections, 24 figures, 23 tables.

Figures (24)

  • Figure 1: We introduce two novel benchmarks built from 14 datasets: standard and expanding-classes domain incremental learning (DIL). The DIL benchmark has 15 tasks with diverse camera types, synthetic/real data, varying daytime, weather, and labeling policies. The expanding-classes DIL benchmark has 8 tasks, each adding a new class and domain while maintaining task diversity.
  • Figure 2: Overview of the domain-RICO and expanding-classes RICO benchmark tasks. See the D-RICO and EC-RICO benchmark sections for details.
  • Figure 3: Correlations between Forward Transfer, Forgetting, and Performance across experiments, with trend lines and 95% confidence intervals. Current methods remain far from the ideal of high performance and plasticity with low forgetting. All units are mAP.
  • Figure S.1: t-SNE features
  • Figure S.2: Confusion Matrix of the Nearest Mean Classifier based on image features.
  • ...and 19 more figures
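Figure 3 relates Forward Transfer, Forgetting, and Performance, all in mAP. The summary does not state their formulas, so the following is a minimal sketch using common continual-learning definitions over a matrix `R`, where `R[i][j]` is the mAP on task j after sequentially training on tasks 0..i. Two assumptions are made here and may not match the paper: forgetting is the best earlier mAP on each past task minus its final mAP, and forward transfer is read as plasticity, i.e. the mAP on each task right after learning it minus that of an individually trained model (other work, e.g. GEM, instead measures zero-shot transfer before training on the task). The function name `il_metrics` and the example numbers are illustrative only, not results from the paper.

```python
from typing import Dict, Sequence


def il_metrics(R: Sequence[Sequence[float]],
               individual: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper.
    R[i][j]: mAP on task j after sequentially training on tasks 0..i
             (entries with j > i are ignored).
    individual[j]: mAP on task j of a model trained only on task j."""
    T = len(R)
    if T == 0 or any(len(row) != T for row in R) or len(individual) != T:
        raise ValueError("R must be a non-empty T x T matrix; individual must have length T")
    final = R[T - 1]
    # Performance: mean mAP over all tasks after the last training step.
    performance = sum(final) / T
    # Forgetting: best mAP on each earlier task before the last step minus its final mAP.
    if T > 1:
        forgetting = sum(max(R[i][j] for i in range(j, T - 1)) - final[j]
                         for j in range(T - 1)) / (T - 1)
    else:
        forgetting = 0.0
    # Forward transfer read as plasticity: mAP right after learning task j
    # relative to individual training on task j (negative = less plastic).
    forward_transfer = sum(R[j][j] - individual[j] for j in range(T)) / T
    return {"performance": performance, "forgetting": forgetting,
            "forward_transfer": forward_transfer}


# Made-up three-task example (not data from the paper).
metrics = il_metrics([[40, 0, 0], [30, 35, 0], [25, 28, 33]], [40, 38, 36])
# performance ~28.67, forgetting 11.0, forward_transfer -2.0
```

Under these definitions, the ideal corner of Figure 3 corresponds to high performance, forward transfer near zero (matching individual training), and forgetting near zero.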