OpenCIL: Benchmarking Out-of-Distribution Detection in Class-Incremental Learning
Wenjun Miao, Guansong Pang, Trong-Tung Nguyen, Ruohang Fang, Jin Zheng, Xiao Bai
TL;DR
OpenCIL introduces the first large-scale benchmark for evaluating OOD detection within class-incremental learning (CIL), combining four CIL models with 15 OOD detectors to form 60 baselines across CIFAR100 and ImageNet1K with six OOD datasets. It presents two frameworks for integrating OOD detectors into CIL and introduces BER, a baseline that reduces biases toward OOD samples and newly added classes through two energy-based objectives, New Task Energy Regularization and Old Task Energy Regularization. The results reveal that higher CIL accuracy does not guarantee better OOD detection, that fine-tuning-based detectors generally outperform post-hoc methods, and that catastrophic forgetting affects OOD detection; BER consistently improves OOD metrics across datasets and incremental steps. The work provides practical insights for safe open-world deployment of CIL models and offers an extensible, open-source evaluation platform for continued benchmarking in this space.
Abstract
Class-incremental learning (CIL) aims to learn a model that can incrementally accommodate new classes while retaining the knowledge learned for old classes. Out-of-distribution (OOD) detection in CIL aims to retain this incremental learning ability while also rejecting unknown samples drawn from distributions different from those of the learned classes. This capability is crucial for safely deploying CIL models in open-world settings. However, despite remarkable advances in both CIL and OOD detection, a systematic, large-scale benchmark for assessing how well advanced CIL models detect OOD samples is still lacking. To fill this gap, we design a comprehensive empirical study to establish such a benchmark, named $\textbf{OpenCIL}$. To this end, we propose two principled frameworks for equipping four representative CIL models with 15 diverse OOD detection methods, resulting in 60 baseline models for OOD detection in CIL. The empirical evaluation is performed on two popular CIL datasets with six commonly used OOD datasets. One key observation from our evaluation is that CIL models can be severely biased towards OOD samples and newly added classes when exposed to open environments. Motivated by this, we further propose a new baseline for OOD detection in CIL, namely Bi-directional Energy Regularization ($\textbf{BER}$), which is specifically designed to mitigate these two biases in different CIL models by applying energy regularization to both old and new classes. Its superior performance is demonstrated in our experiments. All code and datasets are open-source at https://github.com/mala-lab/OpenCIL.
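
As background for the energy regularization mentioned above, the following is a minimal sketch of the standard energy score commonly used for OOD detection, computed from a classifier's logits. It is not the BER objective itself, whose exact form is not given in this abstract. The function names, the threshold value, and the toy logits are all assumptions made for illustration.

```python
import math


def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * log(sum_k exp(logit_k / T)).

    Lower energy usually indicates an in-distribution sample.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp


def is_ood(logits, threshold, temperature=1.0):
    """Flag a sample as OOD when its energy exceeds a hypothetical threshold."""
    return energy_score(logits, temperature) > threshold


# Toy logits, made up for illustration only.
confident = [9.0, 0.5, -1.0]  # one class dominates, so energy is low
flat = [0.2, 0.1, 0.0]        # no class stands out, so energy is higher

# The threshold of -5.0 is an arbitrary choice for this sketch.
print(is_ood(confident, threshold=-5.0))
print(is_ood(flat, threshold=-5.0))
```

In practice the threshold would be chosen on held-out in-distribution data, for example so that a fixed fraction of known-class samples is retained. An energy regularizer in the spirit described above would presumably push energies of known-class samples down and those of unknown samples up, but the details should be taken from the paper and repository rather than from this sketch.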
