Don't forget, there is more than forgetting: new metrics for Continual Learning
Natalia Díaz-Rodríguez, Vincenzo Lomonaco, David Filliat, Davide Maltoni
TL;DR
The paper addresses the evaluation gap in continual learning (CL) by introducing an implementation-independent metric framework that looks beyond forgetting. It defines seven criteria: Accuracy over time, Backward Transfer (split into REM and BWT+), Forward Transfer, Model Size, Samples Storage, and Computational Efficiency. The criteria are normalized and, following MAVT, aggregated into a flexible CL_score, complemented by a stability measure. An empirical study on iCIFAR-100 with common CL baselines (Naïve, Cumulative, EWC, LWF, SI) shows how CL_score reveals tradeoffs between accuracy, memory, and compute under different weighting schemes. The work advocates comprehensive, application-aware evaluation in CL and suggests directions for refining the metrics in future research.
Abstract
Continual learning consists of algorithms that learn from a stream of data/tasks continuously and adaptively through time, enabling the incremental development of ever more complex knowledge and skills. The lack of consensus in evaluating continual learning algorithms and the almost exclusive focus on forgetting motivate us to propose a more comprehensive set of implementation-independent metrics accounting for several factors we believe have practical implications worth considering in the deployment of real AI systems that learn continually: accuracy or performance over time, backward and forward knowledge transfer, memory overhead, and computational efficiency. Drawing inspiration from the standard Multi-Attribute Value Theory (MAVT), we further propose to fuse these metrics into a single score for ranking purposes, and we evaluate our proposal with five continual learning strategies on the iCIFAR-100 continual learning benchmark.
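To make the idea of fusing several metrics into one ranking score concrete, here is a minimal sketch of a weighted-sum aggregation in the spirit of MAVT. It assumes that every criterion has already been scaled to [0, 1] and that the weights sum to 1. The function name `cl_score`, the criterion keys, and all numeric values are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
def cl_score(criteria, weights):
    """Fuse per-criterion values into one score via a weighted sum.

    Assumptions (not taken from the paper's code): each criterion lies
    in [0, 1], and the weights are non-negative and sum to 1.
    """
    if set(criteria) != set(weights):
        raise ValueError("criteria and weights must have the same keys")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for name, value in criteria.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"criterion {name!r} is outside [0, 1]")
    return sum(weights[name] * criteria[name] for name in criteria)


# Hypothetical criterion values for one strategy (illustrative only).
criteria = {
    "accuracy": 0.60,
    "rem": 0.90,
    "bwt_plus": 0.00,
    "fwt": 0.10,
    "model_size": 1.00,
    "samples_storage": 1.00,
    "compute": 0.40,
}

# Uniform weighting: every criterion matters equally.
uniform = {name: 1.0 / len(criteria) for name in criteria}

# An accuracy-focused weighting: accuracy gets 0.4, the rest share 0.6.
accuracy_focused = {name: 0.1 for name in criteria}
accuracy_focused["accuracy"] = 0.4

print("uniform:", cl_score(criteria, uniform))
print("accuracy-focused:", cl_score(criteria, accuracy_focused))
```

Changing the weights changes the ranking emphasis without touching the underlying measurements, which is what lets a single score adapt to application-specific priorities such as memory-constrained or compute-constrained deployments.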
