Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence
Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajanthan, Philip H. S. Torr
TL;DR
This work defines incremental learning (IL) more precisely by introducing Forgetting and Intransigence as core evaluation axes and proposes RWalk, a KL-divergence–based, memory-efficient learning rule that generalizes EWC++ and Path Integral. RWalk combines a KL-based regularizer, optimization-path–driven parameter importance, and replay-based sampling to balance preserving past knowledge with updating for new tasks, achieving strong accuracy while reducing forgetting and intransigence on MNIST and CIFAR-100. The paper also analyzes single-head versus multi-head settings and demonstrates that small replay buffers, combined with strategic sampling, substantially mitigate intransigence, highlighting practical avenues for scalable IL. Overall, the contributions include principled metrics, a theoretically grounded algorithm, and comprehensive empirical insights into the dynamics of forgetting versus intransigence in incremental classifiers.
Abstract
Incremental learning (IL) has received a lot of attention recently; however, the literature lacks a precise problem definition, proper evaluation settings, and metrics tailored specifically for the IL problem. One of the main objectives of this work is to fill these gaps so as to provide a common ground for better understanding of IL. The main challenge for an IL algorithm is to update the classifier whilst preserving existing knowledge. We observe that, in addition to forgetting, a known issue when preserving knowledge, IL also suffers from a problem we call intransigence: the inability of a model to update its knowledge. We introduce two metrics to quantify forgetting and intransigence that allow us to understand, analyse, and gain better insights into the behaviour of IL algorithms. We present RWalk, a generalization of EWC++ (our efficient version of EWC [Kirkpatrick2016EWC]) and Path Integral [Zenke2017Continual] with a theoretically grounded KL-divergence based perspective. We provide a thorough analysis of various IL algorithms on the MNIST and CIFAR-100 datasets. In these experiments, RWalk obtains superior results in terms of accuracy, and also provides a better trade-off between forgetting and intransigence.
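
The abstract names the two metrics without giving formulas, so the following is only a minimal sketch of how they might be computed. It assumes that forgetting on an earlier task is the drop from the best accuracy previously reached on that task to its current accuracy, averaged over earlier tasks, and that intransigence is the gap between a reference model trained on all data seen so far and the incremental model on the newest task. The function names, the 0-indexed accuracy table `acc`, and the numbers in the usage lines are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def average_forgetting(acc: Sequence[Sequence[float]], k: int) -> float:
    """Average forgetting after training on task k (0-indexed).

    acc[l][j] is the accuracy on task j measured after training
    incrementally through task l. Assumed definition: for each earlier
    task j < k, take the best accuracy reached on j at any step before k
    (from the step where j was learned) and subtract the accuracy on j
    after step k, then average over the k earlier tasks.
    """
    if k < 1:
        raise ValueError("forgetting needs at least one earlier task")
    drops = []
    for j in range(k):
        best_past = max(acc[l][j] for l in range(j, k))
        drops.append(best_past - acc[k][j])
    return sum(drops) / k


def intransigence(reference_acc: float, incremental_acc: float) -> float:
    """Assumed definition: accuracy of a reference model on the newest
    task minus the accuracy of the incremental model on that task.
    Larger values mean the model struggled more to learn the new task."""
    return reference_acc - incremental_acc


# Hypothetical accuracies for three tasks (rows: after training task l).
acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.60, 0.70, 0.95],
]
forgetting_after_last_task = average_forgetting(acc, k=2)
intransigence_on_last_task = intransigence(0.97, acc[2][2])
```

Under these assumptions a low value on both metrics is desirable: a model that keeps old accuracies intact but cannot fit the new task scores well on forgetting and poorly on intransigence, which is the trade-off the abstract refers to.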
