Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence

Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajanthan, Philip H. S. Torr

TL;DR

This work defines incremental learning more precisely by introducing Forgetting and Intransigence as core evaluation axes and proposes RWalk, a KL-divergence–based, memory-efficient learning rule that generalizes EWC++ and Path Integral. RWalk combines a KL-based regularizer, optimization-path–driven parameter importance, and replay-based sampling to balance preserving past knowledge with updating for new tasks, achieving strong accuracy while reducing forgetting and intransigence on MNIST and CIFAR-100. The paper also analyzes single-head versus multi-head settings and demonstrates that small replay buffers, alongside strategic sampling, substantially mitigate intransigence, highlighting practical avenues for scalable IL. Overall, the contributions include principled metrics, a theoretically grounded algorithm, and comprehensive empirical insights into forgetting vs. intransigence dynamics for incremental classifiers.

Abstract

Incremental learning (IL) has received a lot of attention recently; however, the literature lacks a precise problem definition, proper evaluation settings, and metrics tailored specifically for the IL problem. One of the main objectives of this work is to fill these gaps so as to provide a common ground for better understanding of IL. The main challenge for an IL algorithm is to update the classifier whilst preserving existing knowledge. We observe that, in addition to forgetting, a known issue while preserving knowledge, IL also suffers from a problem we call intransigence: the inability of a model to update its knowledge. We introduce two metrics to quantify forgetting and intransigence that allow us to understand, analyse, and gain better insights into the behaviour of IL algorithms. We present RWalk, a generalization of EWC++ (our efficient version of EWC [Kirkpatrick2016EWC]) and Path Integral [Zenke2017Continual] with a theoretically grounded KL-divergence based perspective. We provide a thorough analysis of various IL algorithms on MNIST and CIFAR-100 datasets. In these experiments, RWalk obtains superior results in terms of accuracy, and also provides a better trade-off between forgetting and intransigence.

Paper Structure

This paper contains 42 sections, 1 theorem, 16 equations, 6 figures, 4 tables.

Key Result

lemma 1

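Assuming $\Delta \theta \to 0$, the second-order Taylor approximation of the KL-divergence can be written [Amari98NaturalGradient, Pascanu14NaturalGradient] as:

$$D_{KL}(p_{\theta} \,\|\, p_{\theta + \Delta\theta}) \approx \frac{1}{2} \Delta\theta^{\top} F_{\theta} \Delta\theta,$$

where $F_{\theta}$ is the empirical Fisher at $\theta$.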
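The quadratic approximation stated in Lemma 1, namely that the KL-divergence between $p_{\theta}$ and $p_{\theta + \Delta\theta}$ is close to $\frac{1}{2} \Delta\theta^{\top} F_{\theta} \Delta\theta$ for small $\Delta\theta$, can be checked numerically on a toy model. The sketch below is illustrative only and is not taken from the paper: it assumes a three-class categorical distribution parameterized by softmax logits, and it uses the exact Fisher matrix of that model ($F = \mathrm{diag}(p) - p p^{\top}$) rather than an empirical estimate from data. All names (`softmax`, `kl_divergence`, `fisher_quadratic_form`) and the values of `theta` and `delta` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two categorical distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def fisher_quadratic_form(p, delta):
    """Return delta^T F delta for softmax logits.

    Assumption: F = diag(p) - p p^T (exact Fisher of a categorical model
    with respect to its logits), so the quadratic form reduces to the
    variance of delta under p.
    """
    mean = sum(pi * di for pi, di in zip(p, delta))
    second_moment = sum(pi * di * di for pi, di in zip(p, delta))
    return second_moment - mean * mean


# Hypothetical parameters and a small perturbation.
theta = [0.5, -1.0, 2.0]
delta = [0.01, -0.02, 0.015]

p = softmax(theta)
q = softmax([t + d for t, d in zip(theta, delta)])

exact_kl = kl_divergence(p, q)
approx_kl = 0.5 * fisher_quadratic_form(p, delta)

print(f"exact KL:            {exact_kl:.3e}")
print(f"0.5 * d^T F d:       {approx_kl:.3e}")
print(f"absolute difference: {abs(exact_kl - approx_kl):.3e}")
```

The gap between the two quantities is a third-order term in $\Delta\theta$, so shrinking `delta` should make the agreement tighter, which is consistent with the $\Delta\theta \to 0$ assumption of the lemma.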

Figures (6)

  • Figure 1: Parameter importance accumulated over the optimization trajectory.
  • Figure 2: Accuracy on incremental MNIST with multi-head evaluation (top), and single-head evaluation without (middle) and with samples (bottom). The first five columns show the variation in performance for different tasks, e.g., the first plot depicts the performance variation on Task 1 when trained incrementally over five tasks. The last column shows the accuracy ($A_k$, see Sec. \ref{sec:eval_measures}). Mean of features (MoF) sampling is used.
  • Figure 3: Interplay between forgetting and intransigence.
  • Figure 4: Comparison by increasing the number of samples. On MNIST and CIFAR each class has around 5000 and 500 samples, respectively. With increasing number of samples, the performance of Vanilla improved, but in the range where Vanilla is poor, RWalk consistently performs the best. Uniform sampling is used.
  • Figure 5: Comparison of different sampling strategies discussed in Sec. \ref{sec:sampling} on MNIST (top) and CIFAR-100 (bottom). Mean of features (MoF) outperforms others.
  • ...and 1 more figure

Theorems & Definitions (2)

  • lemma 1
  • proof