
Do Neural Networks Lose Plasticity in a Gradually Changing World?

Tianhui Liu, Lili Mou

TL;DR

This work argues that loss of plasticity in continual learning largely stems from abrupt task shifts rather than intrinsic limitations of neural networks. It introduces a Gradually Changing Environment using input/output interpolation and task sampling to simulate smooth distribution shifts, backed by theoretical analysis under standard smoothness and local convexity assumptions. Empirically, the approach preserves trainability and generalization across vision benchmarks and language tasks, often matching or surpassing traditional abrupt-change mitigations. The findings offer a realistic, robust framework for real-world continual learning with reduced need for extensive hyperparameter tuning and complex regularization strategies.

Abstract

Continual learning has become a trending topic in machine learning. Recent studies have discovered an interesting phenomenon called loss of plasticity, referring to neural networks gradually losing the ability to learn new tasks. However, existing plasticity research largely relies on contrived settings with abrupt task transitions, which often do not reflect real-world environments. In this paper, we propose to investigate a gradually changing environment, which we simulate through input/output interpolation and task sampling. We perform theoretical and empirical analysis, showing that the loss of plasticity is an artifact of abrupt task changes in the environment and can be largely mitigated if the world changes gradually.
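
The abstract names input/output interpolation as one way to simulate a gradual shift. As a rough illustration only, the sketch below assumes that during a transition from an old task to a new one a mixing coefficient `alpha` moves from 0 to 1, and that training targets (output interpolation) or inputs (input interpolation) are linear blends of the old and new task's versions. The function names `interpolate_outputs` and `interpolate_inputs`, the linear mixing rule, and the soft-label targets are hypothetical choices, not taken from the paper.

```python
import numpy as np

def interpolate_outputs(y_old, y_new, alpha, num_classes):
    """Hypothetical output interpolation: blend the one-hot targets of two tasks.

    y_old, y_new: integer labels of the same inputs under the old and new task, shape (n,)
    alpha: mixing weight in [0, 1]; 0 reproduces the old task, 1 the new task
    Returns soft targets of shape (n, num_classes) whose rows sum to 1.
    """
    eye = np.eye(num_classes)
    return (1.0 - alpha) * eye[y_old] + alpha * eye[y_new]

def interpolate_inputs(x_old, x_new, alpha):
    """Hypothetical input interpolation: linear blend of the old and new version of each input."""
    return (1.0 - alpha) * np.asarray(x_old) + alpha * np.asarray(x_new)

# Example: three inputs whose labels differ between the old and new task,
# with alpha moved from 0 to 1 in small steps instead of jumping at once.
y_old = np.array([0, 1, 2])
y_new = np.array([2, 1, 0])
for alpha in np.linspace(0.0, 1.0, num=5):
    targets = interpolate_outputs(y_old, y_new, alpha, num_classes=3)
    # a training step with a soft-label loss on `targets` would go here
```

Moving `alpha` in small increments is what makes the change "gradual" in this sketch; setting it straight from 0 to 1 recovers the abrupt task switch.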

Paper Structure (16 sections, 6 theorems, 19 equations, 6 figures)

Key Result

Lemma 4.3

Consider gradient descent (GD) starting from any point in an $(r,\mu)$-locally strongly convex domain ${\mathbb{D}}_{{\bm{x}}_f^*}$ of a $\beta$-smooth function $f$, for some $\beta\ge\mu>0$. Let $({\bm{x}}_k)_{ k=1}^N$ be a sequence generated by GD. If the step size satisfies $\eta \le \min(\frac{1
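
Only the opening of Lemma 4.3 is shown, so its exact step-size condition and conclusion are not restated here. For intuition only, the sketch below checks a textbook fact that lemmas of this kind typically build on: on a quadratic whose Hessian eigenvalues lie in $[\mu,\beta]$ (so $f$ is $\mu$-strongly convex and $\beta$-smooth everywhere, a special case of the local assumption), GD with $\eta \le 1/\beta$ shrinks the distance to the minimizer by at least a factor $1-\eta\mu$ per step. The quadratic, the function name `gd_on_quadratic`, and the chosen eigenvalues are illustrative assumptions.

```python
import numpy as np

def gd_on_quadratic(eigvals, x0, eta, num_steps):
    """Run GD on f(x) = 0.5 * sum(eigvals * x**2), whose minimizer is x* = 0.

    The Hessian is diag(eigvals), so f is mu-strongly convex and beta-smooth
    with mu = min(eigvals) and beta = max(eigvals).
    Returns the distances ||x_k - x*|| for k = 0..num_steps.
    """
    x = np.asarray(x0, dtype=float)
    dists = [np.linalg.norm(x)]
    for _ in range(num_steps):
        grad = eigvals * x      # gradient of the quadratic
        x = x - eta * grad      # GD update
        dists.append(np.linalg.norm(x))
    return dists

eigvals = np.array([0.5, 2.0, 4.0])   # mu = 0.5, beta = 4.0
mu, beta = eigvals.min(), eigvals.max()
eta = 1.0 / beta                       # satisfies eta <= 1/beta
dists = gd_on_quadratic(eigvals, x0=[1.0, -1.0, 1.0], eta=eta, num_steps=20)
rate = 1.0 - eta * mu                  # per-step contraction bound for eta <= 1/beta
```

Each coordinate is multiplied by $1-\eta\lambda_i \in [0,\,1-\eta\mu]$, which is why the bound holds here; the paper's lemma works with a local domain instead of a global quadratic.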

Figures (6)

  • Figure 1: Trainability for Random Image Labeling tasks on MNIST and CIFAR10 using an MLP or a ResNet-18 model. Output interpolation is more effective than other plasticity mitigation methods for these vision benchmarks.
  • Figure 2: Trainability for a random Seq2Seq task on synthetic text using T5-small. Task sampling effectively mitigates loss of trainability.
  • Figure 3: Continual learning with Random Pixel Permuting tasks on EMNIST using a 4-layer MLP model. Generalizability is well preserved in a gradually changing environment.
  • Figure 4: Generalizability evaluated by test BLEU2 score on Bigram Cipher tasks using a customized T5-small model. The gradually changing environment is effective in maintaining test BLEU2 score on new tasks.
  • Figure 5: The effect of the granularity of the interpolation step size on plasticity preservation for both the trainability and generalizability tasks. A smaller step size simulates a gradually changing environment more faithfully and retains more plasticity.
  • ...and 1 more figure
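
Figure 2 credits task sampling with mitigating loss of trainability, and Figure 5 reports that a finer interpolation step size retains more plasticity. One hypothetical way to combine the two ideas: split each transition into stages whose new-task probability grows by a fixed step size, and in each stage draw every training example from the new task with that probability and from the old task otherwise. Applying the step size to sampling probabilities, and the names `transition_schedule` and `sample_batch`, are assumptions for illustration, not the paper's procedure.

```python
import random

def transition_schedule(step_size):
    """Hypothetical schedule: new-task probabilities 0, step_size, 2*step_size, ..., 1.

    A smaller step_size yields more intermediate stages, i.e. a more gradual change.
    """
    num_stages = round(1.0 / step_size)
    return [k / num_stages for k in range(num_stages + 1)]

def sample_batch(old_task, new_task, p_new, batch_size, rng):
    """Task sampling: each example is drawn from new_task with probability p_new, else old_task."""
    batch = []
    for _ in range(batch_size):
        source = new_task if rng.random() < p_new else old_task
        batch.append(rng.choice(source))
    return batch

# Usage with toy placeholder datasets of (tag, index) pairs.
rng = random.Random(0)
old_task = [("x_old", i) for i in range(10)]
new_task = [("x_new", i) for i in range(10)]
for p_new in transition_schedule(0.25):
    batch = sample_batch(old_task, new_task, p_new, batch_size=8, rng=rng)
    # a training step on `batch` would go here
```

With `step_size=0.25` the transition passes through the mixtures 0, 0.25, 0.5, 0.75, and 1; shrinking the step size adds more intermediate mixtures, mirroring the granularity effect described for Figure 5.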

Theorems & Definitions (19)

  • Definition 4.1: Smoothness
  • Definition 4.2: Locally Strongly Convex
  • Lemma 4.3
  • proof
  • Lemma 4.4
  • proof
  • Lemma 4.5
  • proof
  • Lemma 4.6
  • proof
  • ...and 9 more
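
The list above only names Definitions 4.1 and 4.2. For reference, the standard forms of these notions are written out below; the paper's exact wording may differ, and in particular the ball-shaped domain of radius $r$ around the minimizer ${\bm{x}}_f^*$ is an assumption about how the local domain is defined.

```latex
% Standard beta-smoothness (beta-Lipschitz gradient)
\|\nabla f({\bm{x}}) - \nabla f({\bm{y}})\| \le \beta \, \|{\bm{x}} - {\bm{y}}\|
\qquad \forall\, {\bm{x}}, {\bm{y}}

% Assumed form of an (r, mu)-locally strongly convex domain around a minimizer x_f^*
{\mathbb{D}}_{{\bm{x}}_f^*} = \{ {\bm{x}} : \|{\bm{x}} - {\bm{x}}_f^*\| \le r \}, \qquad
f({\bm{y}}) \ge f({\bm{x}}) + \nabla f({\bm{x}})^\top ({\bm{y}} - {\bm{x}}) + \tfrac{\mu}{2} \|{\bm{y}} - {\bm{x}}\|^2
\quad \forall\, {\bm{x}}, {\bm{y}} \in {\mathbb{D}}_{{\bm{x}}_f^*}
```

A function satisfying both conditions on a domain with nonempty interior necessarily has $\mu \le \beta$, which matches the assumption $\beta\ge\mu>0$ in Lemma 4.3.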