Why Do Neural Networks Forget: A Study of Collapse in Continual Learning

Yunqin Zhu; Jun Jin

TL;DR

This study evaluates four architectures (MLP, ConvGRU, ResNet-18, and Bi-ConvGRU) on the Split MNIST and Split CIFAR-100 benchmarks and demonstrates that forgetting and collapse are strongly related.

Abstract

Catastrophic forgetting is a major problem in continual learning, and many approaches have been proposed to reduce it. However, most of them are evaluated only through task accuracy, which ignores the internal structure of the model. Recent research suggests that structural collapse leads to loss of plasticity, as evidenced by changes in effective rank (eRank). This indicates a link to forgetting: a network that can no longer expand its feature space to learn new tasks is forced to overwrite existing representations. In this study, we therefore investigate the correlation between forgetting and collapse by measuring both weight and activation eRank. Specifically, we evaluate four architectures (MLP, ConvGRU, ResNet-18, and Bi-ConvGRU) on the Split MNIST and Split CIFAR-100 benchmarks. Each model is trained separately with SGD, Learning without Forgetting (LwF), and Experience Replay (ER). The results demonstrate that forgetting and collapse are strongly related, and that continual learning strategies differ in how efficiently they help models preserve both capacity and performance.
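To make the eRank measurement concrete, the following is a minimal sketch that assumes the common entropy-based definition of effective rank: the exponential of the Shannon entropy of the normalized singular values. The paper's own definition is given in its Effective Rank (eRank) section and is not reproduced here, so the function name `effective_rank`, the `eps` threshold, and the toy matrices are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def effective_rank(matrix, eps=1e-12):
    """Entropy-based effective rank (assumed definition, not taken from the paper).

    Computes exp(H(p)), where p is the singular-value spectrum of `matrix`
    normalized to sum to 1 and H is the Shannon entropy.
    """
    # Singular values only; the singular vectors are not needed.
    s = np.linalg.svd(np.asarray(matrix, dtype=np.float64), compute_uv=False)
    total = s.sum()
    if total <= eps:
        return 0.0  # all-zero matrix: treat as fully collapsed
    p = s / total
    p = p[p > eps]  # drop numerically zero entries so log(p) is finite
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


# Hypothetical usage: the same function can be applied to a 2-D weight matrix
# (weight eRank) or to a (n_samples, n_features) activation matrix
# (activation eRank).
rng = np.random.default_rng(0)
full = rng.standard_normal((64, 32))  # spread-out spectrum
collapsed = np.outer(rng.standard_normal(64), rng.standard_normal(32))  # rank 1

print("full:", effective_rank(full))
print("collapsed:", effective_rank(collapsed))
```

Under this definition, a matrix with equal singular values has an eRank equal to its ordinary rank, while a matrix dominated by a single direction has an eRank close to 1, which is the sense in which a falling eRank signals collapse.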

Paper Structure (39 sections, 16 equations, 14 figures, 1 table, 1 algorithm)

Introduction
Background
Continual Learning Settings
Task-Incremental Learning (Task-IL)
Class-Incremental Learning (Class-IL)
Catastrophic Forgetting
Loss of Plasticity
Parameter and Functional Regularization
Parameter-based Regularization
Functional Regularization
Learning without Forgetting (LwF)
Experience Replay
Effective Rank (eRank)
Weight and Activation Effective Rank
Architectures
...and 24 more sections

Figures (14)

Figure 1: Three Different CL Settings [setting]
Figure 2: The idea of Learning without Forgetting [li2017learningforgetting]
Figure 3: Five sequential binary classification tasks of the Split MNIST dataset
Figure 4: The update of hidden state $h_t$ is defined by Update Gate $z_t$, Reset Gate $r_t$ and Candidate State $\tilde{h}_t$ in Gated Recurrent Block of ConvGRU
Figure 5: The Architecture of ResNet-18
...and 9 more figures
