
Toward Understanding Catastrophic Forgetting in Continual Learning

Cuong V. Nguyen, Alessandro Achille, Michael Lam, Tal Hassner, Vijay Mahadevan, Stefano Soatto

TL;DR

The paper investigates why continual learning models forget previously learned tasks by examining properties of task sequences. It introduces a general procedure that uses Task2Vec task-space embeddings to define two sequence properties, total complexity and sequential heterogeneity, and then correlates each with the actual hardness of a sequence, measured as its final error. Empirically, total complexity is strongly and positively correlated with error on some benchmarks (notably CIFAR-10), whereas sequential heterogeneity shows weak or even negative correlations in several settings, suggesting that task dissimilarity can sometimes aid continual learning. The results highlight the need to account for task complexity when designing benchmarks and algorithms, and they motivate customizing transfer between specific task pairs. The methodology provides a framework for studying how task structure affects forgetting and could guide future improvements to benchmarks and continual learning methods.
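To make the two sequence properties concrete, here is a minimal Python sketch that computes them from per-task embedding vectors. It is not the authors' code and rests on stated assumptions: each task is represented by a strictly positive Task2Vec-style vector (a diagonal Fisher information estimate); distances use the symmetric normalized cosine distance proposed for Task2Vec, as we understand it; a task's complexity is assumed to be its distance to a hypothetical trivial-task embedding `trivial`; and sequential heterogeneity is assumed to sum the distances between consecutive tasks. The normalized heterogeneity shown in Figure 1 is not defined here and is omitted. All function names are illustrative.

```python
import math
from typing import List, Sequence

Embedding = Sequence[float]


def cosine_distance(a: Embedding, b: Embedding) -> float:
    """Return 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def symmetric_distance(a: Embedding, b: Embedding) -> float:
    """Task2Vec-style symmetric distance (assumed form): cosine distance after
    dividing each coordinate of both embeddings by their coordinate-wise sum.
    Requires strictly positive entries, as diagonal Fisher values are."""
    sums = [x + y for x, y in zip(a, b)]
    a_scaled = [x / s for x, s in zip(a, sums)]
    b_scaled = [y / s for y, s in zip(b, sums)]
    return cosine_distance(a_scaled, b_scaled)


def total_complexity(tasks: List[Embedding], trivial: Embedding) -> float:
    """Assumed definition: sum over tasks of each task's distance to a
    hypothetical trivial-task embedding. Independent of task order."""
    return sum(symmetric_distance(t, trivial) for t in tasks)


def sequential_heterogeneity(tasks: List[Embedding]) -> float:
    """Assumed definition: sum of distances between consecutive tasks,
    so it changes when the same tasks are reordered."""
    return sum(
        symmetric_distance(tasks[i - 1], tasks[i]) for i in range(1, len(tasks))
    )


# Toy usage with made-up 3-dimensional embeddings (not data from the paper).
toy_tasks = [[1.0, 3.0, 2.0], [3.0, 1.0, 2.0], [2.0, 2.0, 1.0]]
toy_trivial = [1.0, 1.0, 1.0]
print(total_complexity(toy_tasks, toy_trivial))
print(sequential_heterogeneity(toy_tasks))
```

Under these assumptions, reordering a sequence leaves its total complexity unchanged and alters only its sequential heterogeneity, which is the kind of contrast Figure 2 draws between two orderings of the same five CIFAR-10 tasks.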

Abstract

We study the relationship between catastrophic forgetting and properties of task sequences. In particular, given a sequence of tasks, we would like to understand which properties of the sequence influence the error rates of continual learning algorithms trained on it. To this end, we propose a new procedure that uses recent developments in task space modeling, together with correlation analysis, to specify and analyze the properties of interest. As an application, we use the procedure to study two properties of a task sequence: (1) total complexity and (2) sequential heterogeneity. We show that error rates are strongly and positively correlated with a task sequence's total complexity for some state-of-the-art algorithms. We also show that, surprisingly, error rates have no correlation, or in some cases even a negative correlation, with sequential heterogeneity. Our findings suggest directions for improving continual learning benchmarks and methods.
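The correlation-analysis step can be illustrated with a dependency-free Python sketch: given one property value and one final error per sampled task sequence, it computes Pearson's r and a two-sided permutation p-value. The abstract does not say which correlation statistic or significance test the authors used; Pearson's r and the permutation test are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length lists.
    Undefined (division by zero) if either list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(
    xs: Sequence[float], ys: Sequence[float], n_perm: int = 10_000, seed: int = 0
) -> float:
    """Two-sided permutation test of H0: no correlation between xs and ys.
    Shuffling ys breaks the pairing between each sequence's property
    and its error, giving a null distribution for |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

In use, `xs` would hold a property such as total complexity for each sampled sequence and `ys` the final error of one algorithm on the same sequences; a small p-value with positive r corresponds to the "strongly and positively correlated" case, while a large p-value corresponds to no detectable correlation.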

Paper Structure

This paper contains 15 sections, 7 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Error vs. (a) total complexity, (b) sequential heterogeneity and (c) normalized sequential heterogeneity on CIFAR-10, together with the linear regression fits and 95% confidence intervals. Green (red) color indicates statistically significant positive (negative) correlations. Black color indicates negligible correlations.
  • Figure 2: Details of the error rates of VCL and SI on two typical task sequences from CIFAR-10. Each column shows the errors on a particular task as subsequent tasks are observed. Sequence 1 contains the binary tasks 2/9, 0/4, 3/9, 4/8, 1/2 with sequential heterogeneity 0.091, while sequence 2 contains the tasks 1/2, 2/9, 3/9, 0/4, 4/8 with sequential heterogeneity 0.068 (the labels are encoded as 0, 1, …, 9, as is standard for this dataset). For both algorithms, the final average errors (the last points in the right-most plots) on sequence 2 are higher than those on sequence 1, despite sequence 1's higher sequential heterogeneity.
  • Figure 3: Average error rates of VCL, coreset VCL and SI on 3 task sequences from MNIST with different complexity levels. The high complexity sequence contains the binary tasks 0/1, 2/5, 3/5, 2/3, 2/6 with total complexity 0.48, while the low complexity sequence contains the tasks 0/1, 1/8, 1/3, 1/5, 7/8 with total complexity 0.35. The standard sequence contains the common split 0/1, 2/3, 4/5, 6/7, 8/9 with total complexity 0.41.
  • Figure 4: Total complexity vs. average error, together with the linear regression fit and 95% confidence interval, for each algorithm and test in Table 1(a). Green color indicates statistically significant positive correlations. Black color indicates negligible correlations.
  • Figure 5: Sequential heterogeneity vs. average error, together with the linear regression fit and 95% confidence interval, for each algorithm and test in Table 1(b). Green color indicates statistically significant positive correlations. Black color indicates negligible correlations.
  • ...and 1 more figure
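The figure captions describe average errors plotted against a sequence property, with a linear fit and colors marking whether the correlation is significant. The Python sketch below covers those pieces under stated assumptions: `error_matrix[i][j]` is a hypothetical layout holding the error on task j after training through task i; the "final average error" is the mean of its last row; the fit is ordinary least squares; and "statistically significant" is taken to mean a p-value below 0.05, since the captions give neither the test nor the threshold. The 95% confidence bands are not reproduced, and all names are illustrative.

```python
from typing import List, Sequence, Tuple


def final_average_error(error_matrix: List[List[float]]) -> float:
    """error_matrix[i][j] = error on task j after training on tasks 0..i.
    The final average error is the mean over all tasks after the last one,
    i.e. the last point of each per-task curve in a Figure 2-style plot."""
    last_row = error_matrix[-1]
    return sum(last_row) / len(last_row)


def least_squares_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correlation_color(r: float, p_value: float, alpha: float = 0.05) -> str:
    """Color scheme from the captions: green for a significant positive
    correlation, red for a significant negative one, black otherwise.
    The significance level alpha is an assumption."""
    if p_value >= alpha:
        return "black"
    return "green" if r > 0 else "red"
```

Each sampled sequence contributes one point, pairing its property value with its final average error for a given algorithm; the fitted line and the color label then summarize one panel.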