Make Continual Learning Stronger via C-Flat

Ang Bian; Wei Li; Hangjie Yuan; Chengrong Yu; Mang Wang; Zixiang Zhao; Aojun Lu; Pengliang Ji; Tao Feng

TL;DR

C-Flat is applied as a general framework across all CL categories, and a thorough comparison with loss-minimization optimizers and flat-minima-based CL approaches shows that the method can boost CL performance in almost all cases.

Abstract

The ability of a model to generalize while incrementally acquiring dynamically updated knowledge from sequentially arriving tasks is crucial for tackling the sensitivity-stability dilemma in Continual Learning (CL). Sharpness minimization of the weight loss landscape, which seeks flat minima lying in neighborhoods with uniformly low loss or smooth gradients, has proven to be a strong training regime for improving model generalization compared with loss-minimization-based optimizers such as SGD. Yet only a few works have discussed this training regime for CL, showing that a dedicated zeroth-order sharpness optimizer can improve CL performance. In this work, we propose Continual Flatness (C-Flat), a method featuring a flatter loss landscape tailored for CL. C-Flat can be invoked with only one line of code and is plug-and-play with any CL method. This paper presents a general framework for applying C-Flat to all CL categories, together with a thorough comparison against loss-minimization optimizers and flat-minima-based CL approaches, showing that our method can boost CL performance in almost all cases. Code is available at https://github.com/WanNaa/C-Flat.
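The abstract distinguishes two notions of flatness: neighborhoods with uniformly low loss (zeroth-order) and neighborhoods with smooth gradients (first-order). The pure-Python sketch below illustrates both on a hypothetical 2-D quadratic loss, together with a SAM-style zeroth-order sharpness-aware update. It is not the C-Flat algorithm: the toy loss, the function names (make_quadratic, zeroth_order_sharpness, first_order_flatness, sam_step), the first-order definition (rho times the largest gradient norm in the ball, taken from the broader flatness literature), and the single-ascent-step approximation of the inner maximization are all assumptions for illustration.

```python
import math


def make_quadratic(a, b):
    """Hypothetical toy loss L(w) = 0.5 * (a * w0^2 + b * w1^2).
    Larger curvature (a, b) means a sharper minimum at the origin."""
    def loss(w):
        return 0.5 * (a * w[0] ** 2 + b * w[1] ** 2)

    def grad(w):
        return [a * w[0], b * w[1]]

    return loss, grad


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def ascent_perturbation(grad, w, rho):
    """One normalized gradient-ascent step of radius rho, SAM's approximation
    of argmax_{||eps|| <= rho} L(w + eps)."""
    g = grad(w)
    g_norm = norm(g)
    if g_norm == 0.0:
        return [0.0 for _ in w]
    return [rho * gi / g_norm for gi in g]


def zeroth_order_sharpness(loss, grad, w, rho):
    """Estimate of max_{||eps|| <= rho} L(w + eps) - L(w) using the single
    ascent point; since that point lies in the ball, this is a lower bound."""
    eps = ascent_perturbation(grad, w, rho)
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    return loss(w_adv) - loss(w)


def first_order_flatness(grad, w, rho):
    """Crude estimate of rho * max_{||eps|| <= rho} ||grad L(w + eps)||,
    evaluating the gradient norm only at the ascent point (a lower bound)."""
    eps = ascent_perturbation(grad, w, rho)
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    return rho * norm(grad(w_adv))


def sam_step(grad, w, lr, rho):
    """SAM-style update: compute the gradient at the perturbed point
    w + eps, then apply it to the original weights w."""
    eps = ascent_perturbation(grad, w, rho)
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


if __name__ == "__main__":
    w = [0.1, 0.1]
    for name, (loss, grad) in (("sharp", make_quadratic(10.0, 10.0)),
                               ("flat", make_quadratic(1.0, 1.0))):
        print(name,
              zeroth_order_sharpness(loss, grad, w, 0.05),
              first_order_flatness(grad, w, 0.05))
```

At the same weights and radius, the high-curvature quadratic reports larger values under both measures, which is the property a flatness-seeking optimizer tries to reduce; the update in sam_step only penalizes the zeroth-order notion.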

Paper Structure (24 sections, 16 equations, 10 figures, 4 tables, 2 algorithms)

Introduction
Related work
Method
A Unified CL Framework Using C-Flat
Analysis
Experimental Setup
Make Continual Learning Stronger
Hessian Eigenvalues and Hessian Traces
Visualization of Landscapes
Revisiting Zeroth-order Flatness
Computation Overhead
Ablation Study
Beyond Not-forgetting
Conclusion
Acknowledgments
...and 9 more sections

Figures (10)

Figure 1: Illustration of C-Flat overcoming catastrophic forgetting by fine-tuning the old model parameters toward the flat minima of the new task. a) Loss minima for the current task alone can cause catastrophic forgetting on previous tasks. b) Balanced optima aligned by regularization lead to unsatisfying results for both old and new tasks. c) C-Flat seeks global optima for all tasks with a flattened loss landscape.
Figure 2: The Hessian eigenvalues and traces at epochs 50 and 150 in the B0_Inc10 setting (MEMO, CIFAR-100), with and without C-Flat plugged in.
Figure 3: The parametric loss landscapes of Replay (Mem.), WA (Reg.), and MEMO (Exp.), plotted by perturbing the model parameters at the end of training (CIFAR-100, B0_Inc10) along the first two Hessian eigenvectors.
Figure 4: C-Flat vs. zeroth-order flatness
Figure 5: Analysis of computation overhead
...and 5 more figures
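Figures 2 and 3 rely on two standard curvature diagnostics: the Hessian spectrum (top eigenvalues and trace) and a 2-D loss surface obtained by moving the weights along the top two Hessian eigenvectors. The pure-Python sketch below illustrates both on a hypothetical 3-parameter quadratic loss with a known Hessian. The matrix H, W_STAR, and all function names are assumptions; Hutchinson's trace estimator and power iteration with deflation are common choices for these quantities, not necessarily the tooling used for the figures. In a real network, hvp would be replaced by an autograd Hessian-vector product.

```python
import math
import random

# Hypothetical toy setting: L(w) = 0.5 * (w - W_STAR)^T H (w - W_STAR) with a
# known symmetric positive semi-definite Hessian H (eigenvalues ~4.618, ~2.382, 1).
H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.0],
     [0.0, 0.0, 1.0]]
W_STAR = [0.0, 0.0, 0.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def hvp(v):
    """Hessian-vector product H @ v (stand-in for an autograd HVP)."""
    return [dot(row, v) for row in H]


def loss(w):
    d = [wi - si for wi, si in zip(w, W_STAR)]
    return 0.5 * dot(d, hvp(d))


def hutchinson_trace(n_samples=200, seed=0):
    """Estimate tr(H) as the mean of z^T H z over Rademacher vectors z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(len(H))]
        total += dot(z, hvp(z))
    return total / n_samples


def deflated_hvp(v, pairs):
    """(H - sum lam * u u^T) @ v, removing already-found eigenpairs."""
    hv = hvp(v)
    for lam, u in pairs:
        c = dot(u, v)
        hv = [x - lam * c * ui for x, ui in zip(hv, u)]
    return hv


def top_eigenpairs(k=2, iters=500, seed=0):
    """Power iteration with deflation; assumes H is PSD so the largest-magnitude
    eigenvalue of each deflated operator is the next largest eigenvalue."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(H))])
        for _ in range(iters):
            v = normalize(deflated_hvp(v, pairs))
        lam = dot(v, deflated_hvp(v, pairs))  # Rayleigh quotient
        pairs.append((lam, v))
    return pairs


def landscape_grid(v1, v2, radius=1.0, steps=5):
    """Loss at W_STAR + alpha * v1 + beta * v2 on a square grid of coordinates."""
    coords = [radius * (2.0 * i / (steps - 1) - 1.0) for i in range(steps)]
    return [[loss([s + a * x + b * y for s, x, y in zip(W_STAR, v1, v2)])
             for b in coords] for a in coords]


if __name__ == "__main__":
    (lam1, v1), (lam2, v2) = top_eigenpairs(k=2)
    print("top eigenvalues:", lam1, lam2)
    print("trace estimate:", hutchinson_trace())
    for row in landscape_grid(v1, v2):
        print(["%.3f" % x for x in row])
```

A sharper minimum shows up as larger top eigenvalues and trace, and as a steeper bowl in the 2-D grid; these are the quantities that the captions of Figures 2 and 3 compare with and without C-Flat.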