Improving Data-aware and Parameter-aware Robustness for Continual Learning

Hanxi Xiao; Fan Lyu

TL;DR

This paper enhances the data-aware and parameter-aware robustness of CL, proposing a Robust Continual Learning (RCL) method that effectively maintains robustness and achieves new state-of-the-art (SOTA) results.

Abstract

The goal of Continual Learning (CL) is to learn multiple new tasks sequentially while balancing plasticity on new knowledge against stability of old knowledge. This paper argues that failures to reach this balance stem from the ineffective handling of outliers, which leads to abnormal gradients and unexpected model updates. To address this issue, we enhance the data-aware and parameter-aware robustness of CL and propose a Robust Continual Learning (RCL) method. From the data perspective, we develop a contrastive loss based on the concepts of uniformity and alignment, forming a feature distribution that better accommodates outliers. From the parameter perspective, we present a forward strategy for worst-case perturbation and apply robust gradient projection to the parameters. Experimental results on three benchmarks show that the proposed method effectively maintains robustness and achieves new state-of-the-art (SOTA) results. The code is available at: https://github.com/HanxiXiao/RCL
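
To make the two ingredients named in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a commonly used formulation of alignment and uniformity on the unit hypersphere (mean distance between positive pairs, and the log of the mean Gaussian potential over all pairs) and a plain projection of a gradient onto the orthogonal complement of a stored subspace, in the spirit of GPM. The function names, the defaults `alpha=2.0` and `t=2.0`, and the toy `basis` are assumptions chosen for illustration; the paper's actual loss and its robust projection may differ.

```python
import numpy as np


def l2_normalize(z, eps=1e-12):
    """Map each row of z onto the unit hypersphere."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norms, eps)


def alignment_loss(x, y, alpha=2.0):
    """Mean distance (raised to alpha) between paired features x[i], y[i].

    Lower values mean positive pairs sit closer together.
    """
    return float(np.mean(np.linalg.norm(x - y, axis=1) ** alpha))


def uniformity_loss(x, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs in x.

    Lower values mean features are spread more evenly on the sphere.
    """
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(x.shape[0], k=1)  # each unordered pair once
    return float(np.log(np.mean(np.exp(-t * sq_dists[upper]))))


def project_out(grad, basis):
    """Remove from grad its component inside span(basis).

    basis is assumed to have orthonormal columns (shape d x k), standing in
    for directions that matter to previously learned tasks.
    """
    return grad - basis @ (basis.T @ grad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = l2_normalize(rng.normal(size=(8, 4)))
    views = l2_normalize(feats + 0.05 * rng.normal(size=feats.shape))
    print("alignment:", alignment_loss(feats, views))
    print("uniformity:", uniformity_loss(feats))

    basis = np.eye(4)[:, :1]  # hypothetical protected direction: first axis
    print("projected gradient:", project_out(np.ones(4), basis))
```

In this sketch, a training objective would add the two losses with task-specific weights, and the projected gradient would replace the raw gradient in the optimizer step so that updates avoid the protected directions.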

Paper Structure (27 sections, 18 equations, 6 figures, 9 tables, 2 algorithms)

Introduction
Related Work
Relating Robustness of Continual Learning to Gradient
Problem Formulation and Robust Gradient
Outlier Samples Causing Abnormal Gradient
The Potential Impact of Abnormal Gradients on Flatness
Method
Robustness of Feature Distribution across Sequential Tasks
Robustness of Parameter by Worst-case Perturbation
Robust Gradient Projection
Experiments
Experimental Setup
Experimental Result
Ablation Study
Conclusion and Limitation
...and 12 more sections

Figures (6)

Figure 1: Outlier samples can generate abnormal gradients that hinder continual learning. Our method reduces the impact from both data and parameter perspectives (Left). Our method has a wider robust region, and even as the model continuously learns new tasks and the impact of abnormal gradients increases, our method can still remain undisturbed (Right).
Figure 2: (a) The impact of weight span on the loss value of previous tasks. (b) The relationship between robust loss and flatness.
Figure 3: An overview of RCL. (a) Data space: features are distributed uniformly and aligned on the unit hypersphere. (b) Parameter space: random and worst-case perturbations are added to flatten the loss surface. (c) Gradient space: GPM is used to perform gradient projection.
Figure 4: New task accuracy on CIFAR-100.
Figure 5: The visualization of flatness and accuracy on CIFAR-100. (a) is the weight loss landscape of the second task after learning all ten tasks; (b) is the weight loss landscape of the fifth task after learning five tasks; (c) ACC of GPM; (d) ACC of RCL (darker is better).
...and 1 more figure
