Hessian Aware Low-Rank Perturbation for Order-Robust Continual Learning

Jiaqi Li, Yuanhao Lai, Rui Wang, Changjian Shui, Sabyasachi Sahoo, Charles X. Ling, Shichun Yang, Boyu Wang, Christian Gagné, Fan Zhou

TL;DR

HALRP tackles the continual learning challenge by introducing Hessian-Aware Low-Rank Perturbations, which inject task-adaptive, low-rank modifications into a fixed base model. By linking perturbation impact to Hessian information, the method automatically selects layer-wise ranks to balance accuracy and parameter growth, while pruning reduces unnecessary capacity. The approach is supported by a theoretical bound connecting loss perturbation to Hessian norms and singular-value spectra, and is validated across diverse benchmarks with favorable accuracy, task-order robustness, and efficiency compared to state-of-the-art baselines. HALRP demonstrates scalable continual learning with controllable memory footprint and efficient adaptation to many sequential tasks.

Abstract

Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones. In this work, we propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning. By modeling the parameter transitions along the sequential tasks with the weight matrix transformation, we propose to apply the low-rank approximation on the task-adaptive parameters in each layer of the neural networks. Specifically, we theoretically demonstrate the quantitative relationship between the Hessian and the proposed low-rank approximation. The approximation ranks are then globally determined according to the marginal increment of the empirical loss estimated by the layer-specific gradient and low-rank approximation error. Furthermore, we control the model capacity by pruning less important parameters to diminish the parameter growth. We conduct extensive experiments on various benchmarks, including a dataset with large-scale tasks, and compare our method against some recent state-of-the-art methods to demonstrate the effectiveness and scalability of our proposed method. Empirical results show that our method performs better on different benchmarks, especially in achieving task order robustness and handling the forgetting issue. The source code is at https://github.com/lijiaqi/HALRP.
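The two mechanisms named in the abstract, low-rank approximation of the task-adaptive parameters and global rank selection, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the scoring rule (layer gradient norm times singular value), and the `alpha` threshold are assumptions made here for illustration only.

```python
import numpy as np


def low_rank_approx(delta_w, rank):
    """Best rank-`rank` approximation of delta_w in Frobenius norm (truncated SVD).

    Returns the approximation and the Frobenius norm of the discarded part,
    which equals sqrt(sum of squared discarded singular values).
    """
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    residual = float(np.sqrt(np.sum(s[rank:] ** 2)))
    return approx, residual


def select_ranks(singular_values, grad_norms, alpha=0.9):
    """Hypothetical global rank selection across layers.

    Each singular component is scored by (layer gradient norm * singular value),
    an assumed proxy for its contribution to the loss. Components from all
    layers are ranked together and kept until the retained score reaches
    `alpha` times the total score.
    """
    scores = []
    for layer, (s, g) in enumerate(zip(singular_values, grad_norms)):
        scores.extend((float(g * sigma), layer) for sigma in s)
    scores.sort(key=lambda t: t[0], reverse=True)

    total = sum(score for score, _ in scores)
    ranks = [0] * len(singular_values)
    kept = 0.0
    for score, layer in scores:
        if kept >= alpha * total:
            break
        ranks[layer] += 1
        kept += score
    return ranks
```

Because the singular values of each layer are already sorted in descending order, counting the kept components per layer directly gives that layer's rank. A layer with a small gradient norm or a fast-decaying spectrum receives a lower rank, which is the kind of accuracy versus parameter-growth trade-off the abstract describes.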

Paper Structure (32 sections, 2 theorems, 27 equations, 7 figures, 8 tables, 1 algorithm)

Key Result

Theorem 1

Assume a neural network of $L$ layers with vectorized weights $(\boldsymbol{\omega}^{\star}_1, \dots, \boldsymbol{\omega}^{\star}_L)$ that have converged to local optima, such that the first- and second-order optimality conditions are satisfied, i.e., the gradient is zero and the Hessian is posi[…] where $\mathbf{H}_1=\nabla^2\mathcal{L}(\boldsymbol{\omega}^{\star}_1)$ is the Hessian matrix at on[…]
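
For intuition, a bound of this kind can be motivated by a standard second-order Taylor expansion. The following is a generic sketch under the zero-gradient assumption, with all other layers held fixed. It is not the paper's exact statement, and the symbols $\Delta\boldsymbol{\omega}_1$, $\sigma_i$, and $k$ are introduced here for illustration.

$$
\mathcal{L}(\boldsymbol{\omega}^{\star}_1 + \Delta\boldsymbol{\omega}_1) - \mathcal{L}(\boldsymbol{\omega}^{\star}_1)
= \tfrac{1}{2}\, \Delta\boldsymbol{\omega}_1^{\top} \mathbf{H}_1\, \Delta\boldsymbol{\omega}_1 + o\!\left(\|\Delta\boldsymbol{\omega}_1\|_2^2\right)
\le \tfrac{1}{2}\, \|\mathbf{H}_1\|_2\, \|\Delta\boldsymbol{\omega}_1\|_2^2 + o\!\left(\|\Delta\boldsymbol{\omega}_1\|_2^2\right)
$$

If $\Delta\boldsymbol{\omega}_1$ is the vectorized residual left after truncating a weight matrix with singular values $\sigma_1 \ge \sigma_2 \ge \dots$ to rank $k$, then by the Eckart-Young theorem $\|\Delta\boldsymbol{\omega}_1\|_2^2 = \sum_{i>k} \sigma_i^2$, so the loss increase is controlled by the Hessian norm and the discarded part of the singular-value spectrum.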

Figures (7)

  • Figure 1: Low rank decomposition between $\mathcal{T}_1$ and $\mathcal{T}_0$
  • Figure 2: Average Forgetting Statistics
  • Figure 3: (a) Average Capacity Increment ratio on CIFAR100-SuperClass w.r.t. the base model. (b) Average Time Complexity Ratio on PMNIST.
  • Figure 4: Effect of regularization coefficients $\lambda_0$ and $\lambda_1$.
  • Figure 5: Forgetting comparison on CIFAR100-Split with different task orders (A-E) under different amounts of training data.
  • ...and 2 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 1
  • proof