A Unified Gradient-based Framework for Task-agnostic Continual Learning-Unlearning

Zhehao Huang, Xinwen Cheng, Jie Zhang, Jinghao Zheng, Haoran Wang, Zhengbao He, Tao Li, Xiaolin Huang

TL;DR

This work addresses the need for systems that learn continually while also unlearning specific data. It proposes a unified gradient-based continual learning-unlearning (CLU) framework built on $D_{KL}$ minimization that aligns learning, unlearning, and retention under a remain-preserved manifold. The approach decomposes the gradient update into four components (learning, unlearning, remaining-knowledge preservation, and weight saliency modulation) and couples them with an implicit online Hessian approximation via a fast-slow weight update and adaptive sample weighting to balance plasticity and stability. Key contributions include the four-term gradient decomposition, the remain-preserved Hessian constraint, an efficient Hessian-approximating update mechanism, a balanced weight saliency mask, and a task-agnostic CLU paradigm with cross-task and random-sample unlearning benchmarks. Empirically, UG-CLU coordinates incremental learning, precise unlearning, and knowledge stability on CIFAR-10 and TinyImageNet with diverse architectures and outperforms task-aware baselines, and ablation studies confirm the utility of each component. The work provides a theoretical foundation and practical framework for dynamic, privacy-aware lifelong learning systems with fine-grained unlearning capabilities.

Abstract

Recent advancements in deep models have highlighted the need for intelligent systems that combine continual learning (CL) for knowledge acquisition with machine unlearning (MU) for data removal, forming the Continual Learning-Unlearning (CLU) paradigm. While existing work treats CL and MU as separate processes, we reveal their intrinsic connection through a unified optimization framework based on Kullback-Leibler divergence minimization. This framework decomposes gradient updates for approximate CLU into four components: learning new knowledge, unlearning targeted data, preserving existing knowledge, and modulation via weight saliency. A critical challenge lies in balancing knowledge update and retention during sequential learning-unlearning cycles. To resolve this stability-plasticity dilemma, we introduce a remain-preserved manifold constraint that induces a remaining Hessian compensation for CLU iterations. A fast-slow weight adaptation mechanism is designed to efficiently approximate the second-order optimization direction. Combined with adaptive weighting coefficients and a balanced weight saliency mask, it yields a unified implementation framework for gradient-based CLU. Furthermore, we pioneer task-agnostic CLU scenarios that support fine-grained unlearning at the cross-task category and random sample levels, beyond the traditional task-aware setups. Experiments demonstrate that the proposed UG-CLU framework effectively coordinates incremental learning, precise unlearning, and knowledge stability across multiple datasets and model architectures, providing a theoretical foundation and methodological support for dynamic, compliant intelligent systems.
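
The fast-slow weight adaptation mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `fast_slow_step`, the step sizes `beta` and `alpha`, the number of inner steps, and the quadratic remaining loss are all assumptions made for illustration. The sketch only shows the qualitative effect one would expect from such a scheme: after a task step (learning or unlearning), a few gradient steps on the remaining loss damp the update most strongly along directions where the remaining loss has high curvature.

```python
import numpy as np

def fast_slow_step(theta, task_grad, remain_grad, beta=0.1, alpha=1.0, inner_steps=5):
    """One hypothetical fast-slow update (illustrative assumption, not the paper's code).

    theta       : current slow weights
    task_grad   : callable, gradient of the learning/unlearning objective
    remain_grad : callable, gradient of the remaining-data loss
    """
    # Fast weights first take the task (learning or unlearning) step.
    fast = theta - beta * task_grad(theta)
    # Fast weights are then pulled back by a few steps on the remaining loss.
    for _ in range(inner_steps):
        fast = fast - beta * remain_grad(fast)
    # Slow weights move toward the fast weights.
    return theta + alpha * (fast - theta)

# Toy remaining loss 0.5 * sum(h * theta**2): curvature 5.0 along axis 0, 0.1 along axis 1.
h = np.array([5.0, 0.1])

def remain_grad(th):
    return h * th

def task_grad(th):
    # Constant task gradient, equal in both directions (toy assumption).
    return np.array([1.0, 1.0])

theta = np.zeros(2)  # starts at the minimum of the toy remaining loss
new_theta = fast_slow_step(theta, task_grad, remain_grad)
print(new_theta)
```

In this toy case the displacement along each axis equals `-beta * (1 - beta * h) ** inner_steps`, so the high-curvature axis moves far less than the flat one even though the task gradient is identical in both. That is the sense in which the inner loop acts like a curvature-aware correction without forming a Hessian explicitly.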

Paper Structure

This paper contains 39 sections, 8 theorems, 62 equations, 5 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

Under the Euclidean manifold metric, $\rho(\theta_k,\theta_{k+1})=\frac{1}{2}\lVert \theta_k-\theta_{k+1} \rVert^2$. Assume that the current model satisfies Eq. \ref{eq: theta k argmin lu}. Let $H_*^R=\nabla^2\mathcal{L}^R(\theta_*)$ denote the Hessian of the oracle model on the remaining set and $H_{k}^L=\ldots$

Figures (5)

  • Figure 1: A visual comparison of the paradigms for CL, MU, and CLU systems, where $\mathcal{T}^L_t$ and $\mathcal{T}^U_t$ represent the learning and unlearning tasks, respectively. (a) Traditional CL systems [DeLange2019ACL, Wang2023ACS, Zhou2023DeepCL] adopt a sequential incremental learning paradigm, gradually expanding the model's capabilities through a stream of temporal task data. (b) Typical MU systems [Bourtoule2019MachineU, shaik2023exploring, xu2024machine] require removing the influence of specific classes or data from a model that has been well-trained on its pre-training dataset. (c) Existing CLU systems [liu2022continual, chatterjee2024unifiedframeworkcontinuallearning] integrate CL and MU, so they must learn from data and may also be required to unlearn. However, they only support task-level unlearning updates, i.e., task-aware CLU. (d) We further propose a task-agnostic CLU system that supports fine-grained knowledge units (classes or data samples) for targeted unlearning operations, enabling more precise control over cognitive evolution.
  • Figure 2: Visualization of the optimization process of the proposed approximate CLU. We model the trajectory starting from the initial model $\theta_0$, which is well-trained on both the unlearning data $\mathcal{D}^U$ and the remaining data $\mathcal{D}^R$, and guide the model's output distribution toward the optimal model $\theta_*$, which achieves low error on both the new learning data $\mathcal{D}^L$ and the remaining data $\mathcal{D}^R$. For the solution, we derive the vanilla gradient descent trajectory based on the Euclidean $\ell_2$ metric (gray arrows in the figure, Proposition \ref{prop: clu vanilla}) and also the trajectory corrected by the remaining Hessian under the remain-preserved manifold $D^R_{\mathrm{KL}}$ metric (blue arrows in the figure, Proposition \ref{prop: clu remain}). The latter balances the efficacy of knowledge updates against the preservation of model generalization performance, significantly reducing performance degradation on the remaining data $\mathcal{D}^R$.
  • Figure 3: Three core modules of our proposed UG-CLU. The implicit online Hessian approximation derives the task optimization direction modulated by the Hessian through a fast-slow weight update mechanism, reducing computational overhead. The sample-wise adaptive coefficient leverages task properties and loss magnitude adaptation to re-weight the sample loss, effectively approximating the optimality conditions of the iterative process. The balanced weight saliency mask simultaneously considers the efficiency of task knowledge updates and the effectiveness of knowledge retention, selecting the most important parameters for updates. The combination of these three modules aligns with the results derived from our theoretical analysis, achieving approximate CLU.
  • Figure 4: We adapt the interclass confusion unlearning setup from [Goel2022TowardsAE] and extend it to the task-agnostic CLU system. Shapes and colors represent the actual and labeled classes, respectively. In each learning task, a portion of the data labels is shuffled, producing mislabeled noisy data (the confusion set). The unlearning task must then remove these confusion sets to eliminate their harmful effects on the model.
  • Figure A1: Sensitivity analysis of the threshold $\lambda$ in the weight saliency mask, evaluated in the task-agnostic CLU setting on CIFAR-10 with ResNet-18.
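
The balanced weight saliency mask from the Figure 3 caption, together with the threshold $\lambda$ from Figure A1, can be sketched as follows. The exact saliency score used by UG-CLU is not given in this summary, so the rule below is an assumption chosen for illustration: a parameter is updated only when its task-gradient magnitude is at least `lam` times its remaining-data gradient magnitude. The function name `balanced_saliency_mask` and the toy gradient values are hypothetical.

```python
import numpy as np

def balanced_saliency_mask(task_grad, remain_grad, lam=1.0):
    """Hypothetical balanced mask (illustrative assumption, not the paper's rule).

    Returns 1.0 where the task gradient dominates the remaining-data gradient
    by a factor of at least `lam`, and 0.0 elsewhere.
    """
    keep = np.abs(task_grad) >= lam * np.abs(remain_grad)
    return keep.astype(task_grad.dtype)

# Toy per-parameter gradients (made-up values).
task_grad = np.array([0.9, 0.05, -0.4, 0.2])    # learning/unlearning gradient
remain_grad = np.array([0.1, 0.50, 0.3, -0.8])  # remaining-data gradient

mask = balanced_saliency_mask(task_grad, remain_grad, lam=1.0)
masked_update = mask * task_grad  # only salient parameters receive the task update
print(mask, masked_update)
```

Under this assumed rule, raising `lam` shrinks the set of updated parameters and favors retention, while lowering it favors faster knowledge updates. This is the kind of trade-off a sensitivity analysis over $\lambda$ would probe.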

Theorems & Definitions (12)

  • Proposition 1
  • Proposition 2
  • Corollary 1
  • Corollary 2
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • Proposition 5
  • proof
  • ...and 2 more