Resolving Conflicts in Lifelong Learning via Aligning Updates in Subspaces

Yueer Zhou, Yichen Wu, Ying Wei

TL;DR

This paper tackles catastrophic forgetting in continual learning for large language and vision models by analyzing parameter shifts within LoRA subspaces. It introduces PS-LoRA, which combines two components: a Parameter Stability Loss that constrains update magnitudes and aligns update directions with historical adapters, and a magnitude-based post-training merging strategy that consolidates tasks efficiently. The approach improves stability and accuracy across NLP and CV benchmarks, outperforms baselines, and is memory-efficient and usable as a plug-in alongside orthogonality-based methods. Overall, PS-LoRA provides a principled, scalable way to balance adaptation and retention in continual learning scenarios.

Abstract

Low-Rank Adaptation (LoRA) enables efficient Continual Learning but often suffers from catastrophic forgetting due to destructive interference between tasks. Our analysis reveals that this degradation is primarily driven by antagonistic directional updates where new task gradients directly oppose the historical weight trajectory. To address this, we propose PS-LoRA (Parameter Stability LoRA), a framework designed to resolve conflicts by aligning updates within the optimization subspace. Our approach employs a dual-regularization objective that penalizes conflicting directions and constrains magnitude deviations to ensure consistency with prior knowledge. Additionally, we implement a magnitude-based merging strategy to consolidate sequential adapters into a robust representation without retraining. Experiments on NLP and Vision benchmarks show that PS-LoRA outperforms state-of-the-art methods by preserving the stability of learned representations while efficiently adapting to new domains.
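
The abstract describes the dual-regularization objective only in words, so the sketch below is one possible reading rather than the paper's actual loss. It assumes the new and historical updates are available as dense matrices (plain nested lists here), and the function name `stability_penalty` and the weights `lam_dir` and `lam_mag` are hypothetical. The directional term grows only where an entry of the new update has the opposite sign to the historical one; the magnitude term is a simple squared-norm penalty.

```python
def stability_penalty(delta_new, delta_hist, lam_dir=1.0, lam_mag=0.1):
    """Illustrative penalty (assumed form, not the paper's equation).

    delta_new, delta_hist: equally shaped matrices given as lists of rows.
    lam_dir, lam_mag: hypothetical weights for the two terms.
    """
    conflict = 0.0
    magnitude = 0.0
    for row_new, row_hist in zip(delta_new, delta_hist):
        for d_new, d_hist in zip(row_new, row_hist):
            # A negative product means the new entry opposes the historical one.
            conflict += max(0.0, -d_new * d_hist)
            # Squared magnitude of the new update, penalized regardless of sign.
            magnitude += d_new ** 2
    return lam_dir * conflict + lam_mag * magnitude


delta_hist = [[1.0, -1.0]]
aligned = stability_penalty([[0.5, -0.5]], delta_hist)   # same signs as history
opposed = stability_penalty([[-0.5, 0.5]], delta_hist)   # flipped signs
```

Under these assumptions, an update whose signs are flipped relative to the history receives a strictly larger penalty than an aligned update of the same size, which is the behavior the abstract attributes to the objective.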

Paper Structure

This paper contains 43 sections, 12 equations, 13 figures, 20 tables, 1 algorithm.

Figures (13)

  • Figure 1: Comparison between incremental LoRA training and our method. (a) shows the average accuracy on all seen tasks after training on the $i$-th task $\mathcal{T}_i$. (b) visualizes the parameter shift distributions at each training stage for a randomly selected representative layer of the pre-trained model. For more detailed results on different task orders and parameter shifts, see Appendix \ref{appendix:pattern}.
  • Figure 2: Evaluation results of different update subsets selected from the bottom-$k\%$ parameters of $\Delta \mathbf{W}_t$, analyzing the effects of sign consistency (same vs. opposite) and update magnitude on performance.
  • Figure 3: Distributions of different LoRAs. Vectors represent the LoRA directions; the angle between each vector and the axis indicates its deviation from the earliest task. Vectors in dotted lines denote merged LoRAs.
  • Figure 4: Overview of the proposed PS-LoRA. During training, Parameter Stability Loss is applied to the new LoRA to prevent sign-flip updates. After training, all LoRAs are merged by selecting the weights with the largest absolute magnitude and then added to the pre-trained model for inference.
  • Figure 5: Feature visualization across tasks.
  • ...and 8 more figures
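
The merging step described in the Figure 4 caption (keep, for each weight, the value with the largest absolute magnitude across all LoRAs, then add the result to the pre-trained model) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each LoRA has already been expanded into a dense update matrix of the same shape as the pre-trained weight, uses plain nested lists to stay self-contained, and the names `merge_by_magnitude` and `apply_merged` are hypothetical.

```python
def merge_by_magnitude(deltas):
    """Element-wise merge: keep the entry with the largest absolute value.

    deltas: list of equally shaped matrices (lists of rows), one per task.
    Ties are resolved in favor of the earliest task (behavior of max()).
    """
    rows, cols = len(deltas[0]), len(deltas[0][0])
    merged = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            merged[i][j] = max((d[i][j] for d in deltas), key=abs)
    return merged


def apply_merged(w0, merged):
    """Add the merged update to the pre-trained weight matrix for inference."""
    return [[w + m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(w0, merged)]


delta_1 = [[0.5, -0.1], [0.0, 0.3]]
delta_2 = [[-0.2, 0.4], [0.1, -0.6]]
merged = merge_by_magnitude([delta_1, delta_2])
w_final = apply_merged([[1.0, 1.0], [1.0, 1.0]], merged)
```

In this toy case `merged` takes 0.5 and 0.4 for the first row and 0.1 and -0.6 for the second, so each position is taken from whichever task moved that weight the most. Whether the paper applies this selection to the expanded updates or to the low-rank factors is not stated in this summary.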