Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning

Weijieying Ren; Xinlong Li; Lei Wang; Tianxiang Zhao; Wei Qin

TL;DR

This work tackles catastrophic forgetting in continual learning for large language models (LLMs) under parameter-efficient fine-tuning. It introduces a geometric lens, mode connectivity, and leverages it with a dual-memory interpolation-based LoRA (I-LoRA) to balance rapid adaptation and memory consolidation. Across eight domain-specific benchmarks, I-LoRA achieves up to 11% gains over state-of-the-art methods and demonstrates improved memorization, validating the practical potential of mode connectivity in PEFT for LLMs. The study provides both methodological advances and empirical evidence to guide future research on continual learning in large-scale language models.

Abstract

Existing research has shown that large language models (LLMs) exhibit remarkable performance in language understanding and generation. However, when LLMs are continually fine-tuned on complex and diverse domain-specific downstream tasks, inference performance on historical tasks drops dramatically, a problem known as catastrophic forgetting. A trade-off must be maintained between learning plasticity and memory stability. Many existing works have explored strategies such as memory replay, regularization, and parameter isolation, but little is known about the geometric connection between adjacent minima in continual LLM fine-tuning. In this work, we investigate the geometric connections between different minima through the lens of mode connectivity, which means that different minima can be connected by a low-loss valley. Through extensive experiments, we uncover the mode connectivity phenomenon in the LLM continual learning scenario and find that it can strike a balance between plasticity and stability. Building on these findings, we propose a simple yet effective method called Interpolation-based LoRA (I-LoRA), which constructs a dual-memory experience replay framework based on LoRA parameter interpolations. Extensive experiments and analysis on eight domain-specific CL benchmarks demonstrate that I-LoRA consistently shows significant improvement over previous state-of-the-art approaches, with up to 11% performance gains, providing a strong baseline and insights for future research on the large language model continual learning problem. Our code is available at https://github.com/which47/LLMCL.
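The linear connection between two adjacent minima that the abstract refers to can be illustrated with a small sketch. This is not the authors' code: the `Params` representation, the function names `interpolate` and `linear_path`, and the toy weights are assumptions made for illustration. It only shows how a point on the segment between two sets of LoRA weights is computed as $(1-\lambda)\,\theta_t + \lambda\,\theta_{t+1}$.

```python
from typing import Dict, List, Tuple

# Hypothetical flat representation of LoRA weights: parameter name -> list of floats.
Params = Dict[str, List[float]]


def interpolate(theta_a: Params, theta_b: Params, lam: float) -> Params:
    """Return (1 - lam) * theta_a + lam * theta_b, parameter by parameter."""
    if theta_a.keys() != theta_b.keys():
        raise ValueError("both parameter sets must share the same names")
    out: Params = {}
    for name in theta_a:
        a, b = theta_a[name], theta_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        out[name] = [(1.0 - lam) * x + lam * y for x, y in zip(a, b)]
    return out


def linear_path(theta_a: Params, theta_b: Params, steps: int = 5) -> List[Tuple[float, Params]]:
    """Evenly spaced points on the segment from theta_a (lam=0) to theta_b (lam=1)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    lams = [i / (steps - 1) for i in range(steps)]
    return [(lam, interpolate(theta_a, theta_b, lam)) for lam in lams]


# Toy weights standing in for the minima reached after task t and after task t+1.
theta_t = {"lora_A": [0.0, 2.0], "lora_B": [1.0]}
theta_t1 = {"lora_A": [2.0, 4.0], "lora_B": [3.0]}
midpoint = interpolate(theta_t, theta_t1, 0.5)
```

In a mode connectivity study, each point returned by `linear_path` would be loaded into the model and evaluated on previous, current, and all tasks; the evaluation step is omitted here because it depends on the model and benchmarks.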

Paper Structure (18 sections, 8 equations, 4 figures, 16 tables, 1 algorithm)

Introduction
Related Works
Analyzing Linear Mode Connectivity in Parameter Efficient Continual Learning for LLMs
Mode Connectivity Evaluation
Methodology
Dual Memory for Fast and Slow Learning
Continual PEFT with Dual Memory
Experiments
Experiment Setup
Dataset Description
Metric
Baselines
Implementation Details
Overall Comparison in CL
Discussion
...and 3 more sections

Figures (4)

Figure 1: Inference accuracy curves along the linear connection between two adjacent continual minima for five representative continual learning baselines on seven domain-specific benchmarks. The y-axes, labeled Ap (upper row), An (middle row), and Aall (bottom row), denote accuracy on previous tasks $1:t$, on the current task $t+1$, and on all learned tasks $1:(t+1)$, respectively. The x-axis, $\lambda$, is the interpolation factor. Testing accuracy is used as the measure of connectivity because it is more sensitive than training loss to movement along the path.
Figure 2: The framework of I-LoRA for large language model continual learning. I-LoRA consists of (i) a slow learner (depicted in blue) that learns long-term knowledge through an exponential moving average of the fast learner's weights, and (ii) a fast learner (depicted in yellow) that retrieves historical knowledge while simultaneously adapting to current data. Both learners can be trained synchronously.
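
The exponential moving average that links the two learners in Figure 2 can be sketched as follows. This is a minimal illustration, not the released implementation: the `Params` representation, the function name `ema_update`, and the `decay` values are assumptions, and the gradient step that would update the fast learner between calls is left out.

```python
from typing import Dict, List

# Hypothetical flat representation of LoRA weights: parameter name -> list of floats.
Params = Dict[str, List[float]]


def ema_update(slow: Params, fast: Params, decay: float = 0.99) -> Params:
    """Return decay * slow + (1 - decay) * fast for every parameter."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    if slow.keys() != fast.keys():
        raise ValueError("both learners must share the same parameter names")
    return {
        name: [decay * s + (1.0 - decay) * f for s, f in zip(slow[name], fast[name])]
        for name in slow
    }


# Toy example: the slow learner drifts toward the fast learner over two updates.
slow = {"lora_A": [0.0]}
fast = {"lora_A": [1.0]}
slow = ema_update(slow, fast, decay=0.5)  # slow moves halfway to fast
slow = ema_update(slow, fast, decay=0.5)  # and halfway again
```

A decay close to 1 makes the slow learner change gradually, which is the usual motivation for using an average of this kind as a long-term memory; the value appropriate for I-LoRA is not stated in this section.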
