MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning

Yiyang Lu, Yu He, Jianlong Chen, Hongyuan Zha

TL;DR

This work proposes Memory-Inspired Sampler and Scheduler Replay (MSSR), an experience replay framework that estimates sample-level memory strength and schedules rehearsal at adaptive intervals, mitigating catastrophic forgetting while maintaining fast adaptation.

Abstract

Continual fine-tuning of large language models (LLMs) is becoming increasingly crucial as these models are deployed in dynamic environments where tasks and data distributions evolve over time. While strong adaptability enables rapid acquisition of new knowledge, it also exposes LLMs to catastrophic forgetting, where previously learned skills degrade during sequential training. Existing replay-based strategies, such as fixed interleaved replay, accuracy-supervised scheduling, and loss-driven scheduling, remain limited: some depend on heuristic rules and only partially mitigate forgetting, while others improve performance but incur substantial computational overhead. Motivated by retention dynamics under sequential fine-tuning, we propose Memory-Inspired Sampler and Scheduler Replay (MSSR), an experience replay framework that estimates sample-level memory strength and schedules rehearsal at adaptive intervals to mitigate catastrophic forgetting while maintaining fast adaptation. Extensive experiments across three backbone models and 11 sequential tasks show that MSSR consistently outperforms state-of-the-art replay baselines, with particularly strong gains on reasoning-intensive and multiple-choice benchmarks.

Paper Structure (87 sections, 2 theorems, 46 equations, 2 figures, 12 tables, 1 algorithm)

Key Result

Lemma 1.1

Fix a review time $t_i^{\star}$ and assume no review occurs on $[t_i^{\star},t)$. Then

Figures (2)

  • Figure 1: Comparison of replay triggering strategies in continual fine-tuning. (a) Fixed replay performs replay at a constant interval, ignoring optimization dynamics. (b) Loss-based replay triggers replay when the loss exceeds a threshold, but noisy high-frequency fluctuations can cause frequent spurious triggers. (c) Accuracy-based replay reacts to evaluation drops, yet often suffers from lag since replay starts after accuracy has already degraded. (d) MSSR (ours) is time-aware and memory-inspired, scheduling replay based on time-dependent retention to stabilize long-term performance.
  • Figure 2: Overall architecture of the MSSR framework. The framework consists of two core components that jointly govern replay behavior. (1) A sample-level replay sampler (left) tracks per-sample memory strength by modeling loss-driven and time-dependent decay, and converts memory states into probabilistic replay weights. (2) An adaptive replay scheduler (right) regulates replay timing via expanding intervals and replay volume via a time-decaying ratio. At each replay event, sampled data are merged with current-task samples and used for LoRA-based fine-tuning, forming a closed-loop, memory-aware continual learning process.
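
The two components described in the Figure 2 caption can be illustrated with a small Python sketch. This is not the paper's implementation. The exponential retention curve, the loss-to-strength mapping, and every function name and default value below (`memory_strength`, `replay_weights`, `replay_steps`, `replay_ratio`, `base`, `first_gap`, `growth`, `decay`) are assumptions chosen only to make the ideas concrete: weakly retained samples receive larger replay weights, replay events are spaced at expanding intervals, and the replay ratio shrinks over time.

```python
import math
import random


def memory_strength(loss, base=50.0):
    """Assumed mapping: a higher loss gives a weaker memory (faster decay)."""
    return base / (1.0 + loss)


def retention(elapsed, strength):
    """Assumed exponential forgetting curve, exp(-elapsed / strength)."""
    return math.exp(-elapsed / strength)


def replay_weights(last_review, strength, step):
    """Convert per-sample memory state into replay sampling probabilities.

    Samples with lower estimated retention get proportionally more weight.
    """
    forget = [1.0 - retention(step - t, s) for t, s in zip(last_review, strength)]
    total = sum(forget)
    if total == 0.0:
        # Nothing has decayed yet, so fall back to uniform sampling.
        return [1.0 / len(forget)] * len(forget)
    return [f / total for f in forget]


def replay_steps(total_steps, first_gap=10, growth=2.0):
    """Expanding-interval schedule: each gap is `growth` times the previous one."""
    steps, t, gap = [], first_gap, float(first_gap)
    while t <= total_steps:
        steps.append(t)
        gap *= growth
        t += int(gap)
    return steps


def replay_ratio(step, initial=0.5, decay=0.01):
    """Time-decaying fraction of each batch that is filled with replayed data."""
    return initial * math.exp(-decay * step)


# Toy usage: three buffered samples with hypothetical losses and review times.
rng = random.Random(0)
losses = [0.1, 2.0, 0.5]
last_review = [0, 0, 40]
strength = [memory_strength(l) for l in losses]
weights = replay_weights(last_review, strength, step=50)
replayed = rng.choices(range(len(losses)), weights=weights, k=2)
schedule = replay_steps(100)  # [10, 30, 70]
```

In this toy setup the second sample, which has the highest loss and the longest time since review, receives the largest replay weight. The replayed indices would then be merged with current-task samples at each scheduled step, in the proportion given by `replay_ratio`.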

Theorems & Definitions (4)

  • Lemma 1.1: Closed form between reviews
  • proof
  • Proposition 1.2: Riemann approximation error
  • proof