Replay to Remember: Retaining Domain Knowledge in Streaming Language Models

Sneh Pillai

TL;DR

This work tackles catastrophic forgetting in streaming language models by combining Low-Rank Adaptation (LoRA) with a lightweight replay buffer under strict computational constraints, and evaluates the approach across medical, genetics, and legal domains. Using perplexity, semantic similarity, and GPT-based human-like ratings, the study shows that minimal replay stabilizes domain knowledge and enables partial recovery, although the extent of forgetting and relearning varies by domain. The findings support practical deployment of real-time, edge-friendly adaptive LLMs and highlight the value of multi-metric evaluation for capturing both qualitative and quantitative shifts during continual learning. The results motivate future work on dynamic replay prioritization and modular adapters to improve domain-specific retention without extensive retraining.

Abstract

Continual learning in large language models (LLMs) typically encounters the critical challenge of catastrophic forgetting, where previously acquired knowledge deteriorates upon exposure to new data. While techniques like replay buffers and parameter-efficient tuning (e.g., Low-Rank Adaptation or LoRA) have been proposed, few studies investigate real-time domain adaptation under strict computational and data-stream constraints. In this paper, we demonstrate a lightweight method combining LoRA and a minimal replay mechanism in a realistic streaming setting across three diverse knowledge domains: medical question answering, genetics, and law. Using perplexity, semantic similarity, and GPT-based human-like evaluation metrics, we quantify the model's adaptation, forgetting, and recovery over time. Our experiments reveal that while catastrophic forgetting naturally occurs, even minimal replay significantly stabilizes and partially restores domain-specific knowledge. This study contributes practical insights for deploying adaptable LLMs in resource-constrained, real-world scenarios.
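To make the idea of a minimal replay mechanism concrete, the following is a small sketch of how replayed examples might be mixed into each incoming chunk of a stream. It is not the paper's implementation: the class `ReplayBuffer`, the function `build_training_batch`, the reservoir-sampling policy, and the buffer and replay sizes are all assumptions chosen for illustration. The LoRA update itself is omitted; the returned batch would be passed to whatever fine-tuning step is in use.

```python
import random


class ReplayBuffer:
    """Fixed-capacity buffer filled by reservoir sampling (an assumed policy)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of examples offered to the buffer
        self.rng = random.Random(seed)

    def add(self, example):
        """Keep each example seen so far with equal probability."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        """Draw up to k stored examples without replacement."""
        k = min(k, len(self.items))
        return self.rng.sample(self.items, k)


def build_training_batch(new_chunk, buffer, replay_size):
    """Mix the incoming chunk with replayed examples, then store the chunk."""
    replayed = buffer.sample(replay_size)
    for example in new_chunk:
        buffer.add(example)
    return list(new_chunk) + replayed


# Hypothetical stream: a medical chunk followed by a genetics chunk.
buffer = ReplayBuffer(capacity=4)
batch_1 = build_training_batch(["med_q1", "med_q2", "med_q3"], buffer, replay_size=2)
batch_2 = build_training_batch(["gen_q1", "gen_q2"], buffer, replay_size=2)
```

Here the first batch contains only new examples because the buffer starts empty, while the second batch carries two earlier medical examples alongside the genetics chunk. Reservoir sampling is used only because it keeps memory bounded without knowing the stream length in advance; other selection policies would fit the same interface.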

Paper Structure

This paper contains 15 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Perplexity trends over streaming chunks
  • Figure 2: Cosine similarity to baseline answers over time
  • Figure 3: GPT-4 answer ratings across streaming chunks
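The first two figures track perplexity and cosine similarity. As a reminder of what those quantities measure, here is a minimal sketch using their standard definitions. The function names are illustrative, the inputs are assumed to be per-token natural-log probabilities and fixed-length embedding vectors, and the embedding model that would produce those vectors is not specified here.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)
```

Lower perplexity means the model assigns higher probability to the reference text, and a cosine similarity near 1 means a later answer's embedding points in nearly the same direction as the baseline answer's embedding.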