Interweaving Memories of a Siamese Large Language Model

Xin Song, Zhikai Xue, Guoxiu He, Jiawei Liu, Wei Lu

TL;DR

This work addresses catastrophic forgetting during parameter-efficient fine-tuning (PEFT) of large language models by introducing IMSM, a model-agnostic framework that uses a siamese LLM to retain original world knowledge while incorporating task-specific updates. IMSM interweaves memories produced by the frozen pre-trained parameters and the PEFT-tuned parameters through a query-aware gate, so the two are fused dynamically at the token level during generation. In experiments on multiple open-source LLM backbones and benchmark datasets, IMSM improves alignment performance while keeping time and space efficiency comparable to standard PEFT methods and mitigating forgetting on non-target tasks. The approach offers a practical way to balance plasticity and stability in LLM fine-tuning, with a possible extension to deeper memory fusion across layers.

Abstract

Parameter-efficient fine-tuning (PEFT) methods optimize large language models (LLMs) by modifying or introducing a small number of parameters to enhance alignment with downstream tasks. However, they can result in catastrophic forgetting, where LLMs prioritize new knowledge at the expense of comprehensive world knowledge. A promising approach to mitigate this issue is to recall prior memories based on the original knowledge. To this end, we propose a model-agnostic PEFT framework, IMSM, which Interweaves Memories of a Siamese Large Language Model. Specifically, our siamese LLM is equipped with an existing PEFT method. Given an incoming query, it generates two distinct memories based on the pre-trained and fine-tuned parameters. IMSM then incorporates an interweaving mechanism that regulates the contributions of both original and enhanced memories when generating the next token. This framework is theoretically applicable to all open-source LLMs and existing PEFT methods. We conduct extensive experiments across various benchmark datasets, evaluating the performance of popular open-source LLMs using the proposed IMSM, in comparison to both classical and leading PEFT methods. Our findings indicate that IMSM maintains comparable time and space efficiency to backbone PEFT methods while significantly improving performance and effectively mitigating catastrophic forgetting.
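The interweaving step described in the abstract can be illustrated with a small sketch. The exact form of the gate is not given in this summary, so the code below assumes a generic sigmoid gate computed from the concatenation of the two memories; the names `interweave`, `W_gate`, and `b_gate` are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def interweave(h_orig, h_tuned, W_gate, b_gate):
    """Fuse two memories (last hidden states) with an element-wise gate.

    h_orig  : (seq_len, d) hidden states from the frozen pre-trained parameters
    h_tuned : (seq_len, d) hidden states from the PEFT-tuned parameters
    W_gate  : (2 * d, d) gate weights (assumed form, trainable)
    b_gate  : (d,) gate bias (assumed form, trainable)
    """
    # The gate looks at both memories for every token position.
    both = np.concatenate([h_orig, h_tuned], axis=-1)
    g = sigmoid(both @ W_gate + b_gate)
    # g near 1 favors the fine-tuned memory, g near 0 recalls the original one.
    return g * h_tuned + (1.0 - g) * h_orig

# Toy usage: 3 tokens, hidden size 4, gate initialized to zeros.
rng = np.random.default_rng(0)
h_o = rng.normal(size=(3, 4))
h_t = rng.normal(size=(3, 4))
W = np.zeros((8, 4))
b = np.zeros(4)
fused = interweave(h_o, h_t, W, b)
```

With a zero-initialized gate, every gate value is 0.5 and the fused memory is the plain average of the two inputs. During training, the gate would learn per token how much of the original memory to recall.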

Paper Structure

This paper contains 29 sections, 8 equations, 4 figures, 9 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison between vanilla PEFT (a) and our proposed IMSM (b). Vanilla PEFT methods employ an LLM only once. The parameter distribution shift will cause the LLM to forget general knowledge. Instead, IMSM incorporates a siamese LLM, which can be regarded as two LLMs, sharing identical structure and pre-trained parameters. One remains frozen while the other is fine-tuned using an existing PEFT method. By flexibly recalling the memory of the original LLM, IMSM can improve fine-tuning performance and alleviate catastrophic forgetting.
  • Figure 2: The overall architecture of IMSM, including a siamese LLM and an interweaving mechanism. Given the same input tokens, our siamese LLM produces memories with distinct values, which correspond to the two last hidden states. The generation of the next token relies on the updated memory through an interweaving mechanism. Trainable parameters are marked in red.
  • Figure 3: Evaluation of catastrophic forgetting using Llama2-7B, fine-tuned on ROPES and GSM8K, and evaluated on general knowledge benchmarks.
  • Figure 4: Ablation test of catastrophic forgetting validation. ChatGLM3-6B is employed as the backbone LLM, fine-tuned on ROPES, and evaluated on general knowledge benchmarks.
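The siamese setup in Figures 1 and 2 can be sketched as two forward passes over a single set of pre-trained weights, one with the adapter switched off and one with it switched on. The sketch below assumes a LoRA-style low-rank adapter as the backbone PEFT method and a toy stack of tanh layers in place of a real LLM; `TinyLoRAModel` and its methods are hypothetical names for illustration only.

```python
import numpy as np

class TinyLoRAModel:
    """Toy stand-in for an LLM with LoRA-style adapters (illustrative only)."""

    def __init__(self, d, r, n_layers, seed=0):
        rng = np.random.default_rng(seed)
        # Pre-trained weights: stored once, shared by both passes, never updated.
        self.W = [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(n_layers)]
        # Low-rank adapter factors: the only trainable parameters in this sketch.
        self.A = [rng.normal(scale=0.01, size=(d, r)) for _ in range(n_layers)]
        self.B = [np.zeros((r, d)) for _ in range(n_layers)]  # zero init: no change at start

    def forward(self, x, use_adapter):
        h = x
        for W, A, B in zip(self.W, self.A, self.B):
            z = h @ W
            if use_adapter:
                z = z + (h @ A) @ B  # low-rank update on top of the frozen weight
            h = np.tanh(z)
        return h  # last hidden state, i.e. the "memory"

    def memories(self, x):
        """Return (original memory, enhanced memory) for the same input."""
        return self.forward(x, use_adapter=False), self.forward(x, use_adapter=True)

model = TinyLoRAModel(d=8, r=2, n_layers=3)
x = np.random.default_rng(1).normal(size=(5, 8))
h_orig, h_tuned = model.memories(x)
```

Because the pre-trained weights are shared, this arrangement needs no second copy of the model in memory. The extra cost is a second forward pass, plus whatever interweaving mechanism is applied to the two memories.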