ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA

Jiaang Li; Quan Wang; Zhongnan Wang; Yongdong Zhang; Zhendong Mao

TL;DR

ELDER addresses lifelong model editing by replacing discrete data-to-adapter mappings with a continuous data-adapter association, learned by a router that mixes multiple LoRAs. A top-$k$ routing scheme allocates adapters according to edit semantics. A dedicated loss aligns these allocations with the edited knowledge, and a deferral mechanism preserves the base model's behavior on non-edited inputs. Experiments on GPT-2 XL and LLaMA2-7B with ZsRE and CounterFact show that ELDER achieves stronger editing reliability and better generalization to rephrasings while maintaining downstream task performance, and that it scales to more edits under a fixed parameter budget. The approach offers a robust and efficient alternative to discrete adapter mappings for updating factual knowledge in large language models.

Abstract

Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Most model editing methods are designed solely for one-time use and suffer significant forgetting in lifelong editing scenarios, where edits are applied sequentially over time. Previous approaches manage sequential edits by freezing the original parameters and discretely allocating new parameters for each knowledge update. However, because the mapping between data and parameters is discrete, these methods lack robustness to minor input variations. To overcome this challenge, we propose ELDER, a novel approach that creates a continuous association between data and adapters. ELDER integrates multiple LoRAs through a router network and is trained to establish a smooth data-adapter association, thereby enhancing edit robustness and generalization to semantically equivalent inputs. To ensure that inputs containing the same knowledge are processed by the same LoRAs, we design a novel loss that guides the model to link LoRA allocations with edit knowledge. Furthermore, we propose a deferral mechanism to retain the original LLM capabilities after editing. Extensive experiments on GPT-2 XL and LLaMA2-7B demonstrate that ELDER effectively edits models in the lifelong setting, outperforming eight baselines while exhibiting strong scalability and preserving LLMs' general abilities on downstream tasks. Our code is available at https://github.com/JiaangL/ELDER.

Paper Structure (33 sections, 9 equations, 4 figures, 10 tables, 1 algorithm)

Introduction
Related Works
Model Editing
Mixture-of-Experts (MoE)
Mixture-of-LoRAs
Proposed Methods
Problem Formulation
Mixture-of-LoRA Structure
Guided Loss
Deferral Mechanism
Experiments
Experimental Setup
Baselines
Metrics
Reliability
...and 18 more sections

Figures (4)

Figure 1: An illustration of processing two different edits with the mixture-of-LoRA module in ELDER. A mixture-of-LoRA module is applied to the FC layer at the FFN of the Transformer block. Each edit is routed to top-$k$ LoRAs with the highest scores based on its query vector. This figure takes $k=1$ as an example. The final results are summations of LoRA outputs and outputs of the original FC. Dotted lines denote multiplying LoRA outputs with corresponding weights. Training loss and deferral mechanism are omitted in this figure for simplicity.
Figure 2: Editing scalability of GRACE and ELDER.
Figure 3: ELDER editing performance after varying numbers of edits with different parameter budgets.
Figure 4: Visualization of LoRA allocation codes. Edits #1 to #5 denote five groups of semantically equivalent inputs, i.e., edits and their rephrases.
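
The routing described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the linear router, the softmax over the selected scores, and every name below (`mixture_of_lora_forward`, `router`, `loras`, and so on) are assumptions made for illustration. How the query vector is derived, the guided loss, and the deferral mechanism are all left out.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_lora_forward(x, query, w_fc, router, loras, k=1):
    """Return (output, indices of the chosen LoRAs) for one input.

    x      : input vector to the FC layer
    query  : query vector used for routing (its derivation is not modeled)
    w_fc   : frozen FC weight, shape [d_out][d_in]
    router : router weight, shape [n_loras][d_query] (assumed linear)
    loras  : list of (A, B) pairs; A is [rank][d_in], B is [d_out][rank]
    """
    scores = matvec(router, query)
    # Keep the k LoRAs with the highest router scores.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Assumption: mixing weights are a softmax over the selected scores only.
    weights = softmax([scores[i] for i in top])
    out = matvec(w_fc, x)  # output of the original, frozen FC layer
    for idx, weight in zip(top, weights):
        a, b = loras[idx]
        delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
        out = [o + weight * d for o, d in zip(out, delta)]
    return out, top


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random matrix for the demo below."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_in, d_out, rank, n_loras = 4, 3, 2, 3
w_fc = random_matrix(d_out, d_in, rng)
router = random_matrix(n_loras, d_in, rng)
loras = [
    (random_matrix(rank, d_in, rng), random_matrix(d_out, rank, rng))
    for _ in range(n_loras)
]
x = [0.5, -1.0, 0.25, 2.0]
# For simplicity the demo reuses the FC input as the query vector.
y, chosen = mixture_of_lora_forward(x, x, w_fc, router, loras, k=1)
```

With `k=1`, as in Figure 1, the single selected LoRA receives weight 1 under this softmax assumption, so the result is the frozen FC output plus one low-rank update. The returned index list plays the role of a simple allocation record: in this sketch, two inputs that receive the same indices are handled by the same LoRAs.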
