MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs

Yupu Gu, Rongzhe Wei, Andy Zhu, Pan Li

TL;DR

MoEEdit tackles knowledge editing in sparse Mixture-of-Experts (MoE) LLMs. It prevents routing drift through per-expert null-space projections and solves the resulting block-structured optimization with a scalable randomized block coordinate descent (BCD) method. The approach delivers state-of-the-art editing efficacy, generalization, and routing stability on COUNTERFACT and ZsRE for large MoE models, with compute and memory costs that scale linearly with the per-expert hidden size $d_k$. By decoupling expert updates and constraining router perturbations, MoEEdit enables localized, robust edits in sparse architectures. This work provides a practical, scalable foundation for trustworthy knowledge editing in MoE-based LLMs and highlights routing stability as a key factor in editing sparse models.

Abstract

Knowledge editing (KE) enables precise modifications to factual content in large language models (LLMs). Existing KE methods are largely designed for dense architectures, limiting their applicability to the increasingly prevalent sparse Mixture-of-Experts (MoE) models that underpin modern scalable LLMs. Although MoEs offer strong efficiency and capacity scaling, naively adapting dense-model editors is both computationally costly and prone to routing distribution shifts that undermine stability and consistency. To address these challenges, we introduce MoEEdit, the first routing-stable framework for parameter-modifying knowledge editing in MoE LLMs. Our method reparameterizes expert updates via per-expert null-space projections that keep router inputs invariant and thereby suppress routing shifts. The resulting block-structured optimization is solved efficiently with a block coordinate descent (BCD) solver. Experiments show that MoEEdit attains state-of-the-art efficacy and generalization while preserving high specificity and routing stability, with superior compute and memory efficiency. These results establish a robust foundation for scalable, precise knowledge editing in sparse LLMs and underscore the importance of routing-stable interventions.
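
The null-space idea mentioned in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it assumes a generic null-space-constrained update of the form $\Delta W P$, where $P$ projects onto the orthogonal complement of a set of preserved key vectors, and it builds one such projector per expert. The names `expert_keys`, `null_space_projector`, and `projected_updates`, as well as the toy dimensions and random data, are hypothetical and chosen only for illustration.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def null_space_projector(preserved_keys, dim):
    """Return P = I - Q Q^T, so that P k = 0 for every preserved key k."""
    P = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for q in orthonormal_basis(preserved_keys):
        for i in range(dim):
            for j in range(dim):
                P[i][j] -= q[i] * q[j]
    return P


def matmul(A, B):
    return [[dot(row, col) for col in zip(*B)] for row in A]


def matvec(A, x):
    return [dot(row, x) for row in A]


random.seed(0)
dim, out_dim = 6, 4

# Hypothetical preserved keys, stored separately for each expert.
expert_keys = {
    0: [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2)],
    1: [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)],
}

# Project an arbitrary raw update for each expert with that expert's own P.
projected_updates = {}
for expert_id, keys in expert_keys.items():
    raw_update = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(out_dim)]
    P = null_space_projector(keys, dim)
    projected_updates[expert_id] = matmul(raw_update, P)

# Largest change any projected update causes on its expert's preserved keys.
max_leak = max(
    abs(x)
    for expert_id, keys in expert_keys.items()
    for k in keys
    for x in matvec(projected_updates[expert_id], k)
)

# A fresh key (standing in for an edited fact) still sees a non-zero update.
new_key = [random.gauss(0, 1) for _ in range(dim)]
edit_response = matvec(projected_updates[0], new_key)
```

In this toy setting, `max_leak` is zero up to floating-point error, so each expert's output on its preserved keys is unchanged, while `edit_response` remains non-zero. Under the assumption above, this is the mechanism by which inputs reaching later routers stay fixed for preserved knowledge; the paper's actual projection and objective may differ in detail.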

Paper Structure (23 sections, 45 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Overview of knowledge editing in Mixture-of-Experts (MoE) LLMs. (a) Challenges: MoE editing is hindered by routing distribution shift, high computational cost, and expert coupling. (b) Method: MoEEdit mitigates these issues using per-expert null-space projection to stabilize routing and a randomized block coordinate descent (BCD) solver for efficient expert updates.
  • Figure 2: Routing similarity (RS) before and after editing on the editing and preservation sets. MoEEdit achieves consistently high RS, demonstrating strong routing stability.
  • Figure 3: Comparison of solvers. (a) BCD achieves fast convergence with suitable $\lambda$. (b) BCD scales efficiently with the number of experts, while the closed-form solver quickly becomes infeasible.
  • Figure 4: Ablation on the number of passes.
  • Figure 5: Examples from the ZsRE dataset.
  • ...and 1 more figure