Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models

Kunhao Li, Wenhao Li, Di Wu, Lei Yang, Jun Bai, Ju Jia, Jason Xue

TL;DR

This work tackles machine unlearning in multimodal large language models by introducing MIP-Editor, a path-aware, modality-specific editing framework. It locates influential neuron paths in both textual and visual branches via inter-layer gradient and Fisher integrations, prunes these paths, and applies Representation Misdirection Unlearning to steer forget-set representations away while preserving retain-set performance. Extensive experiments on representative MLLMs and benchmarks show state-of-the-art forgetting (up to 87.75%) with substantial retention gains (up to 54.26%) in multimodal tasks and strong textual forgetting with high retention, outperforming strong baselines. The results demonstrate that coordinating cross-modal forgetting through influential paths yields superior unlearning while maintaining generalization, offering a scalable approach to safe and compliant MLLMs.

Abstract

Multimodal Large Language Models (MLLMs) extend foundation models to real-world applications by integrating inputs such as text and vision. However, their broad knowledge capacity raises growing concerns about privacy leakage, toxic content, and intellectual property violations. Machine Unlearning (MU) offers a practical solution by selectively forgetting targeted knowledge while preserving overall model utility. When applied to MLLMs, existing neuron-editing-based MU approaches face two fundamental challenges: (1) forgetting becomes inconsistent across modalities because existing point-wise attribution methods fail to capture the structured, layer-by-layer information flow that connects different modalities; and (2) general knowledge performance declines when sensitive neurons that also support important reasoning paths are pruned, as this disrupts the model's ability to generalize. To alleviate these limitations, we propose a multimodal influential neuron path editor (MIP-Editor) for MU. Our approach introduces modality-specific attribution scores to identify influential neuron paths responsible for encoding forget-set knowledge and applies influential-path-aware neuron editing via representation misdirection. This strategy enables effective and coordinated forgetting across modalities while preserving the model's general capabilities. Experimental results demonstrate that MIP-Editor achieves superior unlearning performance on multimodal tasks, with a maximum forgetting rate of 87.75% and up to 54.26% improvement in general knowledge retention. On textual tasks, MIP-Editor achieves up to 80.65% forgetting and preserves 77.9% of general performance. Code is available at https://github.com/PreckLi/MIP-Editor.

Paper Structure

This paper contains 44 sections, 28 equations, 10 figures, 4 tables, 2 algorithms.

Figures (10)

  • Figure 1: Comparison between existing MU methods and MIP-Editor. Prior methods suffer from: (1) insufficient forgetting in the text modality, as point-wise attribution fails to capture structured cross-layer information flow; and (2) disruption of influential reasoning paths due to pruning.
  • Figure 2: Overview of MIP-Editor. (1) Influential neuron paths are located using inter-layer gradient (text) and Fisher (vision) integration. (2) Neurons inside the selected paths are pruned, and (3) path-specific editing is performed via representation misdirection to achieve modality-consistent forgetting while preserving general knowledge.
  • Figure 3: The overall trade-off between unlearning effectiveness and model utility across four dimensions under varying forget ratios, using Qwen2.5-VL as the base model.
  • Figure 4: Performance comparison on generation tasks between influential neuron paths and point-wise influential neurons under varying top-$k$ neuron selections.
  • Figure 5: Relative MAE of predicted logit probabilities for ground-truth labels after pruning neurons selected by different methods.
  • ...and 5 more figures
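
The pipeline in the Figure 2 caption (score neurons, select the influential ones, then edit via representation misdirection) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a diagonal empirical Fisher proxy (mean squared gradient per neuron) as the attribution score and the generic representation-misdirection objective, which pulls forget-set activations toward a fixed control vector while keeping retain-set activations close to those of the frozen model. The function names, the `alpha` weight, and the plain top-k selection are all hypothetical; the paper's inter-layer integration and its definition of an influential path are not reproduced here.

```python
def fisher_scores(per_sample_grads):
    """Diagonal empirical Fisher proxy: mean squared gradient per neuron.

    per_sample_grads: one gradient vector (list of floats) per sample.
    """
    n = len(per_sample_grads)
    width = len(per_sample_grads[0])
    return [sum(g[j] ** 2 for g in per_sample_grads) / n for j in range(width)]


def top_k(scores, k):
    """Indices of the k highest-scoring neurons, highest first."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rmu_loss(h_forget, h_retain, h_retain_frozen, control, alpha=1.0):
    """Generic representation-misdirection objective (illustrative only).

    forget term: push forget-set activations toward the control vector.
    retain term: keep retain-set activations near the frozen model's.
    """
    forget_term = sum(sq_dist(h, control) for h in h_forget) / len(h_forget)
    retain_term = sum(
        sq_dist(h, f) for h, f in zip(h_retain, h_retain_frozen)
    ) / len(h_retain)
    return forget_term + alpha * retain_term


# Toy usage: two samples, three neurons in one layer.
grads = [[1.0, 0.0, 2.0], [1.0, 0.0, 0.0]]
scores = fisher_scores(grads)
selected = top_k(scores, k=2)
loss = rmu_loss(
    h_forget=[[0.0, 0.0]],
    h_retain=[[1.0, 2.0]],
    h_retain_frozen=[[1.0, 1.0]],
    control=[3.0, 4.0],
    alpha=2.0,
)
```

In a real setting the gradients and activations would come from the model's text and vision branches, and only the parameters on the selected paths would be updated against this loss.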

Theorems & Definitions (1)

  • Definition 1: Influential Paths