Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models

Zheyuan Liu, Guangyao Dou, Xiangchi Yuan, Chunhui Zhang, Zhaoxuan Tan, Meng Jiang

TL;DR

The paper tackles privacy risks from memorization in multimodal large language models by proposing Modality Aware Neuron Unlearning (MANU), a two-stage framework that first identifies modality-specific neurons most linked to forget data and then prunes them. MANU employs four importance functions—absolute, frequency, variance, and RMS—to capture diverse activation patterns across text and vision modalities, and uses a forget-vs-retain scoring ratio to select pruning targets. Across LLaVA and Idefics2, MANU achieves strong, balanced unlearning across modalities while preserving utility on retained data and general benchmarks, outperforming several baselines. The work also provides ablations, discusses limitations, and outlines directions to extend modality-aware unlearning to broader applications and larger models.
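The summary names the four importance functions and the forget-vs-retain ratio but does not define them. The sketch below is one plausible reading, not the authors' implementation: "absolute" is taken as mean absolute activation, "frequency" as the fraction of positive activations, "variance" as population variance, and "RMS" as root mean square. The names `importance`, `forget_retain_score`, and `EPS`, and the choice to average the four ratios, are assumptions made for illustration.

```python
import math
from statistics import fmean, pvariance

EPS = 1e-8  # hypothetical smoothing constant; avoids division by zero


def importance(acts):
    """Four summary statistics for one neuron's activations.

    `acts` is a list of floats: the neuron's activation on each token or
    sample of a dataset. The exact definitions here are assumptions.
    """
    acts = list(acts)
    return {
        "absolute": fmean([abs(a) for a in acts]),
        "frequency": sum(1 for a in acts if a > 0) / len(acts),
        "variance": pvariance(acts),
        "rms": math.sqrt(fmean([a * a for a in acts])),
    }


def forget_retain_score(forget_acts, retain_acts):
    """Average the forget/retain ratio over the four importance measures.

    A high score means the neuron is far more active on forget data than
    on retain data. Averaging the ratios is an assumed aggregation rule.
    """
    f = importance(forget_acts)
    r = importance(retain_acts)
    return fmean([f[k] / (r[k] + EPS) for k in f])


# Toy activations: neuron A fires mostly on forget data, neuron B on both.
score_a = forget_retain_score([2.0, 3.0, 2.5, 3.5], [0.1, 0.0, 0.2, 0.1])
score_b = forget_retain_score([1.0, 1.2, 0.9, 1.1], [1.0, 1.1, 1.0, 0.9])
print(score_a > score_b)
```

In this reading, the scores would be computed separately for text-only and multimodal inputs, so that each modality gets its own ranking of neurons.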

Abstract

Generative models such as Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) are trained on massive datasets, which can lead them to memorize and inadvertently reveal sensitive information, raising ethical and privacy concerns. While prior work has explored this issue for LLMs, it poses a distinct challenge for MLLMs: knowledge is entangled across modalities, which makes comprehensive unlearning more difficult. To address this challenge, we propose Modality Aware Neuron Unlearning (MANU), a novel unlearning framework for MLLMs designed to selectively clip neurons based on their relative importance to the targeted forget data, curated for different modalities. Specifically, MANU consists of two stages: important neuron selection and selective pruning. The first stage identifies and collects the neurons that are most influential, across modalities, for the targeted forget knowledge, while the second stage prunes those selected neurons. MANU effectively isolates and removes the neurons that contribute most to the forget data within each modality, while preserving the integrity of retained knowledge. Our experiments across various MLLM architectures show that MANU achieves more balanced and comprehensive unlearning in each modality without substantially degrading overall model utility.
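The two stages described in the abstract can be sketched as a ranking step followed by a masking step. This is a minimal illustration under stated assumptions, not the paper's code: per-neuron scores are taken as given, "pruning" is interpreted as zeroing a neuron's weight row and bias, and the function names `select_top_neurons` and `prune_neurons` are hypothetical. The percentage parameter mirrors the "top $\alpha \%$" wording of the Figure 3 caption.

```python
import math


def select_top_neurons(scores, alpha_percent):
    """Stage 1 (sketch): indices of the top alpha_percent neurons by score."""
    k = math.ceil(len(scores) * alpha_percent / 100.0)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def prune_neurons(weight_rows, bias, neuron_ids):
    """Stage 2 (sketch): zero the weight row and bias of selected neurons.

    Returns new lists and leaves the inputs unchanged. Treating pruning as
    zeroing out is an assumption about what "clip" means in the abstract.
    """
    selected = set(neuron_ids)
    new_rows = [
        [0.0] * len(row) if i in selected else list(row)
        for i, row in enumerate(weight_rows)
    ]
    new_bias = [0.0 if i in selected else b for i, b in enumerate(bias)]
    return new_rows, new_bias


# Toy layer with four neurons, two inputs each, and made-up scores.
scores = [0.9, 28.5, 1.4, 3.2]
weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
bias = [0.01, 0.02, 0.03, 0.04]

targets = select_top_neurons(scores, alpha_percent=25)
pruned_weights, pruned_bias = prune_neurons(weights, bias, targets)
print(targets, pruned_weights[1], pruned_bias[1])
```

Because the abstract stresses unlearning within each modality, a fuller version would presumably run the selection once per modality and prune the resulting sets of neurons; how those sets are combined is not specified in this section.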

Paper Structure

This paper contains 53 sections, 14 equations, 12 figures, 8 tables.

Figures (12)

  • Figure 1: Comparison of MANU with the previous approach in responding to questions related to unlearned targets, using multimodal inputs (i.e., images with associated text) and pure text inputs, respectively.
  • Figure 2: Visualization of knowledge retention across MLLM language module layers for different unlearning methods on the forget/retain sets of MLLMU-Bench. The text-only forget and retain panels show text-only residuals, while the multimodal forget and retain panels depict multimodal residuals. The $x$-axis represents unlearning methods (Grad. Diff. as GD), the $y$-axis shows layer indices, and darker red indicates higher knowledge retention.
  • Figure 3: The overall framework of MANU. The forget and retain sets are first split into text-only and multimodal modalities. Neuron activations are then computed across modalities and datasets, followed by applying an importance and scoring function to evaluate activated neurons. Finally, the top $\alpha \%$ of neurons are pruned based on their scores.
  • Figure 4: Classification, generation, and cloze performance of MANU and baselines in multimodal and unimodal setups with 5% forget data, using LLaVA as the base model. In subplots (a), (b), (e), (f), (i), and (j), the $y$-axis represents the change in classification accuracy, ROUGE-L score, and cloze accuracy relative to the vanilla model, evaluated on the Forget and Test sets. In the remaining subplots, the $y$-axis indicates classification accuracy, ROUGE-L score, and cloze accuracy, respectively. The $x$-axis represents performance across different modalities.
  • Figure 5: The overall trade-off between unlearning effectiveness and model utility across all baselines using different forget data, with LLaVA as the base model. The $x$-axis shows the difference in forget classification accuracy relative to the vanilla model, while the $y$-axis reflects model utility from various perspectives. From left to right, these perspectives include retain accuracy, real celebrity accuracy, MMMU, and LLaVA-Bench performance, respectively.
  • ...and 7 more figures