Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models
Zheyuan Liu, Guangyao Dou, Xiangchi Yuan, Chunhui Zhang, Zhaoxuan Tan, Meng Jiang
TL;DR
The paper tackles privacy risks from memorization in multimodal large language models by proposing Modality Aware Neuron Unlearning (MANU), a two-stage framework that first identifies the modality-specific neurons most strongly linked to the forget data and then prunes them. MANU employs four importance functions (absolute, frequency, variance, and root mean square (RMS)) to capture diverse activation patterns across the text and vision modalities, and uses a forget-vs-retain scoring ratio to select pruning targets. On LLaVA and Idefics2, MANU achieves strong, balanced unlearning across modalities while preserving utility on retained data and general benchmarks, outperforming several baselines. The work also provides ablations, discusses limitations, and outlines directions for extending modality-aware unlearning to broader applications and larger models.
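As a rough illustration of the scoring idea, the sketch below computes four statistics for a single neuron's activations and forms a forget-vs-retain ratio for each. The exact formulas are assumptions, since this summary does not define them: "absolute" is taken as the mean absolute activation, "frequency" as the fraction of positive activations, "variance" as the population variance, and "rms" as the root mean square. The names `importance`, `forget_retain_ratio`, and `eps` are hypothetical and are not taken from the paper.

```python
import math


def importance(acts):
    """Return four per-neuron statistics for a list of activation values.

    All four definitions are illustrative assumptions, not the paper's formulas.
    """
    n = len(acts)
    mean = sum(acts) / n
    return {
        "absolute": sum(abs(a) for a in acts) / n,          # mean |activation|
        "frequency": sum(1 for a in acts if a > 0) / n,     # share of positive activations
        "variance": sum((a - mean) ** 2 for a in acts) / n,  # population variance
        "rms": math.sqrt(sum(a * a for a in acts) / n),      # root mean square
    }


def forget_retain_ratio(forget_acts, retain_acts, eps=1e-8):
    """Ratio of each statistic on forget data to the same statistic on retain data.

    eps (hypothetical) guards against division by zero.
    """
    f = importance(forget_acts)
    r = importance(retain_acts)
    return {name: f[name] / (r[name] + eps) for name in f}


# Toy activations for one neuron on forget and retain inputs.
ratios = forget_retain_ratio([2.0, -2.0, 2.0, 2.0], [1.0, -1.0, 1.0, 1.0])
```

A ratio well above 1 marks a neuron that responds more strongly to forget data than to retain data, which is the property the selection stage looks for. How the four ratios are combined into a single score is not specified here, so the sketch leaves them separate.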
Abstract
Generative models such as Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs), when trained on massive datasets, can memorize and inadvertently reveal sensitive information, raising ethical and privacy concerns. While some prior works have explored this issue in the context of LLMs, it presents a unique challenge for MLLMs: knowledge is entangled across modalities, which makes comprehensive unlearning more difficult. To address this challenge, we propose Modality Aware Neuron Unlearning (MANU), a novel unlearning framework for MLLMs designed to selectively clip neurons based on their relative importance to the targeted forget data, curated for different modalities. Specifically, MANU consists of two stages: important neuron selection and selective pruning. The first stage identifies and collects the most influential neurons across modalities relative to the targeted forget knowledge, while the second stage prunes those selected neurons. MANU effectively isolates and removes the neurons that contribute most to the forget data within each modality, while preserving the integrity of retained knowledge. Our experiments across various MLLM architectures show that MANU achieves more balanced and comprehensive unlearning in each modality without substantially degrading overall model utility.
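The two stages can be sketched in a few lines, under several assumptions that the abstract does not confirm: each neuron already has one importance score per modality, selection keeps a fixed top fraction per modality, the per-modality selections are merged by union, and "pruning" means zeroing the neuron's weight row. The names `select_neurons`, `prune`, and `fraction` are hypothetical.

```python
def select_neurons(scores, fraction):
    """Stage 1 (sketch): indices of the highest-scoring neurons.

    Keeps the top `fraction` of neurons, and always at least one.
    """
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def prune(weights, neuron_ids):
    """Stage 2 (sketch): zero the weight row of every selected neuron.

    Returns a new matrix and leaves the input untouched.
    """
    selected = set(neuron_ids)
    return [
        [0.0] * len(row) if i in selected else list(row)
        for i, row in enumerate(weights)
    ]


# Toy per-modality importance scores for a layer with four neurons.
scores = {
    "text": [0.9, 3.1, 1.0, 0.8],
    "vision": [2.7, 1.1, 0.9, 1.0],
}

# Select separately per modality, then merge (the union is an assumption).
to_prune = sorted(
    set().union(*(select_neurons(s, fraction=0.25) for s in scores.values()))
)

weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pruned = prune(weights, to_prune)
```

Selecting per modality before merging is what keeps a neuron that matters only for vision inputs from being missed because its text score is low, which is one plausible reading of "balanced" unlearning across modalities.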
