Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench
Zheyuan Liu, Guangyao Dou, Mengzhao Jia, Zhaoxuan Tan, Qingkai Zeng, Yongle Yuan, Meng Jiang
TL;DR
This work addresses privacy risks in multimodal large language models (MLLMs) by introducing MLLMU-Bench, a benchmark dedicated to multimodal unlearning. The dataset combines 500 fictitious profiles and 153 real celebrity profiles, organized into four sets (Forget, Test, Retain, Real Celebrity), with 20k+ questions in image+text and text-only form. This enables evaluation of unlearning efficacy, generalizability, and model utility under 5%, 10%, and 15% forget scenarios. Baseline experiments on two base MLLMs reveal modality-dependent patterns: unimodal unlearning tends to excel in generation and cloze tasks, while multimodal unlearning performs better on classification with multimodal inputs. There is also a notable trade-off between forgetting effectiveness and overall model utility. The findings underscore the need for more sophisticated multimodal unlearning strategies and provide a framework for future work on privacy-preserving mechanisms and potential certified unlearning guarantees in MLLMs.
Abstract
Generative models such as Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) trained on massive web corpora can memorize and disclose individuals' confidential and private data, raising legal and ethical concerns. While many previous works have addressed this issue in LLMs via machine unlearning, it remains largely unexplored for MLLMs. To tackle this challenge, we introduce the Multimodal Large Language Model Unlearning Benchmark (MLLMU-Bench), a novel benchmark aimed at advancing the understanding of multimodal machine unlearning. MLLMU-Bench consists of 500 fictitious profiles and 153 profiles of public celebrities, each profile featuring over 14 customized question-answer pairs, evaluated from both multimodal (image+text) and unimodal (text) perspectives. The benchmark is divided into four sets to assess unlearning algorithms in terms of efficacy, generalizability, and model utility. Finally, we provide baseline results using existing generative model unlearning algorithms. Surprisingly, our experiments show that unimodal unlearning algorithms excel in generation and cloze tasks, while multimodal unlearning approaches perform better in classification tasks with multimodal inputs.
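To make the 5%, 10%, and 15% forget scenarios from the TL;DR concrete, the following minimal sketch partitions 500 profile IDs into forget and retain groups. It rests on assumptions not stated above: that the forget percentage applies to the 500 fictitious profiles, that the retain group is simply the remainder, and that profiles are chosen at random. The function name `split_profiles`, the integer IDs, and the seed are hypothetical and are not part of the benchmark's released code.

```python
import random


def split_profiles(profile_ids, forget_percent, seed=0):
    """Partition profile IDs into (forget, retain) lists.

    Assumption: the forget group is a random `forget_percent` share of the
    profiles and the retain group is everything else.
    """
    if not 0 < forget_percent < 100:
        raise ValueError("forget_percent must be between 0 and 100 (exclusive)")
    ids = list(profile_ids)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(ids)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_forget = len(ids) * forget_percent // 100
    return sorted(ids[:n_forget]), sorted(ids[n_forget:])


# Hypothetical integer IDs standing in for the 500 fictitious profiles.
fictitious_ids = list(range(500))

splits = {pct: split_profiles(fictitious_ids, pct) for pct in (5, 10, 15)}
for pct, (forget, retain) in splits.items():
    print(f"{pct}% forget: {len(forget)} forget profiles, {len(retain)} retain profiles")
```

Under these assumptions, the three scenarios correspond to 25, 50, and 75 forget profiles, leaving 475, 450, and 425 profiles to retain. The Test and Real Celebrity sets are not modeled here, since the text above does not specify how they are constructed.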
