Lost in Modality: Evaluating the Effectiveness of Text-Based Membership Inference Attacks on Large Multimodal Models
Ziyi Tong, Feifei Sun, Le Minh Nguyen
TL;DR
This work investigates whether text-based membership inference attacks (MIAs) extend to large multimodal language models (MLLMs). Using four MLLMs and both in-distribution (ID) and out-of-distribution (OOD) data with text-only and vision+text prompts, the study evaluates six logit-based MIA methods. It reports that visual inputs tend to mask membership signals in OOD settings while producing only small changes in ID settings. The results point to a model- and distribution-dependent landscape in which domain shifts and visual processing can overturn expected attack signals. The findings call for multimodal-aware privacy evaluation and for attack methods tailored to vision-language interactions.
Abstract
Large Multimodal Language Models (MLLMs) are emerging as foundational tools in an expanding range of applications. Consequently, understanding training-data leakage in these systems is increasingly critical. Log-probability-based membership inference attacks (MIAs) have become a widely adopted approach for assessing data exposure in large language models (LLMs), yet their effectiveness on MLLMs remains unclear. We present the first comprehensive evaluation of extending these text-based MIA methods to multimodal settings. Our experiments under vision-and-text (V+T) and text-only (T-only) conditions across the DeepSeek-VL and InternVL model families show that, in in-distribution settings, logit-based MIAs perform comparably across the two conditions, with a slight V+T advantage. Conversely, in out-of-distribution settings, visual inputs act as regularizers, effectively masking membership signals.
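To make the notion of a log-probability-based MIA concrete, the following is a minimal sketch of two scores that are common in the LLM literature: the mean token log-probability and a Min-K%-style score that averages only the least likely tokens. This is an illustration under stated assumptions, not the evaluation pipeline of this work: the abstract does not name the six methods, so these two scores are not claimed to be among them, and the function names, the default k = 0.2, and the toy per-token log-probabilities are all hypothetical. In a V+T setting the same scores would be computed on the text tokens after conditioning the model on the image.

```python
from typing import Sequence


def mean_log_prob(token_log_probs: Sequence[float]) -> float:
    """Average per-token log-probability; a higher value suggests 'member'."""
    return sum(token_log_probs) / len(token_log_probs)


def min_k_log_prob(token_log_probs: Sequence[float], k: float = 0.2) -> float:
    """Average of the lowest k fraction of token log-probabilities.

    The fraction k is an assumed hyperparameter; at least one token is used.
    """
    n = max(1, int(len(token_log_probs) * k))
    lowest = sorted(token_log_probs)[:n]
    return sum(lowest) / n


def auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member.

    Ties count as half a win, which matches the usual ROC AUC definition.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical per-token log-probabilities, standing in for model outputs.
toy_members = [[-0.1, -0.3, -0.2, -0.4, -0.2], [-0.2, -0.1, -0.5, -0.3, -0.1]]
toy_nonmembers = [[-1.2, -0.8, -2.0, -0.9, -1.5], [-0.7, -1.1, -0.6, -1.8, -1.0]]

toy_auc = auc(
    [mean_log_prob(x) for x in toy_members],
    [mean_log_prob(x) for x in toy_nonmembers],
)
```

An AUC near 0.5 means the score does not separate members from non-members, which is the sense in which a membership signal can be described as masked.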
