FiMMIA: scaling semantic perturbation-based membership inference across modalities

Anton Emelyanov; Sergei Kudriashov; Alena Fenogenova

TL;DR

FiMMIA presents a modular, perturbation-based framework for multimodal membership inference attacks on LLMs, addressing data contamination and distribution shifts that undermine benchmark reliability. It extends semantic MIAs to image, video, audio, and text, training a detector on differences in losses and embeddings between original and perturbed inputs across multiple modalities. The approach demonstrates strong leakage detection across MERA and various MLLMs, with high AUC-ROC and transferable performance within model families, and shows promising cross-lingual applicability. The framework is language-agnostic, extensible to new datasets and modalities, and released as open-source, while acknowledging limitations around fine-tuning scope and reproducibility concerns in diverse environments.

Abstract

Membership Inference Attacks (MIAs) aim to determine whether a specific data point was included in the training set of a target model. Although numerous methods have been developed for detecting data contamination in large language models (LLMs), their performance on multimodal LLMs (MLLMs) falls short due to the instabilities introduced through multimodal component adaptation and possible distribution shifts across multiple inputs. In this work, we investigate multimodal membership inference and address two issues: first, we identify distribution shifts in the existing datasets; second, we release an extended baseline pipeline to detect them. We also generalize perturbation-based membership inference methods to MLLMs and release FiMMIA, a modular Framework for Multimodal MIA. (The source code and framework are publicly available under the MIT license at https://github.com/ai-forever/data_leakage_detect, and a video demonstration is available on YouTube at https://youtu.be/a9L4-H80aSg.) Our approach trains a neural network to analyze the target model's behavior on perturbed inputs, capturing distributional differences between members and non-members. Comprehensive evaluations on various fine-tuned multimodal models demonstrate the effectiveness of our perturbation-based membership inference attacks in multimodal domains.
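
The idea of training a detector on how the target model reacts to perturbed inputs can be illustrated with a minimal sketch. This is not the FiMMIA implementation: the abstract describes a neural network detector, while the sketch below swaps in a plain logistic regression so that it runs with NumPy alone. All function names are hypothetical, and the losses and embeddings are synthetic stand-ins. The assumption that members show a larger loss increase under perturbation than non-members is made only to generate toy data.

```python
import numpy as np


def perturbation_features(loss_orig, loss_pert, emb_orig, emb_pert):
    """Build one feature vector per (original, perturbed) input pair.

    loss_*: arrays of shape (n,), target-model losses (hypothetical inputs).
    emb_*:  arrays of shape (n, d), input embeddings (hypothetical inputs).
    Returns an array of shape (n, 1 + d): loss difference, then embedding difference.
    """
    loss_diff = (loss_pert - loss_orig)[:, None]
    emb_diff = emb_pert - emb_orig
    return np.concatenate([loss_diff, emb_diff], axis=1)


def train_detector(X, y, lr=0.5, steps=500):
    """Fit a logistic-regression detector by gradient descent.

    Stand-in for the neural network detector; chosen only for brevity.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted membership probability
        grad = p - y                            # gradient of log-loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def membership_score(X, w, b):
    """Return the detector's membership probability for each feature row."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def synthetic_split(rng, n, d):
    """Toy data: members (label 1) get a larger loss increase (assumption)."""
    y = np.repeat([1.0, 0.0], n // 2)
    loss_orig = rng.normal(2.0, 0.3, size=n)
    loss_pert = loss_orig + rng.normal(y * 1.0, 0.3)
    emb_orig = rng.normal(size=(n, d))
    emb_pert = emb_orig + rng.normal(scale=0.1, size=(n, d))
    return perturbation_features(loss_orig, loss_pert, emb_orig, emb_pert), y


def demo(seed=0, n=400, d=4):
    """Train on one synthetic split and return accuracy on another."""
    rng = np.random.default_rng(seed)
    X_train, y_train = synthetic_split(rng, n, d)
    X_test, y_test = synthetic_split(rng, n, d)
    w, b = train_detector(X_train, y_train)
    pred = membership_score(X_test, w, b) > 0.5
    return float((pred == (y_test == 1.0)).mean())


if __name__ == "__main__":
    print(f"held-out accuracy on synthetic data: {demo():.2f}")
```

In a real setting the two loss arrays would come from querying the target model on each original input and its perturbed neighbor, and the embeddings from a separate encoder; both choices are left open here.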
