Are Neuro-Inspired Multi-Modal Vision-Language Models Resilient to Membership Inference Privacy Leakage?

David Amebley, Sayanton Dibbo

TL;DR

This work addresses privacy leakage in multi-modal vision-language models under membership inference attacks in a strict black-box setting. It introduces a neuroscience-inspired topographic regularization, formalized as the objective $\mathcal{J}_{\tau} = \mathcal{J}_{\text{cap}} + \tau\,\mathcal{R}_{\text{topo}}$, and applies it to three VLMs (BLIP, PaliGemma 2, ViT-GPT2) across COCO, CC3M, and NoCaps. Through comprehensive experiments, the authors show that increasing $\tau$ reduces MIA success (ROC-AUC) while preserving captioning utility (MPNet/ROUGE-2) across models and datasets, though the magnitude of privacy gains is dataset-dependent. Ablation studies reveal that higher granularity in the attack amplifies leakage in baseline models but is mitigated by neuro-inspired regularization, highlighting a robust privacy-utility trade-off enabled by $\tau$-regularization. These findings suggest a practical pathway to privacy-preserving neuro-inspired VLMs for agentic AI applications, with avenues for further exploration of white-box MIAs and deployment-ready defenses.
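The objective above can be read as a weighted sum of two scalar terms. Below is a minimal Python sketch of that combination only. The function name `regularized_objective` and the numeric loss values are hypothetical placeholders, not taken from the paper, and the way $\mathcal{R}_{\text{topo}}$ itself is computed is not specified here, so it is passed in as a plain number. The variant labels (Baseline at $\tau=0$, Neuro at $\tau=2$, Neuro++ at $\tau=3$) follow the figure captions.

```python
def regularized_objective(caption_loss: float, topo_penalty: float, tau: float) -> float:
    """Combine the two terms as J_tau = J_cap + tau * R_topo.

    caption_loss and topo_penalty are assumed to be precomputed scalars;
    how the topographic penalty is obtained is outside this sketch.
    """
    if tau < 0:
        raise ValueError("tau is assumed to be non-negative")
    return caption_loss + tau * topo_penalty


# Hypothetical placeholder values, chosen only to show the effect of tau.
caption_loss = 2.0
topo_penalty = 0.5

for name, tau in [("Baseline", 0.0), ("Neuro", 2.0), ("Neuro++", 3.0)]:
    total = regularized_objective(caption_loss, topo_penalty, tau)
    print(f"{name:8s} tau={tau:.0f}  objective={total:.2f}")
```

With $\tau=0$ the objective reduces to the captioning loss alone, which is why that setting serves as the baseline.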

Abstract

In the age of agentic AI, the growing deployment of multi-modal models (MMs) has introduced new attack vectors that can leak sensitive training data. This paper investigates a black-box privacy attack, the membership inference attack (MIA), on multi-modal vision-language models (VLMs). State-of-the-art research analyzes privacy attacks primarily on unimodal AI-ML systems, although recent studies indicate that MMs can also be vulnerable to privacy attacks. While researchers have demonstrated that biologically inspired neural network representations can improve unimodal model resilience against adversarial attacks, it remains unexplored whether neuro-inspired MMs are resilient against privacy attacks. In this work, we introduce a systematic neuroscience-inspired topological regularization (tau) framework to analyze the resilience of multi-modal VLMs against image-text-based inference privacy attacks. We examine this phenomenon using three VLMs (BLIP, PaliGemma 2, and ViT-GPT2) across three benchmark datasets (COCO, CC3M, and NoCaps). Our experiments compare the resilience of baseline and neuro VLMs (with topological regularization), where the tau > 0 configuration defines the NEURO variant of a VLM. Our results on the BLIP model using the COCO dataset show that MIA success against NEURO VLMs drops by 24% in mean ROC-AUC, while model utility (the similarity between generated and reference captions), measured with MPNet and ROUGE-2, remains similar. This shows that neuro VLMs are comparatively more resilient against privacy attacks without significantly compromising model utility. Our extensive evaluation with the PaliGemma 2 and ViT-GPT2 models on two additional datasets, CC3M and NoCaps, further validates the consistency of these findings. This work contributes to the growing understanding of privacy risks in MMs and provides evidence on the resilience of neuro VLMs to privacy threats.
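To make the black-box attack setting concrete, the following Python sketch scores each sample by the lexical overlap between its reference caption and the caption the model generates, then summarizes attack success as ROC-AUC. It is an illustration under stated assumptions, not the authors' pipeline: `rouge2_f1` is a simplified bigram F1 (whitespace tokens, no stemming) standing in for ROUGE-2, the MPNet similarity is omitted, the captions are made up, and members are assumed to receive higher similarity scores than non-members.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Multiset of adjacent lowercase token pairs."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-2 F1: clipped bigram overlap, no stemming."""
    ref, cand = bigrams(reference), bigrams(candidate)
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def roc_auc(member_scores, nonmember_scores) -> float:
    """Probability that a random member outscores a random non-member
    (ties count as one half). Equivalent to the area under the ROC curve."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical (reference caption, generated caption) pairs.
members = [
    ("a dog runs on the grass", "a dog runs on the grass"),
    ("two people ride bikes", "two people ride bikes outside"),
]
nonmembers = [
    ("a cat sleeps on a sofa", "a small animal on a couch"),
    ("a red bus on a street", "a vehicle parked near a building"),
]

member_scores = [rouge2_f1(ref, gen) for ref, gen in members]
nonmember_scores = [rouge2_f1(ref, gen) for ref, gen in nonmembers]
print("attack ROC-AUC:", roc_auc(member_scores, nonmember_scores))
```

A ROC-AUC near 0.5 means the similarity scores carry little membership signal, so a defense that lowers this value toward 0.5 is reducing leakage.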

Paper Structure

This paper contains 33 sections, 15 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: (a) Overview of an MIA on a VLM: the adversary queries the VLM, obtains its generated captions (outputs), and infers whether a sample was part of the training data. (b) The MIA pipeline for neuro-inspired VLMs, in which the adversary fine-tunes pre-trained VLMs with topological regularization ($\tau$) and generates captions. Finally, the adversary decides the membership of a particular sample based on semantic (MPNet) and lexical (ROUGE-2) similarity measures between the original and generated captions.
  • Figure 2: Example image-caption pairs from COCO (BLIP) showing how captions change under the baseline ($\tau=0$) and neuro-inspired (topographically regularized, i.e., $\tau=3$) variants for member and non-member samples.
  • Figure 3: Plots showing attack success rates in ROC-AUC.
  • Figure 4: Performance comparison between the Baseline and the $\tau$-regularized neuroscience-inspired models (Neuro with $\tau = 2$ and Neuro++ with $\tau = 3$) in terms of mean ROUGE-2 similarity across multiple models (BLIP, PaliGemma 2, and ViT-GPT2) on three datasets: COCO, CC3M, and NoCaps.
  • Figure 5: Performance comparison between the Baseline and the $\tau$-regularized neuroscience-inspired models (Neuro with $\tau = 2$ and Neuro++ with $\tau = 3$) in terms of mean MPNet similarity across multiple models (BLIP, PaliGemma 2, and ViT-GPT2) on three datasets: COCO, CC3M, and NoCaps.
  • ...and 1 more figure