Are Neuro-Inspired Multi-Modal Vision-Language Models Resilient to Membership Inference Privacy Leakage?
David Amebley, Sayanton Dibbo
TL;DR
This work addresses privacy leakage in multi-modal vision-language models under membership inference attacks in a strict black-box setting. It introduces a neuroscience-inspired topographic regularization, formalized as the objective $\mathcal{J}_{\tau} = \mathcal{J}_{\text{cap}} + \tau\,\mathcal{R}_{\text{topo}}$, and applies it to three VLMs (BLIP, PaliGemma 2, ViT-GPT2) across COCO, CC3M, and NoCaps. Through comprehensive experiments, the authors show that increasing $\tau$ reduces MIA success (ROC-AUC) while preserving captioning utility (MPNet/ROUGE-2) across models and datasets, though the magnitude of privacy gains is dataset-dependent. Ablation studies reveal that higher granularity in the attack amplifies leakage in baseline models but is mitigated by neuro-inspired regularization, highlighting a robust privacy-utility trade-off enabled by $\tau$-regularization. These findings suggest a practical pathway to privacy-preserving neuro-inspired VLMs for agentic AI applications, with avenues for further exploration of white-box MIAs and deployment-ready defenses.
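To make the objective $\mathcal{J}_{\tau} = \mathcal{J}_{\text{cap}} + \tau\,\mathcal{R}_{\text{topo}}$ concrete, here is a minimal sketch of how the two terms combine. The function names are hypothetical, and `topo_penalty` is only an illustrative stand-in (a smoothness penalty over neighbouring units on an assumed 1-D layout); this summary does not define $\mathcal{R}_{\text{topo}}$, so the sketch should not be read as the authors' regularizer.

```python
from typing import Sequence


def topo_penalty(activations: Sequence[float]) -> float:
    """Hypothetical stand-in for R_topo (NOT the paper's definition).

    Returns the mean squared difference between neighbouring units,
    assuming the units are laid out on a 1-D line.
    """
    if len(activations) < 2:
        return 0.0
    diffs = [(b - a) ** 2 for a, b in zip(activations, activations[1:])]
    return sum(diffs) / len(diffs)


def regularized_objective(caption_loss: float, penalty: float, tau: float) -> float:
    """Combine the terms as J_tau = J_cap + tau * R_topo.

    tau = 0 recovers the baseline captioning objective; tau > 0 adds the
    regularization term.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    return caption_loss + tau * penalty
```

With $\tau = 0$ the function returns the captioning loss unchanged, which corresponds to the baseline configuration; any $\tau > 0$ adds the weighted penalty.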
Abstract
In the age of agentic AI, the growing deployment of multi-modal models (MMs) has introduced new attack vectors that can leak sensitive training data. This paper investigates a black-box privacy attack, the membership inference attack (MIA), on multi-modal vision-language models (VLMs). State-of-the-art research analyzes privacy attacks primarily on unimodal AI-ML systems, although recent studies indicate that MMs can also be vulnerable. Researchers have demonstrated that biologically inspired neural network representations can improve the resilience of unimodal models against adversarial attacks, but it remains unexplored whether neuro-inspired MMs are resilient against privacy attacks. In this work, we introduce a systematic neuroscience-inspired topological regularization (tau) framework to analyze the resilience of multi-modal VLMs against image-text-based inference privacy attacks. We examine this question using three VLMs (BLIP, PaliGemma 2, and ViT-GPT2) across three benchmark datasets (COCO, CC3M, and NoCaps). Our experiments compare the resilience of baseline and NEURO VLMs, where a configuration with tau > 0 (topological regularization enabled) defines the NEURO variant of a VLM. Our results for the BLIP model on the COCO dataset show that MIA success against the NEURO VLM drops by 24% in mean ROC-AUC, while model utility (similarity between generated and reference captions), measured by the MPNet and ROUGE-2 metrics, remains similar. This indicates that NEURO VLMs are comparatively more resilient against privacy attacks without significantly compromising model utility. Our extensive evaluation with the PaliGemma 2 and ViT-GPT2 models on two additional datasets, CC3M and NoCaps, further validates the consistency of these findings. This work contributes to the growing understanding of privacy risks in MMs and provides evidence on the resilience of NEURO VLMs to privacy threats.
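Attack success in the abstract is reported as ROC-AUC. The following sketch shows how that metric can be computed from per-sample attack scores, using the standard pairwise-ranking definition. It assumes a hypothetical attack that assigns each sample a score where a higher value means "more likely a training member"; how those scores are produced is not specified here and is outside the sketch.

```python
from typing import Sequence


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a randomly chosen member receives a
    higher attack score than a randomly chosen non-member (ties count half).

    0.5 means the attack is no better than chance; 1.0 means members and
    non-members are perfectly separated.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical scores, for illustration only (not results from the paper).
auc = roc_auc([0.9, 0.3], [0.5, 0.1])  # 3 of 4 pairs ranked correctly
```

Under this reading, a lower ROC-AUC for the NEURO variant means the attacker's scores separate members from non-members less reliably.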
