The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models

Simone Caldarella; Massimiliano Mancini; Elisa Ricci; Rahaf Aljundi

The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models

Simone Caldarella, Massimiliano Mancini, Elisa Ricci, Rahaf Aljundi

TL;DR

This work investigates privacy leakage, specifically identity leakage, in open-source vision-language models trained on webdata. By probing five generative VLMs with 25,000 celeb images, a suite of prompts $P_0$–$P_4$, and background manipulations, the study shows that models leak names even when fine-tuned on anonymized data and that simple image anonymization (e.g., face blurring) is largely ineffective. Background context modestly modulates leakage but does not prevent it, and leakage correlates with celebrity fame and exposure in large training corpora, suggesting memorization of identity associations. The findings highlight urgent need for stronger privacy protections and ethical guidelines in deploying VLMs, beyond basic data sanitization or post-hoc prompt controls.

Abstract

Vision-Language Models (VLMs) combine visual and textual understanding, rendering them well-suited for diverse tasks like generating image captions and answering visual questions across various domains. However, these capabilities are built upon training on large amount of uncurated data crawled from the web. The latter may include sensitive information that VLMs could memorize and leak, raising significant privacy concerns. In this paper, we assess whether these vulnerabilities exist, focusing on identity leakage. Our study leads to three key findings: (i) VLMs leak identity information, even when the vision-language alignment and the fine-tuning use anonymized data; (ii) context has little influence on identity leakage; (iii) simple, widely used anonymization techniques, like blurring, are not sufficient to address the problem. These findings underscore the urgent need for robust privacy protection strategies when deploying VLMs. Ethical awareness and responsible development practices are essential to mitigate these risks.

The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models

TL;DR

–

, and background manipulations, the study shows that models leak names even when fine-tuned on anonymized data and that simple image anonymization (e.g., face blurring) is largely ineffective. Background context modestly modulates leakage but does not prevent it, and leakage correlates with celebrity fame and exposure in large training corpora, suggesting memorization of identity associations. The findings highlight urgent need for stronger privacy protections and ethical guidelines in deploying VLMs, beyond basic data sanitization or post-hoc prompt controls.

Abstract

Paper Structure (32 sections, 5 equations, 7 figures, 7 tables)

This paper contains 32 sections, 5 equations, 7 figures, 7 tables.

Introduction
Related Works
Vision Language Models
Privacy-Preserving AI
Memorization in Neural Networks
Background
Input Representations
Vision Embedding.
Text Embedding.
Architectures
Contrastive Vision Language Models.
Large Language Models.
Generative Vision Language Models.
Uncovering Privacy Leakages
Experimental setting
...and 17 more sections

Figures (7)

Figure 1: Differently from proprietary Vision Language Models (e.g., Copilot github_copilot), open source VLMs leak private information (i.e., names) even though their modalities have been aligned using anonymized datasets. This behavior may result from the enduring retention of previously memorized face-identity patterns during unimodal pretraining.
Figure 1: Screenshot taken from Copilot. Copilot prevent leakages by applying its custom "PrivacyBlur". The model seems aware of the blurring and reply accordingly. This may suggest an alignment towards safety compliant behavior.
Figure 2: Main components of generative VLMs. (a) Contrastive laugange and image pretraining. Many VLMs use CLIP-like vision encoders as initial/frozen vision module. (b) Decoder only language model pre-trained autoregressively for text generation. (c) Common alignment mechanism: Typically, an additional module is trained to translate the vision encoder’s output space to the text decoder’s input space. Despite alignment with anonymized data, previously seen personal information is retained.
Figure 2: Screenshot taken from Google Maps. Google maps apply a simple blurring to protect individual privacy. However, the efficacy may be questionable.
Figure 3: Example of the picture manipulation we evaluated. (a) Original picture. (b) Background replaced with a random landscape background. (c) Background replaced with white to force the model focus towards the subject. (d) Face blur to question its effectiveness in preventing leakages.
...and 2 more figures

The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models

TL;DR

Abstract

The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models

Authors

TL;DR

Abstract

Table of Contents

Figures (7)