The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models
Simone Caldarella, Massimiliano Mancini, Elisa Ricci, Rahaf Aljundi
TL;DR
This work investigates privacy leakage, specifically identity leakage, in open-source vision-language models trained on webdata. By probing five generative VLMs with 25,000 celeb images, a suite of prompts $P_0$–$P_4$, and background manipulations, the study shows that models leak names even when fine-tuned on anonymized data and that simple image anonymization (e.g., face blurring) is largely ineffective. Background context modestly modulates leakage but does not prevent it, and leakage correlates with celebrity fame and exposure in large training corpora, suggesting memorization of identity associations. The findings highlight urgent need for stronger privacy protections and ethical guidelines in deploying VLMs, beyond basic data sanitization or post-hoc prompt controls.
Abstract
Vision-Language Models (VLMs) combine visual and textual understanding, rendering them well-suited for diverse tasks like generating image captions and answering visual questions across various domains. However, these capabilities are built upon training on large amount of uncurated data crawled from the web. The latter may include sensitive information that VLMs could memorize and leak, raising significant privacy concerns. In this paper, we assess whether these vulnerabilities exist, focusing on identity leakage. Our study leads to three key findings: (i) VLMs leak identity information, even when the vision-language alignment and the fine-tuning use anonymized data; (ii) context has little influence on identity leakage; (iii) simple, widely used anonymization techniques, like blurring, are not sufficient to address the problem. These findings underscore the urgent need for robust privacy protection strategies when deploying VLMs. Ethical awareness and responsible development practices are essential to mitigate these risks.
