Safe-LLaVA: A Privacy-Preserving Vision-Language Dataset and Benchmark for Biometric Safety

Younggun Kim, Sirnam Swetha, Fazil Kagdi, Mubarak Shah

TL;DR

This work addresses biometric privacy risks in multimodal large language models by introducing PRISM, a benchmark that jointly evaluates explicit refusals and implicit biometric leakage, and Safe-LLaVA, a privacy-preserving training dataset obtained by systematically removing biometric cues from LLaVA using GPT-4o-based cleaning. The authors audit the LLaVA data, reveal pervasive biometric leakage, and demonstrate that models fine-tuned on Safe-LLaVA exhibit near-perfect refusal of biometric prompts and substantially reduced leakage in open-ended responses, while preserving general multimodal capabilities. Quantitative results show that models fine-tuned on Safe-LLaVA achieve high Refusal Accuracy ($ACC^{j}_{Ref}$) and high leakage protection scores ($L^{j}_{attr}$ and $L_{sent}$) across attributes, with minimal to no degradation on non-biometric tasks; the results are validated with multiple evaluators (e.g., GPT, Gemini). Collectively, Safe-LLaVA and PRISM establish a privacy-aligned paradigm for developing and evaluating vision-language systems, with practical implications for compliance and trust in real-world deployments.

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in vision-language tasks. However, these models often infer and reveal sensitive biometric attributes such as race, gender, age, body weight, and eye color, even when such information is not explicitly requested. This raises critical concerns, particularly in real-world applications and socially sensitive domains. Despite increasing awareness, no publicly available dataset or benchmark exists to comprehensively evaluate or mitigate biometric leakage in MLLMs. To address this gap, we introduce PRISM (Privacy-aware Evaluation of Responses in Sensitive Modalities), a new benchmark designed to assess MLLMs on two fronts: (1) refusal of biometric-related queries and (2) implicit biometric leakage in general responses, measured while maintaining semantic faithfulness. Further, we conduct a detailed audit of the widely used LLaVA datasets and uncover extensive biometric leakage across pretraining and instruction data. To address this, we present the Safe-LLaVA dataset, the first privacy-preserving MLLM training dataset, constructed by systematically removing explicit and implicit biometric information from the LLaVA dataset. Our evaluations on PRISM reveal biometric leakage across MLLMs for different attributes, exposing the privacy violations in detail. We also fine-tune a model on the Safe-LLaVA dataset and show that it substantially reduces biometric leakage. Together, Safe-LLaVA and PRISM set a new standard for privacy-aligned development and evaluation of MLLMs.
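The two evaluation fronts can be illustrated with a minimal sketch. It assumes that an external judge has already labeled each model response with a boolean (refused or not, leaked or not), and that a per-attribute score is simply the fraction of positive labels. The function name `rate_by_attribute`, the input format, the toy labels, and the convention "protection = 1 - leak rate" are all hypothetical and are not taken from the paper, whose exact metric definitions are not reproduced here.

```python
from collections import defaultdict


def rate_by_attribute(judgements):
    """Return the fraction of positive judgements for each attribute.

    `judgements` is an iterable of (attribute, flag) pairs, where `flag` is
    a boolean assigned by an external judge (hypothetical input format).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for attribute, flag in judgements:
        totals[attribute] += 1
        positives[attribute] += int(flag)
    return {attribute: positives[attribute] / totals[attribute] for attribute in totals}


# Front 1: biometric-related queries; flag means "the model refused".
refusal_judgements = [("age", True), ("age", True), ("race", True), ("race", False)]
refusal_accuracy = rate_by_attribute(refusal_judgements)

# Front 2: open-ended queries; flag means "the response leaked the attribute".
leak_judgements = [("age", False), ("age", True), ("race", False), ("race", False)]
leak_rate = rate_by_attribute(leak_judgements)

# Assumed convention (not from the paper): protection = 1 - leak rate.
leakage_protection = {attribute: 1.0 - rate for attribute, rate in leak_rate.items()}

print(refusal_accuracy)
print(leakage_protection)
```

Keeping the two fronts as separate label sets mirrors the benchmark's split between biometric-related and open-ended questions: a model can refuse every direct query and still leak attributes in general descriptions, so neither score substitutes for the other.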

Paper Structure

This paper contains 29 sections, 16 figures, 6 tables.

Figures (16)

  • Figure 1: MLLMs reveal biometric information, such as race, eye color, age, or gender, when prompted with both biometric-related and open-ended questions. Highlight colors in the figure distinguish race, age, gender, and eye color.
  • Figure 2: PRISM Dataset Curation Pipeline. For each biometric category, candidate images are collected through two complementary strategies: (1) web image search using carefully designed manual prompts with retrieval rules, and (2) filtering human images from existing multimodal benchmarks. Low-quality or duplicate images are removed through manual filtering. The curated images are labeled by category and paired with both biometric-related and open-ended questions to evaluate the biometric privacy behavior of MLLMs.
  • Figure 3: PRISM Benchmark data distribution across attributes and sub-categories.
  • Figure 4: Overview of the Safe-LLaVA data cleaning pipeline. The original LLaVA dataset contains biometric information. To detect and filter such leakage, we apply GPT-4o to probe both explicit (questions) and implicit (answers) mentions of biometric attributes (e.g., gender, age, race). Using specific refusal and cleaning rules, we transform sensitive samples into privacy-safe versions.
  • Figure 5: Comparison of ground truth responses between LLaVA and Safe-LLaVA across different biometric categories. As shown, the LLaVA dataset includes explicit mentions of sensitive attributes like gender, age, race, and weight. In contrast, Safe-LLaVA replaces or refuses such content to protect privacy while retaining the overall meaning of the response.
  • ...and 11 more figures