
Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing

Jinhua Yin, Peiru Yang, Chen Yang, Huili Wang, Zhiyang Hu, Shangguang Wang, Yongfeng Huang, Tao Qi

TL;DR

The paper tackles privacy risks in LVLMs by addressing black-box membership inference, a setting where internal signals are unavailable. It introduces knowledge-calibrated memory probing (KCMP), a framework that constructs semantically grounded mask-prediction tasks, calibrates them against prior knowledge with CLIP and LLM rationality checks, and evaluates model confidence via instruction-based prompts. KCMP demonstrates strong attack performance across four LVLMs and three benchmarks, approaching gray-box baselines as probing depth increases, and remains effective under closed-source API constraints and varying sampling temperatures. The work also provides a DAM-based benchmark and thorough ablations, offering a practical approach to auditing data exposure in deployed LVLMs and highlighting the trade-offs between task design, calibration, and query efficiency.
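The calibration step described above can be illustrated with a minimal Python sketch. It assumes each probe already carries a CLIP relevance score and an LLM-estimated rationality score, and that a probe is discarded when its masked object is weakly grounded in the image or when its answer is highly plausible from general knowledge alone. The `Probe` class, the `calibrate` function, and both threshold values are hypothetical illustrations, not taken from the KCMP code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Probe:
    """One mask-prediction probe (hypothetical structure, for illustration only)."""
    question: str           # e.g. "What color is the masked object?"
    options: List[str]      # ground-truth answer plus semantically confounding alternatives
    answer: str             # ground-truth option
    clip_relevance: float   # assumed CLIP relevance of the masked object to the image, in [0, 1]
    llm_rationality: float  # assumed LLM plausibility of the answer from prior knowledge, in [0, 1]


def calibrate(probes: List[Probe],
              min_relevance: float = 0.25,
              max_rationality: float = 0.5) -> List[Probe]:
    """Keep probes that are grounded in the image but hard to guess from world knowledge.

    Both thresholds are illustrative assumptions, not values reported in the paper.
    """
    return [
        p for p in probes
        if p.clip_relevance >= min_relevance and p.llm_rationality <= max_rationality
    ]
```

Under these assumptions, only probes whose answers depend on image-specific (and possibly memorized) details survive, so a high success rate on them is harder to explain by world knowledge alone.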

Abstract

Large vision-language models (LVLMs) derive their capabilities from extensive training on vast corpora of visual and textual data. Empowered by large-scale parameters, these models often exhibit strong memorization of their training data, rendering them susceptible to membership inference attacks (MIAs). Existing MIA methods for LVLMs typically operate under white- or gray-box assumptions, extracting likelihood-based features of the suspected data samples from the target LVLMs. However, mainstream LVLMs generally expose only generated outputs while concealing internal computational features during inference, limiting the applicability of these methods. In this work, we propose the first black-box MIA framework for LVLMs, based on a prior knowledge-calibrated memory probing mechanism. The core idea is to assess the model's memorization of the private semantic information embedded within the suspected image data, which is unlikely to be inferred from general world knowledge alone. We conducted extensive experiments across four LVLMs and three datasets. Empirical results demonstrate that our method effectively identifies training data of LVLMs in a purely black-box setting and even achieves performance comparable to gray-box and white-box methods. Further analysis demonstrates the robustness of our method against potential adversarial manipulations and validates the effectiveness of its design choices. Our code and data are available at https://github.com/spmede/KCMP.
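Because a black-box attacker sees only generated text, confidence has to be estimated from outputs rather than likelihoods. The following minimal sketch assumes, hypothetically, that confidence on a probe is the fraction of repeated sampled answers that exactly match the ground-truth option, and that an image's membership score is the mean confidence over its retained probes. The names `QueryFn`, `probe_confidence`, `membership_score`, and `n_samples` are illustrative and are not the paper's API.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

# Hypothetical black-box interface: (image_id, prompt) -> generated text.
# Only the text output is used; no logits or token probabilities are assumed.
QueryFn = Callable[[str, str], str]


def probe_confidence(query: QueryFn, image_id: str, prompt: str,
                     answer: str, n_samples: int = 5) -> float:
    """Fraction of sampled answers that exactly match the ground-truth option."""
    hits = sum(
        query(image_id, prompt).strip().lower() == answer.strip().lower()
        for _ in range(n_samples)
    )
    return hits / n_samples


def membership_score(query: QueryFn, image_id: str,
                     probes: Sequence[Tuple[str, str]],
                     n_samples: int = 5) -> float:
    """Mean confidence over retained (prompt, answer) probes; higher suggests membership."""
    return mean(probe_confidence(query, image_id, p, a, n_samples) for p, a in probes)
```

In practice `query` would wrap a call to the target LVLM with an instruction that restricts the reply to one of the offered options; the decision threshold on the score would have to be chosen on held-out data.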

Paper Structure

This paper contains 32 sections, 4 figures, 14 tables.

Figures (4)

  • Figure 1: Overview of the proposed Knowledge-Calibrated Memory Probing (KCMP) framework. (A) Semantic Mask Prediction Task Construction: salient objects are extracted and masked to create shape- and color-based probes with semantically confounding alternatives. (B) Prior Knowledge Calibration: each probe is filtered using CLIP-based object relevance and LLM-estimated rationality to discard tasks solvable through general knowledge alone. (C) Model Confidence Evaluation: the target LVLM answers the retained probes, and its aggregated confidence scores are used for membership inference, identifying samples with abnormally high confidence as likely training members.
  • Figure 2: ROC curves comparing three attack methods (KCMP, MaxRényi-K%, and Image Infer) on three target models (LLaVA, LLaMA Adapter, MiniGPT-4) under different dataset sizes $K \in \{10, 20, 30, 40, 50\}$. Each subplot shows the TPR-FPR curve and corresponding AUC. As $K$ increases, KCMP demonstrates steadily improving performance and approaches MaxRényi-K%, a strong gray-box baseline. In contrast, Image Infer, as a black-box method, maintains significantly lower AUC across all settings. KCMP consistently outperforms Image Infer and achieves competitive results with MaxRényi-K% on multiple configurations.
  • Figure 3: (a) Distribution of recovery accuracy across three types of region-based probing questions: unseen regions (label = 0), ungrounded regions (label = 1, without grounding), and grounded regions with annotations (label = 1, with grounding). (b) AUC comparison across three target models and two filtering settings (w/o and w/ Filter) on the VL-MIA/DALL-E dataset with different KCMP strategies. The proposed filtering mechanism improves attack performance across all target models.
  • Figure 4: Ablation study on filtering, question type, and number of confidence-evaluation tasks.
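Figures 2 and 3 report attack quality as ROC AUC over member and non-member scores. A minimal sketch of that metric uses its pairwise (Mann-Whitney) form: the AUC equals the probability that a randomly chosen member scores higher than a randomly chosen non-member, with ties counted as one half. The function name `roc_auc` is hypothetical; for larger evaluations, scikit-learn's `roc_auc_score` should give the same value.

```python
from typing import Sequence


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison: P(member > non-member) + 0.5 * P(tie)."""
    if not member_scores or not nonmember_scores:
        raise ValueError("need at least one member and one non-member score")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0   # member correctly ranked above non-member
            elif m == n:
                wins += 0.5   # ties contribute half a win
    return wins / (len(member_scores) * len(nonmember_scores))
```

The quadratic loop is simple and adequate for benchmark-sized score lists; a sort-based rank computation gives the same result in O(n log n).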