High-Power Training Data Identification with Provable Statistical Guarantees

Zhenlong Liu, Hao Zeng, Weiran Huang, Hongxin Wei

TL;DR

This work introduces Provable Training Data Identification (PTDI), a distribution-free framework for identifying training data points with provable control over the false discovery rate ($\mathrm{FDR}$). PTDI leverages conformal p-values derived from a non-training calibration set, scales them by a data-usage proportion estimate, and applies the Benjamini–Hochberg procedure to obtain a data-dependent threshold, enabling strict $\mathrm{FDR}$ control and enhanced power. A key contribution is the subtraction estimator for $\pi_{\text{test}}$, which conservatively estimates the proportion of training data in the test set and improves power while maintaining guarantees; an additional adjusted-moment estimator further boosts performance when some confirmed members are known. Extensive experiments across LLMs and VLMs show PTDI consistently controls $\mathrm{FDR}$ below target levels and outperforms existing methods like KTD in several settings, demonstrating practical applicability in pre-training and fine-tuning scenarios with diverse scores and datasets.
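The conformal p-value step mentioned above can be sketched as follows. This is a minimal illustration of the standard conformal p-value construction, not the authors' implementation. It assumes a membership score where larger values suggest a training member; the function name `conformal_p_values` and both argument names are hypothetical.

```python
import numpy as np

def conformal_p_values(cal_scores, test_scores):
    """Standard conformal p-values against a non-member calibration set.

    Assumption (not from the paper): larger score = more member-like,
    so a small p-value is evidence that a test point was in training.
    """
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = cal.size
    # Number of calibration scores that are >= each test score.
    n_ge = n - np.searchsorted(cal, test, side="left")
    # The +1 terms count the test point itself and keep p-values in (0, 1].
    return (1.0 + n_ge) / (n + 1.0)

if __name__ == "__main__":
    p = conformal_p_values([0.1, 0.2, 0.3, 0.4], [0.5, 0.25, 0.0])
    print(p)
```

With a calibration set of size n, the smallest attainable p-value is 1/(n+1), so the calibration set has to be large enough for small p-values to be reachable at the chosen target level.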

Abstract

Identifying training data within large-scale models is critical for copyright litigation, privacy auditing, and ensuring fair evaluation. Conventional approaches treat identification as a simple binary classification task without statistical guarantees. A recent approach is designed to control the false discovery rate (FDR), but its guarantees rely on strong, easily violated assumptions. In this paper, we introduce Provable Training Data Identification (PTDI), a rigorous method that identifies a set of training data with strict FDR control. Specifically, our method computes a p-value for each data point using a set of known unseen data, and then constructs a conservative estimator for the data usage proportion of the test set, which is used to scale these p-values. The final set of training data consists of all points whose scaled p-values fall below a data-dependent threshold. The full procedure enables the discovery of training data with provable, strict FDR control and significantly boosted power. Extensive experiments across a wide range of models (LLMs and VLMs) and datasets demonstrate that PTDI strictly controls the FDR and achieves higher power than existing methods.
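The selection step described in the abstract can be sketched as a Benjamini–Hochberg procedure applied to scaled p-values. This is a hedged sketch under stated assumptions, not the paper's algorithm: the scaling by `(1 - pi_hat)`, where `pi_hat` stands for an estimate of the data usage proportion, is an assumed adaptive-BH style choice, and the names `scaled_bh_select`, `p_values`, `alpha`, and `pi_hat` are hypothetical.

```python
import numpy as np

def scaled_bh_select(p_values, alpha, pi_hat=0.0):
    """Benjamini-Hochberg selection on p-values scaled by (1 - pi_hat).

    Assumption (not from the paper): pi_hat estimates the fraction of
    training members in the test set, so (1 - pi_hat) estimates the
    fraction of non-members (true nulls), as in adaptive BH.
    Returns the sorted indices of points flagged as training data.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    if m == 0:
        return np.array([], dtype=int)
    scaled = np.minimum(1.0, (1.0 - pi_hat) * p)
    order = np.argsort(scaled)
    # BH step-up rule: find the largest k with p_(k) <= alpha * k / m.
    passed = scaled[order] <= alpha * np.arange(1, m + 1) / m
    if not passed.any():
        return np.array([], dtype=int)
    k = int(np.max(np.nonzero(passed)[0])) + 1
    return np.sort(order[:k])

if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.18]
    print(scaled_bh_select(p, alpha=0.1))              # plain BH
    print(scaled_bh_select(p, alpha=0.1, pi_hat=0.5))  # scaled BH
```

Setting `pi_hat=0.0` recovers plain BH. A larger `pi_hat` shrinks the p-values and can only enlarge the selected set, which is where the power gain comes from; the FDR guarantee then depends on the estimate being conservative.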

Paper Structure

This paper contains 45 sections, 6 theorems, 50 equations, 9 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

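Let $\hat{\pi}_{\text{sub}}$ be the subtraction estimator defined above. Assuming the test data points are i.i.d. draws from the test distribution, the expectation of the ratio of the true non-member proportion to the estimated non-member proportion is bounded by 1. Formally, writing the true and estimated non-member proportions as $1 - \pi_{\text{test}}$ and $1 - \hat{\pi}_{\text{sub}}$,

$$\mathbb{E}\left[\frac{1 - \pi_{\text{test}}}{1 - \hat{\pi}_{\text{sub}}}\right] \le 1.$$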

Figures (9)

  • Figure 1: WikiMIA
  • Figure 2: ArXivTection
  • Figure 4: Comparison of FDR control between our method and KTD on three datasets.
  • Figure 5: Comparison of power on BBC Real Time with $\alpha = 0.1,0.15,0.2$.
  • Figure 6: FDR (solid lines) and power (bars) achieved by our method on MiniGPT-4 with the VL-MIA/Flickr dataset, evaluated across various data usage proportions of the test set $\pi_{\text{test}}$ and target FDR levels $\alpha$. All results are based on the MaxRényi-K% score calculated from three different input components: (a) the image embedding, (b) the generated description, and (c) the instruction combined with the description.
  • ...and 4 more figures

Theorems & Definitions (10)

  • Proposition 1
  • Theorem 1
  • Proposition 2
  • Proposition 3
  • proof
  • Lemma 2: Classical FDR Control under PRDS (Benjamini and Yekutieli, 2001)
  • Proposition 4: FDR Control for the BH Procedure on $p_j$
  • proof
  • proof: Proof of Theorem 1
  • proof