Leave-one-out Distinguishability in Machine Learning

Jiayuan Ye, Anastasia Borovykh, Soufiane Hayou, Reza Shokri

TL;DR

This work introduces LOOD, a framework that quantifies how adding or removing training data alters a model's output distribution, tying memorization, information leakage, and influence to a single diagnostic. It provides an analytic Gaussian Process-based method (NNGP-linked) to estimate LOOD efficiently, with closed-form posterior mean and covariance, and validates strong correlations with membership inference attacks while achieving large speedups over retraining. The authors also show that the optimal queries eliciting maximum leakage can be identified and that activation functions impact leakage via kernel rank, revealing a privacy–accuracy trade-off. The framework enables principled analysis of leakage across architectures and queries, with potential for data reconstruction and deeper privacy guidance in ML systems.

Abstract

We introduce an analytical framework to quantify the changes in a machine learning algorithm's output distribution following the inclusion of a few data points in its training set, a notion we define as leave-one-out distinguishability (LOOD). This is key to measuring data **memorization** and information **leakage** as well as the **influence** of training data points in machine learning. We illustrate how our method broadens and refines existing empirical measures of memorization and privacy risks associated with training data. We use Gaussian processes to model the randomness of machine learning algorithms, and validate LOOD with extensive empirical analysis of leakage using membership inference attacks. Our analytical framework enables us to investigate the causes of leakage and where the leakage is high. For example, we analyze the influence of activation functions on data memorization. Additionally, our method allows us to identify queries that disclose the most information about the training data in the leave-one-out setting. We illustrate how optimal queries can be used for accurate **reconstruction** of training data.
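To make the Gaussian-process view concrete, here is a minimal NumPy sketch of LOOD at a query $Q$ for a dataset $D$ and $D' = D \cup S$. It uses the standard GP regression posterior with $M_D = K_{DD} + \sigma^2 I$. The RBF kernel, the noise level `sigma2`, the direction of the KL divergence, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """RBF kernel matrix between rows of A and rows of B (assumed kernel choice)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

def gp_posterior(X, y, Q, sigma2=0.01):
    """Closed-form GP posterior mean and covariance at queries Q given data (X, y)."""
    M = rbf_kernel(X, X) + sigma2 * np.eye(len(X))   # M_D = K_DD + sigma^2 I
    K_QD = rbf_kernel(Q, X)
    mean = K_QD @ np.linalg.solve(M, y)
    cov = rbf_kernel(Q, Q) - K_QD @ np.linalg.solve(M, K_QD.T)
    return mean, cov

def _with_point(X, y, x_s, y_s):
    """Build D' = D ∪ S by appending the differing record (x_s, y_s)."""
    return np.vstack([X, np.atleast_2d(x_s)]), np.append(y, y_s)

def mean_distance_lood(X, y, x_s, y_s, Q, sigma2=0.01):
    """Half the squared distance between posterior means under D and D'."""
    mu_D, _ = gp_posterior(X, y, Q, sigma2)
    Xp, yp = _with_point(X, y, x_s, y_s)
    mu_Dp, _ = gp_posterior(Xp, yp, Q, sigma2)
    return 0.5 * float(np.sum((mu_D - mu_Dp) ** 2))

def kl_lood(X, y, x_s, y_s, q, sigma2=0.01):
    """KL( N(m0, v0) || N(m1, v1) ) at a single query q; the direction is an assumption."""
    Q = np.atleast_2d(q)
    m0, c0 = gp_posterior(X, y, Q, sigma2)
    Xp, yp = _with_point(X, y, x_s, y_s)
    m1, c1 = gp_posterior(Xp, yp, Q, sigma2)
    v0, v1 = c0[0, 0], c1[0, 0]
    return 0.5 * float(np.log(v1 / v0) + (v0 + (m0[0] - m1[0]) ** 2) / v1 - 1.0)
```

Each call costs two linear solves against kernel matrices rather than retraining a model with and without $S$.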

Paper Structure (25 sections, 11 theorems, 46 equations, 16 figures, 1 table, 1 algorithm)


Key Result

Proposition 4.1

Consider a dataset $D$, a differing point $S$, and a kernel function $K$ satisfying the same conditions as in thm:master_theorem. Then, we have that where $M(Q)$ is the mean distance LOOD in def:mean_lood, $K_{SD}, K_{DS}, K_{DD}$ are kernel matrices as defined in sec:LOOD, $\mathring{K}_{SD} = \frac{\partial}{\partial Q} K_{QD} \mid_{Q=S}$, $M_D = K_{DD} + \sigma^2 I$, and $\alpha = 1 - K_{SD} M^{-

Figures (16)

  • Figure 1: Validation of our analytical framework (based on LOOD and mean distance LOOD for NNGP) against an information leakage measure (the performance of membership inference attacks) and an influence definition (prediction difference under leave-one-out retraining).
  • Figure 2: Empirical validation of our analytical results in \ref{sec:effect_model_leakage} that per-record information leakage is higher under GELU activation than under ReLU, for both NNGPs and NN models. We evaluate per-record leakage with membership inference attack performance on models trained on leave-one-out datasets. The dataset contains 'car' and 'plane' images from CIFAR-10. The NN model is a fully connected network with depth 10 and width 1024.
  • Figure 3: Correlation between LOOD and MIA success. The leave-one-out dataset $D$ is a class-balanced subset of CIFAR-10 with size 1000, and $D' = D\cup S$ for a randomly chosen differing record $S$. We evaluate over 200 random choices of the differing data record (sampled from the CIFAR-10 dataset or from a uniform distribution over the pixel domain $[255]$). We also evaluate under different LOOD metrics (mean distance and KL divergence) and different kernel functions (RBF kernel or NNGP kernel for a fully connected network with depth 1). We observe that the correlation between KL and AUC (right figures) is consistently stronger than the correlation between mean distance LOOD and AUC (left figures).
  • Figure 4: An example on a one-dimensional toy sine training dataset generated as in \ref{app:mean_distance_toy_and_theory}, where each training data point is shown as a red dot. We denote by $S^*$ a crafted differing data record that incurs the maximum gradient for the mean distance LOOD objective, constructed following the instructions in \ref{alg:find_nonopt_S}. In the left plot, we observe that the optimal query $Q^*$ for mean distance LOOD does not equal the differing point $S^*$. In contrast, the optimal query for KL LOOD (which incurs maximal information leakage) is the differing point $S^*$ (right plot). This shows the discrepancy between mean distance LOOD and KL LOOD, and the inadequacy of mean distance LOOD for capturing information leakage.
  • Figure 5: Additional LOOD optimization results under NNGP kernels with different architectures. We evaluate on a class-balanced subset of the CIFAR-10 dataset of size 1000. The optimized query consistently has a LOOD and MIA AUC score similar to those of the differing data record.
  • ...and 11 more figures
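
The query-optimization idea behind Figures 4 and 5 can be sketched with a grid search on a one-dimensional toy sine dataset. This is a rough illustration under assumptions: the RBF kernel with unit length scale, the noise level, the differing point, and the grid are all hypothetical choices, and the differing point here is picked arbitrarily rather than crafted with the paper's algorithm. A grid search also stands in for the gradient-based optimization one would use in higher dimensions.

```python
import numpy as np

def k(a, b):
    """RBF kernel on 1-D inputs with unit length scale (assumed)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

def posterior(x, y, q, sigma2):
    """GP posterior mean and pointwise variance at each query in q."""
    M = k(x, x) + sigma2 * np.eye(len(x))
    K_qx = k(q, x)
    mean = K_qx @ np.linalg.solve(M, y)
    # diag(K_qx M^{-1} K_xq), computed row by row; k(q, q) = 1 for the RBF kernel
    var = 1.0 - np.sum(K_qx * np.linalg.solve(M, K_qx.T).T, axis=1)
    return mean, var

def lood_curves(x, y, x_s, y_s, grid, sigma2=0.01):
    """Mean distance LOOD and KL LOOD evaluated at every query on the grid."""
    m0, v0 = posterior(x, y, grid, sigma2)
    m1, v1 = posterior(np.append(x, x_s), np.append(y, y_s), grid, sigma2)
    mean_dist = 0.5 * (m0 - m1) ** 2
    kl = 0.5 * (np.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)
    return mean_dist, kl

# Toy setup: sine data on [-3, 3] and a hypothetical differing point S = (5.0, 2.0).
x = np.linspace(-3.0, 3.0, 20)
y = np.sin(x)
grid = np.linspace(-6.0, 8.0, 281)
mean_dist, kl = lood_curves(x, y, 5.0, 2.0, grid)
print("argmax query, mean distance LOOD:", grid[np.argmax(mean_dist)])
print("argmax query, KL LOOD:", grid[np.argmax(kl)])
```

Comparing the two argmax locations against the differing point gives a quick check of whether each metric is maximized by querying the differing record itself.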

Theorems & Definitions (26)

  • Definition 2.1: LOOD
  • Proposition 4.1: First-order optimality condition for influence
  • Proposition 5.1: Informal: Low rank kernel matrix implies low LOOD
  • Definition C.1: Mean distance LOOD
  • Example C.2: Mean distance LOOD is not well-correlated with MIA success
  • Definition C.3: KL Divergence LOOD
  • Example C.4: KL divergence is better correlated with MIA success
  • Lemma D.1
  • proof
  • Proposition D.2: Isotropic kernels satisfy conditions in \ref{thm:master_theorem}
  • ...and 16 more
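
Proposition 5.1 links LOOD to the rank of the kernel matrix, and Figure 2 compares leakage under ReLU and GELU. One rough way to probe kernel rank numerically is sketched below. It approximates the kernel by the Gram matrix of the last hidden layer of a wide random fully connected network and summarizes its spectrum with an entropy-based effective rank. The width, depth, weight scaling, tanh approximation of GELU, and the effective-rank measure are all assumptions for illustration, not the paper's procedure.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def gelu_tanh(z):
    """Widely used tanh approximation of GELU."""
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def random_feature_gram(X, activation, depth=3, width=2048, seed=0):
    """Gram matrix of last-layer features of a random wide network (kernel proxy)."""
    rng = np.random.default_rng(seed)
    H = X
    for _ in range(depth):
        fan_in = H.shape[1]
        W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, width))
        H = activation(H @ W)
    return H @ H.T / width

def effective_rank(K):
    """Exponential of the entropy of the normalized eigenvalue distribution."""
    eig = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    p = eig / eig.sum()
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))

# Hypothetical inputs: 50 random points in 10 dimensions.
X = np.random.default_rng(1).normal(size=(50, 10))
for name, act in [("ReLU", relu), ("GELU (tanh approx.)", gelu_tanh)]:
    print(name, effective_rank(random_feature_gram(X, act)))
```

An effective rank near 1 indicates a Gram matrix dominated by a single direction, while a value near the number of points indicates a well-spread spectrum.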