Better Membership Inference Privacy Measurement through Discrepancy
Ruihan Wu, Pengrun Huang, Kamalika Chaudhuri
TL;DR
This work introduces an empirical privacy metric based on discrepancy distance that upper-bounds the advantage of score-based Membership Inference Attacks (MIAs), enabling scalable privacy assessment for large, well-generalized models without training multiple shadow models. It formalizes the bound via convex discriminative sets, proving $\mathrm{Adv}(m; f, S, \nabla D) \le D_{\nabla Q}(S, \nabla D)$ for common MIAs, and provides a practical approximation, CPM (Convex Polytope Machine), that uses a polytope surrogate loss. Empirically, CPM consistently upper-bounds standard MIAs on CIFAR and ImageNet-scale models, and its performance improves as the facet count $K$ grows; the results also reveal that traditional scores may overfit to standard training recipes. To address modern models trained with sophisticated procedures, the authors propose training-procedure-aware MIAs, such as MixUp-score and RelaxLoss-score, which achieve higher leakage when aligned with the training method. Overall, the discrepancy-based metric offers a scalable, stronger privacy evaluation tool, while the MixUp and RelaxLoss scores illustrate the potential of procedure-aware MIAs on contemporary models.
Abstract
Membership Inference Attacks have emerged as a dominant method for empirically measuring privacy leakage from machine learning models. Here, privacy is measured by the {\em{advantage}} or gap between a score or a function computed on the training and the test data. A major barrier to the practical deployment of these attacks is that they do not scale to large, well-generalized models -- either the advantage is relatively low, or the attack involves training multiple models, which is highly compute-intensive. In this work, inspired by discrepancy theory, we propose a new empirical privacy metric that is an upper bound on the advantage of a family of membership inference attacks. We show that this metric does not involve training multiple models, can be applied to large ImageNet classification models in the wild, and has higher advantage than existing metrics on models trained with more recent and sophisticated training recipes. Motivated by our empirical results, we also propose new membership inference attacks tailored to these training losses.
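To make the notion of advantage concrete, the following is a minimal sketch of how the empirical advantage of a simple threshold attack on a scalar score might be computed. It is an illustration under stated assumptions, not the metric proposed in this work: the function name `threshold_advantage` is hypothetical, a higher score is assumed to indicate "more likely a training member", and the advantage is taken to be the largest gap, over thresholds, between the fraction of training scores and the fraction of test scores that clear the threshold.

```python
def threshold_advantage(train_scores, test_scores):
    """Largest gap, over thresholds t, between the fraction of training
    scores >= t and the fraction of test scores >= t.

    Assumption (not from the source): higher score = more likely a member.
    """
    train = list(train_scores)
    test = list(test_scores)
    best = 0.0
    # Only thresholds equal to an observed score can change the gap.
    for t in sorted(set(train) | set(test)):
        member_rate = sum(s >= t for s in train) / len(train)
        nonmember_rate = sum(s >= t for s in test) / len(test)
        best = max(best, member_rate - nonmember_rate)
    return best


# Hypothetical scores, e.g. negative loss or confidence on each example.
train_scores = [0.9, 0.8, 0.7, 0.6]
test_scores = [0.5, 0.4, 0.3, 0.65]
print(threshold_advantage(train_scores, test_scores))
```

A well-generalized model produces nearly identical score distributions on training and test data, so a gap of this kind stays close to zero, which is the low-advantage regime the abstract describes.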
