The Fundamental Limits of Least-Privilege Learning
Theresa Stadler, Bogdan Kulynych, Michael C. Gastpar, Nicolas Papernot, Carmela Troncoso
TL;DR
The paper formalizes the least-privilege principle (LPP) for machine learning in MLaaS settings by bounding the maximal leakage about any sensitive attribute $S$ from the representation $Z$, conditioned on the task label $Y$: $I_\infty(S; Z \mid Y) \le \gamma$. It establishes a fundamental trade-off: as a representation's utility for predicting the target $Y$ (measured by $I_\alpha(Y; Z)$ with $\alpha \in \{1,\infty\}$) grows, there exists some attribute $S \neq Y$ that can be inferred from $Z$ with risk not smaller than $\gamma$, under realistic assumptions such as strictly positive $P_{Y|X}$. The authors contrast unconditional LPP with conditional LPP, relate LPP to MNI/CEB and LDP, and prove incompatibilities that imply LPP cannot, in general, outperform differential privacy in maintaining low leakage for all attributes. Through extensive empirical evaluation across image and tabular datasets, architectures, and learning methods, they demonstrate persistent leakage (the “whack-a-mole” effect) and fundamental leakage risks, including leakage arising from the task labels themselves. The results suggest that LPP, while conceptually appealing, does not by itself guarantee harmless representations in MLaaS and should be assessed alongside, or in combination with, other privacy-preserving approaches and contextual integrity considerations.
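To make the leakage quantity above concrete, here is a minimal sketch that computes maximal leakage for small discrete distributions. It uses the standard closed form $I_\infty(S; Z) = \log \sum_z \max_{s: P(s)>0} P(z \mid s)$. The conditional variant takes the worst case over task labels $y$, which is an assumption made for illustration and may differ from the exact definition used in the paper. The function names `maximal_leakage` and `conditional_maximal_leakage` are hypothetical.

```python
import numpy as np


def maximal_leakage(joint_sz):
    """Maximal leakage I_inf(S; Z) in bits.

    joint_sz[s, z] holds the joint probability P(S=s, Z=z).
    """
    joint_sz = np.asarray(joint_sz, dtype=float)
    p_s = joint_sz.sum(axis=1)
    support = p_s > 0  # only values of S with positive probability count
    cond = joint_sz[support] / p_s[support, None]  # rows are P(z | s)
    # Sum over z of the largest P(z | s) across s, then take the log.
    return float(np.log2(cond.max(axis=0).sum()))


def conditional_maximal_leakage(joint_syz):
    """Worst-case-over-y maximal leakage (illustrative assumption).

    joint_syz[s, y, z] holds the joint probability P(S=s, Y=y, Z=z).
    """
    joint_syz = np.asarray(joint_syz, dtype=float)
    leaks = []
    for y in range(joint_syz.shape[1]):
        slice_sz = joint_syz[:, y, :]
        mass = slice_sz.sum()  # P(Y=y)
        if mass > 0:
            # Normalise to P(S, Z | Y=y) and reuse the unconditional form.
            leaks.append(maximal_leakage(slice_sz / mass))
    return max(leaks)


if __name__ == "__main__":
    independent = np.outer([0.5, 0.5], [0.5, 0.5])  # Z carries nothing about S
    copy = np.diag([0.5, 0.5])                      # Z reveals binary S exactly
    print(maximal_leakage(independent), maximal_leakage(copy))
```

In this sketch, a representation independent of $S$ yields 0 bits of leakage, while one that copies a uniform binary $S$ yields 1 bit. An LPP constraint with budget $\gamma$ would require the conditional quantity to stay at or below $\gamma$ for every attribute $S$ under consideration.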
Abstract
The promise of least-privilege learning -- to find feature representations that are useful for a learning task but prevent inference of any sensitive information unrelated to this task -- is highly appealing. However, so far this concept has only been stated informally. It thus remains an open question whether and how we can achieve this goal. In this work, we provide the first formalisation of the least-privilege principle for machine learning and characterise its feasibility. We prove that there is a fundamental trade-off between a representation's utility for a given task and its leakage beyond the intended task: it is not possible to learn representations that have high utility for the intended task but at the same time prevent inference of any attribute other than the task label itself. This trade-off holds under realistic assumptions on the data distribution and regardless of the technique used to learn the feature mappings that produce these representations. We empirically validate this result for a wide range of learning techniques, model architectures, and datasets.
