The Trade-Offs of Private Prediction
Laurens van der Maaten, Awni Hannun
TL;DR
This work studies how to release predictions from machine learning models without leaking information about the training data. It compares three private training techniques (model sensitivity, loss perturbation, DP-SGD) with two private prediction approaches (prediction sensitivity, subsample-and-aggregate), all analyzed in an empirical risk minimization (ERM) framework with differential privacy guarantees and an inference budget. In experiments on MNIST and CIFAR-10, private training methods often outperform private prediction methods in practical privacy-utility regimes, though the results depend on the privacy loss ε, the privacy failure probability δ, and the inference budget B. The findings offer guidance for practitioners choosing among these methods and suggest that refined privacy accounting could yield further improvements.
Abstract
Machine learning models leak information about their training data every time they reveal a prediction. This is problematic when the training data needs to remain private. Private prediction methods limit how much information about the training data is leaked by each prediction. Private prediction can also be achieved using models that are trained by private training methods. When predictions must be private, both private training and private prediction methods exhibit trade-offs between privacy, privacy failure probability, amount of training data, and inference budget. Although these trade-offs are theoretically well understood, they have hardly been studied empirically. This paper presents the first empirical study of the trade-offs of private prediction. Our study sheds light on which methods are best suited for which learning setting. Perhaps surprisingly, we find that private training methods outperform private prediction methods in a wide range of private prediction settings.
