Privacy Side Channels in Machine Learning Systems

Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr

TL;DR

The paper demonstrates that privacy guarantees for ML models can be fundamentally undermined when models are deployed as part of larger systems. It identifies four categories of privacy side channels spanning training data filtering, input/output processing, and query filtering, and shows concrete attacks that exploit these components to dramatically amplify information leakage beyond what isolated models expose. Key findings include that data deduplication can completely invalidate differential privacy guarantees, memorization filters can enable near-perfect membership inference and even data extraction of private keys, and stateful query detectors can leak other users’ test queries. The work evaluates these attacks on real systems (e.g., CIFAR-10 setups and GitHub Copilot scenarios) and argues for holistic, end-to-end privacy analyses of ML systems, as well as careful consideration of trade-offs between privacy, security, and robustness in system design.

Abstract

Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum. Yet, in reality, these models are part of larger systems that include components for training data filtering, output monitoring, and more. In this work, we introduce privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates than is otherwise possible for standalone models. We propose four categories of side channels that span the entire ML lifecycle (training data filtering, input preprocessing, output post-processing, and query filtering) and allow for enhanced membership inference, data extraction, and even novel threats such as extraction of users' test queries. For example, we show that deduplicating training data before applying differentially-private training creates a side channel that completely invalidates any provable privacy guarantees. We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set, even if the model did not memorize these keys. Taken together, our results demonstrate the need for a holistic, end-to-end privacy analysis of machine learning systems.
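
The output-filter side channel described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: `FilteredSystem`, `is_member`, `extract_suffix`, the substring-based filter, and the "echo" stand-in model are all hypothetical. The point is only that a filter which blocks outputs found in the training set answers membership queries by itself, even when the underlying model has memorized nothing.

```python
class FilteredSystem:
    """Toy ML system (hypothetical): a model wrapped in a memorization filter."""

    def __init__(self, training_docs, min_len=8):
        self.training_docs = training_docs
        self.min_len = min_len  # assumed: only outputs this long are checked

    def generate(self, prompt):
        # Stand-in "model" that simply repeats the prompt. It has memorized
        # nothing, so any leakage below comes from the filter alone.
        output = prompt
        blocked = len(output) >= self.min_len and any(
            output in doc for doc in self.training_docs
        )
        return None if blocked else output


def is_member(system, candidate):
    """A blocked output reveals that the candidate occurs in the training set."""
    return system.generate(candidate) is None


def extract_suffix(system, known_prefix, alphabet, max_len):
    """Extend a known prefix one character at a time using the filter as an oracle."""
    text = known_prefix
    for _ in range(max_len):
        nxt = next((c for c in alphabet if is_member(system, text + c)), None)
        if nxt is None:
            break  # no extension is blocked, so the secret has ended
        text += nxt
    return text


# Made-up training document containing a made-up secret.
system = FilteredSystem(["key = SECRET-7f3a9"])
member = is_member(system, "key = SECRET")
non_member = is_member(system, "key = PUBLIC")
recovered = extract_suffix(system, "key = SECRET-", "0123456789abcdef", 10)
```

In this toy setting each extracted character costs at most one query per alphabet symbol, which is why a filter of this kind can turn an infeasible guessing problem into a linear search.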

Paper Structure (75 sections, 4 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 2: A depiction of our "hub-and-spokes" attack on data deduplication. Left: we insert poisoned examples that are each close to the "hub" ($x$) but are far from each other. Right: actual images from our attack. We also include a checkerboard backdoor in the top-left corner of each near-duplicate image to enhance memorization.
  • Figure 3: Deduplication can significantly worsen privacy. We show membership inference effectiveness under both exact deduplication (delete all) and no deduplication. With deduplication, the side channel leads to near-perfect membership inference; without, it is similar to the baseline poisoning-aware Truth Serum attack (Tramèr et al., 2022). The LiRA baseline (Carlini et al., 2022) performs similarly in both cases.
  • Figure 4: Our attack is robust to uncertainty in the deduplication threshold $\alpha$. We create approximate duplicates by optimizing for the poisons to be less than $\alpha^2$-similar to each other, and more than $\alpha$-similar to the target. We show that, within a $\pm 2\%$ range, six duplicates are kept on average when the target sample is not a member, whereas only one sample is kept when the target sample is a member.
  • Figure 5: When federated learning is combined with defenses against data poisoning, a side-channel is opened that worsens privacy. We run the FoolsGold defense and insert a poisoning client into the learning protocol (denoted by the red vertical line). When a target client of interest is present in the data, both that client and the poisoning client will effectively not contribute to the learning (top of Figure). When the target is not present (bottom of Figure), the learning on the poisoning client continues as normal. This enables a strong membership inference attack.
  • Figure 6: We extract the tokenizer for GPT-2 on specific byte strings from Wikipedia. Our attack leverages a side channel based on the fact that language models use a fixed context window.
  • ...and 4 more figures
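
The "hub-and-spokes" geometry from the captions of Figures 2 and 4 can be sketched on toy vectors. This is a hedged illustration under stated assumptions, not the paper's method: cosine similarity on unit vectors stands in for the image similarity measure, the spokes are built in closed form rather than by optimization, and `dedup_keep_one` assumes a deduplicator that merges transitively and keeps one item per group. The names `make_hub_and_spokes` and `dedup_keep_one` and the values `alpha = 0.9` and `hub_sim = 0.92` are hypothetical.

```python
import numpy as np


def make_hub_and_spokes(dim=16, n_spokes=6, hub_sim=0.92, seed=0):
    """Build a unit-norm hub and spokes that are close to the hub but not to each other."""
    rng = np.random.default_rng(seed)
    # Reduced QR gives n_spokes + 1 orthonormal columns: one hub, the rest spoke directions.
    q, _ = np.linalg.qr(rng.normal(size=(dim, n_spokes + 1)))
    hub = q[:, 0]
    off_axis = np.sqrt(1.0 - hub_sim**2)
    # cos(spoke, hub) = hub_sim, while cos(spoke_i, spoke_j) = hub_sim**2.
    spokes = [hub_sim * hub + off_axis * q[:, i] for i in range(1, n_spokes + 1)]
    return hub, spokes


def dedup_keep_one(points, alpha):
    """Assumed deduplicator: group items linked by similarity >= alpha, keep one per group."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if float(points[i] @ points[j]) >= alpha:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


alpha = 0.9
hub, spokes = make_hub_and_spokes()
kept_with_target = dedup_keep_one(spokes + [hub], alpha)  # target (hub) is a member
kept_without_target = dedup_keep_one(spokes, alpha)       # target is absent
```

With the hub present, every spoke is linked to it and the whole cluster collapses to a single kept sample; without it, no pair of spokes reaches the threshold and all six survive. The number of poisons that remain in the training set therefore depends on whether the target was a member, which is the signal the side channel amplifies.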