Dissecting Distribution Inference

Anshuman Suri, Yifu Lu, Yanjin Chen, David Evans

TL;DR

This work advances distribution inference privacy by introducing a black-box KL-Divergence Attack that often outperforms prior white-box approaches, even when adversaries have limited access (e.g., label-only APIs). By training shadow models on candidate distributions and comparing prediction distributions via KL divergence, the attack yields substantial leakage across diverse datasets, challenging the effectiveness of common noise-based defenses. The study systematically analyzes how adversary knowledge (model architectures, feature extractors, and access modality) affects leakage and finds that leakage persists even with reduced information, though sharing representations and model capacity considerations matter. As a practical countermeasure, the authors propose simple data re-sampling defenses (under-/over-sampling, augmentation strategies) that can markedly reduce leakage at the cost of some task performance and fairness implications, while revealing fundamental trade-offs between generalization and privacy. Code to reproduce results is publicly available at the linked GitHub repository.
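The comparison step described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes binary classifiers, represents each model only by its positive-class probabilities on a shared set of query points, and uses hypothetical names (`bernoulli_kl`, `kl_attack`) and synthetic numbers in place of trained shadow models.

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-6):
    """Element-wise KL(p || q) between Bernoulli distributions
    with success probabilities p and q."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_attack(victim_probs, shadow_probs_0, shadow_probs_1):
    """Guess which candidate distribution (0 or 1) the victim was trained on.

    victim_probs:   shape (n_queries,), victim's positive-class probabilities
    shadow_probs_0: shape (n_shadow, n_queries), shadow models for candidate 0
    shadow_probs_1: shape (n_shadow, n_queries), shadow models for candidate 1
    """
    # Mean KL divergence between the victim and each shadow model.
    kl_0 = bernoulli_kl(victim_probs[None, :], shadow_probs_0).mean(axis=1)
    kl_1 = bernoulli_kl(victim_probs[None, :], shadow_probs_1).mean(axis=1)
    # Pairwise vote: in each (shadow_0, shadow_1) pair, which one is closer?
    votes_for_1 = (kl_1[None, :] < kl_0[:, None]).mean()
    return int(votes_for_1 > 0.5)

# Synthetic stand-ins for model outputs (assumption: no real models are trained).
rng = np.random.default_rng(0)
n_shadow, n_queries = 5, 200
shadow_probs_0 = np.clip(0.3 + 0.05 * rng.standard_normal((n_shadow, n_queries)), 0, 1)
shadow_probs_1 = np.clip(0.7 + 0.05 * rng.standard_normal((n_shadow, n_queries)), 0, 1)
victim_from_1 = np.clip(0.7 + 0.05 * rng.standard_normal(n_queries), 0, 1)
victim_from_0 = np.clip(0.3 + 0.05 * rng.standard_normal(n_queries), 0, 1)

guess_1 = kl_attack(victim_from_1, shadow_probs_0, shadow_probs_1)
guess_0 = kl_attack(victim_from_0, shadow_probs_0, shadow_probs_1)
```

The sketch only needs model outputs on query points, which is why an attack of this kind fits a black-box setting. Under label-only access the probabilities would have to be replaced by hard labels, which this sketch does not cover.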

Abstract

A distribution inference attack aims to infer statistical properties of data used to train machine learning models. These attacks are sometimes surprisingly potent, but the factors that impact distribution inference risk are not well understood and demonstrated attacks often rely on strong and unrealistic assumptions such as full knowledge of training environments even in supposedly black-box threat scenarios. To improve understanding of distribution inference risks, we develop a new black-box attack that even outperforms the best known white-box attack in most settings. Using this new attack, we evaluate distribution inference risk while relaxing a variety of assumptions about the adversary's knowledge under black-box access, like known model architectures and label-only access. Finally, we evaluate the effectiveness of previously proposed defenses and introduce new defenses. We find that although noise-based defenses appear to be ineffective, a simple re-sampling defense can be highly effective. Code is available at https://github.com/iamgroot42/dissecting_distribution_inference
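To make the re-sampling idea concrete, here is a minimal sketch of one possible variant: under-sampling the training set so that a binary sensitive attribute appears at a fixed 0.5 ratio, regardless of its original proportion. The function name `undersample_balanced` and the choice of a 0.5 target are assumptions for illustration, not details taken from the authors' code.

```python
import numpy as np

def undersample_balanced(attr, seed=0):
    """Return sorted indices of a subset in which attr == 0 and attr == 1
    occur equally often (hypothetical helper, binary attribute assumed)."""
    rng = np.random.default_rng(seed)
    attr = np.asarray(attr)
    idx_0 = np.flatnonzero(attr == 0)
    idx_1 = np.flatnonzero(attr == 1)
    # Keep as many records per group as the smaller group can supply.
    n_keep = min(len(idx_0), len(idx_1))
    keep_0 = rng.choice(idx_0, size=n_keep, replace=False)
    keep_1 = rng.choice(idx_1, size=n_keep, replace=False)
    return np.sort(np.concatenate([keep_0, keep_1]))

# Example: a training set where 20% of records have the attribute set.
attr = np.array([1] * 200 + [0] * 800)
keep = undersample_balanced(attr)
balanced_ratio = attr[keep].mean()
```

Training on `keep` hides the original 20% ratio behind a fixed one, but it also discards 600 of the 1000 records here, which illustrates the cost to task performance that such a defense can carry.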

Paper Structure (21 sections, 7 equations, 7 figures, 10 tables)

Figures (7)

  • Figure 1: Distinguishing accuracy for different task-property pairs for CelebA with varying correlation, for KL Divergence Attack.
  • Figure 2: Distinguishing accuracy for the KL Divergence Attack on RSNA Bone Age (Sex), when the adversary uses the same feature extractor as the victim, and when the victim does not use or share any pre-trained feature extractor. While there is an obvious drop in performance, inference risk still stays high.
  • Figure 3: Comparing the distinguishing accuracy for the KL Divergence Attack for CelebA (Sex), when the target model returns prediction confidence scores and when it returns only prediction labels. Performance drops most for certain ratios like 0.2 and 0.8, but remains high and roughly the same for more extreme ratios like 0.0, 0.1, and 1.0.
  • Figure 4: Distinguishing accuracy for Census19 (Sex), using the KL Divergence Attack. Attack accuracy drops with stronger DP guarantees (decreasing privacy budget $\epsilon$).
  • Figure 5: Distinguishing accuracy for Census19 (Sex), for varying levels of label poisoning. Inference risk drops considerably with increasing levels of label poisoning, but this is accompanied by non-trivial drops in task accuracy.
  • ...and 2 more figures