Turning Black Box into White Box: Dataset Distillation Leaks

Huajie Chen, Tianqing Zhu, Yuchen Zhong, Yang Zhang, Shang Wang, Feng He, Lefeng Zhang, Jialiang Shen, Minghao Wang, Wanlei Zhou

TL;DR

The Information Revelation Attack (IRA) targets state-of-the-art dataset distillation techniques: it accurately predicts both the distillation algorithm and the model architecture, and it can infer membership and recover sensitive samples from the real dataset.

Abstract

Dataset distillation compresses a large real dataset into a small synthetic one, enabling models trained on the synthetic data to achieve performance comparable to that of models trained on the real data. Although synthetic datasets are assumed to be privacy-preserving, we show that existing distillation methods can cause severe privacy leakage: because synthetic datasets implicitly encode the weight trajectories of the distilled model, they become over-informative and exploitable by adversaries. To expose this risk, we introduce the Information Revelation Attack (IRA) against state-of-the-art distillation techniques. Experiments show that IRA accurately predicts both the distillation algorithm and the model architecture, and can successfully infer membership and recover sensitive samples from the real dataset.

Paper Structure

This paper contains 22 sections, 6 theorems, 42 equations, 10 figures, 6 tables, 3 algorithms.

Key Result

Theorem 3.1

Assume that a dataset $\mathcal{D}_2$ is a perturbed version of another dataset $\mathcal{D}_1$, i.e. where $\delta > 0$ is a constant. We describe the training process on $\mathcal{D}_2$ as where $\bm{\theta}_{i}(t)$ denotes the weights of model $i$ at step $t$; $\eta(t)$ is the learning rate at step $t$; and $L_{\mathcal{D}_i}(\cdot)$ is the loss of the model trained on $\mathcal{D}_i$. For any $\
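
The setting of Theorem 3.1 (two models trained on a dataset and on a perturbed copy of it) can be explored numerically. The sketch below is illustrative only and does not reproduce the theorem's statement or proof: it assumes plain gradient descent on a linear least-squares model, an additive perturbation whose entries are bounded by $\delta$, and a decaying learning-rate schedule. The function `gd_trajectory`, the schedule `eta`, and all numeric values are hypothetical choices, not taken from the paper.

```python
import numpy as np

def gd_trajectory(X, y, eta, steps):
    """Run gradient descent on the mean-squared error of a linear model and
    return the weights after every step (shape: steps + 1, n_features)."""
    theta = np.zeros(X.shape[1])
    trajectory = [theta.copy()]
    for t in range(steps):
        grad = 2.0 * X.T @ (X @ theta - y) / len(y)  # gradient of the MSE loss
        theta = theta - eta(t) * grad                 # step with learning rate eta(t)
        trajectory.append(theta.copy())
    return np.stack(trajectory)

rng = np.random.default_rng(0)
X1 = rng.normal(size=(64, 5))          # stand-in for D_1 (assumption)
y = X1 @ rng.normal(size=5)            # shared regression targets
delta = 0.05                           # perturbation bound (hypothetical value)
X2 = X1 + delta * rng.uniform(-1, 1, size=X1.shape)  # stand-in for D_2

def eta(t):
    """Hypothetical decaying learning-rate schedule eta(t)."""
    return 0.05 / (1 + 0.01 * t)

traj1 = gd_trajectory(X1, y, eta, steps=200)   # theta_1(t)
traj2 = gd_trajectory(X2, y, eta, steps=200)   # theta_2(t)

# Distance between the two weight trajectories at every step.
gap = np.linalg.norm(traj1 - traj2, axis=1)
print(gap[0], gap[-1])
```

Plotting `gap` against the step index for several values of `delta` gives a rough picture of how closely the weight trajectory on the perturbed data tracks the original one in this toy setting.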

Figures (10)

  • Figure 1: Overview of IRA. The victim generates a publicly available synthetic dataset by running the distillation algorithm with the victim model on the real dataset. With only the synthetic dataset, the adversary can reveal sensitive information by launching IRA.
  • Figure 2: Architecture Inference Stage. For the AIA, $A$ combines different distillation algorithms $\gamma_i$ with model architectures $f_j$ to synthesize $l$ synthetic datasets. These synthetic datasets are then used to train $u \times v \times l$ local models, producing the set of loss trajectories $\mathcal{T}$. $\mathcal{T}$ is in turn used to train the architecture attack model $A_A$, which predicts the algorithm and architecture used for the dataset distillation.
  • Figure 3: Membership Inference Stage. In the MIA, $A$ trains the membership attack model $A_M$ on the outputs from each layer of the local model $h$ to determine the membership of an arbitrary given sample $x$.
  • Figure 4: Model Inversion Stage. The framework employs two models, $\phi$ and $\psi$. $\phi$ is trained to predict the noise $\epsilon$ in the DDPM manner, whereas $\psi$ takes the denoised image $x^*_0$ as input and is trained to output the clean image $x_0$ and the coefficient $r_t$. The final output of this dual-network framework is the combination of the two models' outputs, weighted using $r_t$. Notably, all terms in the loss function except the first are used to regularize $\psi$.
  • Figure 5: MIV Qualitative Results on CIFAR-10. The upper row shows samples from each class of the real dataset. The bottom row shows the synthetic samples generated by the diffusion model.
  • ...and 5 more figures
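
The trajectory-collection step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: the distillation step is replaced by random placeholder data, the local model is assumed to be a small one-hidden-layer MLP trained with full-batch gradient descent, and the names `loss_trajectory` and `candidates` are hypothetical. Only the idea of recording one loss trajectory per local model and stacking them into a labeled set $\mathcal{T}$ comes from the caption.

```python
import numpy as np

def loss_trajectory(X, y, hidden, steps=50, lr=0.1, seed=0):
    """Train a one-hidden-layer MLP with full-batch gradient descent and
    return the cross-entropy loss recorded at every step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = int(y.max()) + 1
    W1 = rng.normal(scale=0.1, size=(d, hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, k))
    Y = np.eye(k)[y]                                  # one-hot labels
    losses = []
    for _ in range(steps):
        H = np.maximum(X @ W1, 0.0)                   # ReLU hidden layer
        logits = H @ W2
        logits -= logits.max(axis=1, keepdims=True)   # stabilise the softmax
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.log(P[np.arange(n), y] + 1e-12)))
        G = (P - Y) / n                               # gradient w.r.t. the logits
        gW2 = H.T @ G
        gW1 = X.T @ ((G @ W2.T) * (H > 0))
        W1 -= lr * gW1
        W2 -= lr * gW2
    return np.array(losses)

# Hypothetical (algorithm, architecture) candidates; the names are placeholders.
candidates = [("alg_a", "arch_a"), ("alg_b", "arch_b")]

trajectories, labels = [], []
for label, _ in enumerate(candidates):
    for rep in range(3):                              # several local models per candidate
        rng = np.random.default_rng(100 * label + rep)
        # Placeholder for a synthetic dataset distilled with this candidate pair.
        X_syn = rng.normal(scale=1.0 + label, size=(20, 8))
        y_syn = np.arange(20) % 2
        trajectories.append(loss_trajectory(X_syn, y_syn, hidden=16, seed=rep))
        labels.append(label)

T = np.stack(trajectories)    # one loss trajectory per local model
labels = np.array(labels)     # index of the candidate that produced each row
print(T.shape, labels)
```

In a full attack, the rows of `T` and their labels would serve as training data for a classifier playing the role of $A_A$; any standard sequence or vector classifier could stand in for it in an experiment of this kind.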

Theorems & Definitions (6)

  • Theorem 3.1
  • Corollary 3.1
  • Lemma A.1
  • Lemma A.2
  • Theorem A.1
  • Corollary A.1