Mitigating Membership Inference in Intermediate Representations via Layer-wise MIA-risk-aware DP-SGD

Jiayang Meng; Tao Huang; Chen Hou; Guolong Zheng; Hong Chen

TL;DR

This paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which adaptively allocates privacy protection across layers in proportion to their MIA risk, and establishes theoretical guarantees on both its privacy and its convergence.

Abstract

In Embedding-as-an-Interface (EaaI) settings, pre-trained models are queried for Intermediate Representations (IRs). The distributional properties of IRs can leak training-set membership signals, enabling Membership Inference Attacks (MIAs) whose strength varies across layers. Although Differentially Private Stochastic Gradient Descent (DP-SGD) mitigates such leakage, existing implementations employ per-example gradient clipping and a uniform, layer-agnostic noise multiplier, ignoring heterogeneous layer-wise MIA vulnerability. This paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which adaptively allocates privacy protection across layers in proportion to their MIA risk. Specifically, LM-DP-SGD trains a shadow model on a public shadow dataset, extracts per-layer IRs from its train/test splits, and fits layer-specific MIA adversaries, using their attack error rates as MIA-risk estimates. Leveraging the cross-dataset transferability of MIAs, these estimates are then used to reweight each layer's contribution to the globally clipped gradient during private training, providing layer-appropriate protection under a fixed noise magnitude. We further establish theoretical guarantees on both privacy and convergence of LM-DP-SGD. Extensive experiments show that, under the same privacy budget, LM-DP-SGD reduces the peak IR-level MIA risk while preserving utility, yielding a superior privacy-utility trade-off.
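The step from per-layer attack error rates to layer weights can be illustrated with a minimal sketch. The abstract does not state the mapping from risk estimates to weights, so the proportional rule below is an assumption made only for illustration, as are the function name `risk_to_weights` and the example error rates. The one property taken from the paper is the normalization $\sum_l (w^{(l)})^2 = 1$ stated in Theorem 5.1.

```python
import math


def risk_to_weights(attack_error_rates):
    """Map per-layer MIA attack error rates to weights whose squares sum to 1.

    ASSUMPTION (not from the paper): a layer's weight is proportional to the
    attack error rate measured on it, so a layer that is easier to attack
    (lower error) contributes less signal relative to the fixed noise.
    """
    if not attack_error_rates:
        raise ValueError("need at least one layer")
    if any(e < 0.0 or e > 1.0 for e in attack_error_rates):
        raise ValueError("error rates must lie in [0, 1]")
    norm = math.sqrt(sum(e * e for e in attack_error_rates))
    if norm == 0.0:
        raise ValueError("all error rates are zero; weights are undefined")
    # Dividing by the L2 norm enforces sum_l w_l^2 = 1.
    return [e / norm for e in attack_error_rates]


# Hypothetical error rates for four layers (illustrative values, not results).
weights = risk_to_weights([0.48, 0.41, 0.35, 0.30])
```

Any other monotone mapping could be substituted for the proportional rule, provided the result is renormalized in the same way.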

Paper Structure

This paper contains 36 sections, 2 theorems, 32 equations, 7 figures, 8 tables, 1 algorithm.

Introduction
Related Work
Gradient Clipping Strategies in DP-SGD
DP-SGD Limitations for Layer-wise Privacy
Preliminaries
DP-SGD
Layer-wise Heterogeneity of MIA Risk
Proposed Method: LM-DP-SGD
Layer-wise MIA-Risk Estimation
Design of Layer-wise DP-SGD
Private Training via LM-DP-SGD
Theoretical Analysis
Privacy Guarantee
Convergence Properties
Experimental Setup
...and 21 more sections

Key Result

Theorem 5.1

Algorithm 1 (LM-DP-SGD), which applies a layer-wise reweighting vector $w_t=[w_t^{(1)},\dots,w_t^{(L)}]$ constrained by $\sum_{l=1}^L (w_t^{(l)})^2 = 1$ and adds noise $\mathcal{N}(0, C^2 \sigma^2 I)$ to the aggregated reweighted-and-clipped gradients $\sum_{\mathbf{x}_i \in \mathcal{B}_t} \hat{G}_{t,i}$, ensures $(\varepsilon, \delta)$-differential privacy after $T$ iterations.
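To see why the unit-norm constraint on the weights pairs naturally with noise of scale $C\sigma$, consider a sketch of one noisy gradient step. This is one plausible reading, not the paper's algorithm: it assumes layer $l$ of each per-example gradient is clipped to norm at most $w^{(l)} C$, so the whole per-example gradient has norm at most $\sqrt{\sum_l (w^{(l)})^2 C^2} = C$. The function names and the toy gradients are hypothetical.

```python
import math
import random


def reweight_and_clip(layer_grads, weights, clip_norm):
    """Clip layer l of one example's gradient to L2 norm <= weights[l] * clip_norm.

    ASSUMPTION: per-layer thresholds w_l * C are an illustrative choice; the
    paper's exact reweighting rule is not given in this summary.
    """
    clipped = []
    for grad, w in zip(layer_grads, weights):
        norm = math.sqrt(sum(x * x for x in grad))
        scale = min(1.0, w * clip_norm / norm) if norm > 0.0 else 1.0
        clipped.append([x * scale for x in grad])
    return clipped


def noisy_batch_gradient(batch_grads, weights, clip_norm, sigma, rng):
    """Sum clipped per-example gradients, add N(0, C^2 sigma^2 I), then average."""
    if abs(sum(w * w for w in weights) - 1.0) > 1e-9:
        raise ValueError("weights must satisfy sum_l w_l^2 = 1")
    clipped = [reweight_and_clip(g, weights, clip_norm) for g in batch_grads]
    update = []
    for l in range(len(weights)):
        layer = []
        for j in range(len(clipped[0][l])):
            total = sum(example[l][j] for example in clipped)
            total += rng.gauss(0.0, clip_norm * sigma)  # std dev is C * sigma
            layer.append(total / len(batch_grads))
        update.append(layer)
    return update


# Toy batch: two examples, two layers with 2 and 3 parameters.
rng = random.Random(0)
batch = [[[3.0, 4.0], [6.0, 8.0, 0.0]], [[0.1, 0.0], [0.0, 0.2, 0.0]]]
update = noisy_batch_gradient(batch, [0.6, 0.8], clip_norm=1.0, sigma=1.0, rng=rng)
```

Under this reading, every example contributes a vector of norm at most $C$ to the sum, which is the sensitivity that standard Gaussian-mechanism accounting for DP-SGD relies on.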

Figures (7)

Figure 1: An overview of LM-DP-SGD, which comprises two components: (i) Layer-wise MIA-risk estimation (Section 4.1), which trains layer-specific adversaries on a public shadow dataset to assess the MIA risk of each layer; and (ii) Private training via LM-DP-SGD (Section 4.2), a differentially private optimization procedure that leverages these estimated risks to perform layer-wise reweighted clipping of per-example gradients before noise injection, allocating protection proportionally to layer vulnerability.
Figure 2: MIA accuracy using intermediate representations from different convolutional layers.
Figure 3: Evolution of test accuracy during training. The optimal layer-wise reweighting coefficients for convergence, $\{w_t^{(l)*}\}_{l=1}^L$, are derived via Lagrange multipliers (see the appendix). For clarity, all curves are smoothed using a Savitzky-Golay filter. The shaded regions denote performance within $2\%$ below the test accuracy of the baseline that achieves the maximum final accuracy. The results show that our method preserves model utility relative to the baselines: this does not imply superior test accuracy, only comparable performance. Two observations support this conclusion. First, because the curves are smoothed, minor deviations should be read as smoothing artifacts rather than as meaningful performance differences. Second, once training converges, LM-DP-SGD's performance consistently lies within the shaded regions.
Figure 4: Evolution of the $\ell_2$-norm of the bias term $b_t$, $\|b_t\|_2$. For visual clarity, all curves are smoothed using a Savitzky-Golay filter. We observe that the magnitude of $\|b_t\|_2$ remains within the same order across all methods throughout training. Compared to the baselines, our method does not introduce excessive bias; in fact, it yields a relatively lower gradient bias over the majority of training.
Figure 5: Impact of privacy budget $\varepsilon$.
...and 2 more figures
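The shaded-region criterion in the Figure 3 caption amounts to a simple comparison, sketched below. The caption does not say whether "2%" is absolute or relative, so the sketch assumes an absolute gap of 0.02 on a 0-1 accuracy scale; the function name and the accuracy values are hypothetical and are not results from the paper.

```python
def within_utility_band(final_accuracy, baseline_final_accuracies, tolerance=0.02):
    """Return True if a method is at most `tolerance` below the best baseline.

    ASSUMPTION: `tolerance` is an absolute accuracy gap (0.02 = 2 points).
    A method that exceeds the best baseline also returns True.
    """
    if not baseline_final_accuracies:
        raise ValueError("need at least one baseline accuracy")
    best_baseline = max(baseline_final_accuracies)
    return final_accuracy >= best_baseline - tolerance


# Illustrative numbers only: three baselines and one candidate method.
baselines = [0.70, 0.72, 0.69]
comparable = within_utility_band(0.715, baselines)
too_low = within_utility_band(0.65, baselines)
```

Applying the check to final accuracies after convergence, rather than to smoothed curves, avoids treating smoothing artifacts as performance differences.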

Theorems & Definitions (4)

Theorem 5.1: Privacy guarantee of LM-DP-SGD
Theorem 5.2: Convergence of LM-DP-SGD
proof
proof
