Understanding Deep Gradient Leakage via Inversion Influence Functions

Haobo Zhang; Junyuan Hong; Yuyang Deng; Mehrdad Mahdavi; Jiayu Zhou

TL;DR

This work tackles the privacy risk posed by Deep Gradient Leakage (DGL) in distributed learning and introduces the Inversion Influence Function (I$^2$F) to analyze it in a model-agnostic way. I$^2$F provides a closed-form link between a recovered image and the private gradient via ${\partial G_r(g_0)}/{\partial g_0} = (JJ^\top)^{-1}J$, which enables efficient risk estimation with Jacobian-vector products and a tractable lower bound ${\mathcal I}_{lb}(\boldsymbol{\delta}; x_0) = {\|J\boldsymbol{\delta}\|}/{\lambda_{\max}(JJ^\top)}$. Empirical validation across vision, language, and large-scale settings shows that I$^2$F accurately tracks privacy leakage trends, reveals that perturbation directions matter (a perturbation $\boldsymbol{\delta}$ aligned with singular directions of $J$ that have small singular values yields a larger I$^2$F value, i.e., a larger estimated recovery error), and highlights sample- and initialization-dependent unfairness in protection. This toolkit enables fast privacy auditing and guides the design of more nuanced defenses, with potential applicability to large foundation models and extended attack-defense analyses.
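To make the two formulas above concrete, the following is a minimal sketch in plain Python that evaluates the exact quantity $\|(JJ^\top)^{-1}J\boldsymbol{\delta}\|$ and the bound $\|J\boldsymbol{\delta}\|/\lambda_{\max}(JJ^\top)$ on a tiny explicit matrix. The matrix `J`, the perturbation `delta`, and all helper names are hypothetical illustrations, not taken from the authors' code, and forming $JJ^\top$ explicitly is only feasible at toy scale.

```python
import math

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    y = [0.0] * n
    for r in reversed(range(n)):
        y[r] = (M[r][n] - sum(M[r][k] * y[k] for k in range(r + 1, n))) / M[r][r]
    return y

def lambda_max(S, iters=500):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration."""
    v = [1.0] * len(S)
    for _ in range(iters):
        w = matvec(S, v)
        n = norm(w)
        v = [a / n for a in w]
    return sum(a * b for a, b in zip(v, matvec(S, v)))

def i2f_exact(J, delta):
    """|| (J J^T)^{-1} J delta ||, computed with an explicit linear solve."""
    JJt = matmul(J, transpose(J))
    return norm(solve(JJt, matvec(J, delta)))

def i2f_lower_bound(J, delta):
    """|| J delta || / lambda_max(J J^T)."""
    JJt = matmul(J, transpose(J))
    return norm(matvec(J, delta)) / lambda_max(JJt)

# Hypothetical toy Jacobian: 2 input dimensions, 3 parameters.
J = [[2.0, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
delta = [0.1, -0.2, 0.05]  # hypothetical gradient perturbation

exact = i2f_exact(J, delta)
lower = i2f_lower_bound(J, delta)
print(exact, lower)
```

On this toy input the bound sits below the exact value, as expected from the inequality; how tight it is depends on the spread of the eigenvalues of $JJ^\top$.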

Abstract

Deep Gradient Leakage (DGL) is a highly effective attack that recovers private training images from gradient vectors. This attack poses significant privacy challenges for distributed learning with clients that hold sensitive data, because clients are required to share gradients. Defending against such attacks requires an understanding of when and how privacy leakage happens, which is currently lacking, mostly because of the black-box nature of deep networks. In this paper, we propose a novel Inversion Influence Function (I$^2$F) that establishes a closed-form connection between the recovered images and the private gradients by implicitly solving the DGL problem. Compared to directly solving DGL, I$^2$F is scalable for analyzing deep networks, requiring only oracle access to gradients and Jacobian-vector products. We empirically demonstrate that I$^2$F effectively approximates DGL across different model architectures, datasets, modalities, attack implementations, and perturbation-based defenses. With this novel tool, we provide insights into effective gradient perturbation directions, the unfairness of privacy protection, and privacy-preferred model initialization. Our code is available at https://github.com/illidanlab/inversion-influence-function.
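The phrase "only oracle access to gradients and Jacobian-vector products" can be illustrated with a matrix-free sketch of the bound $\|J\boldsymbol{\delta}\|/\lambda_{\max}(JJ^\top)$ that never forms $J$. The sketch below makes several assumptions: central finite differences stand in for the automatic-differentiation products a real framework would provide, `grad_fn` is a hypothetical oracle returning $\nabla_\theta L(x, \theta)$ for a toy one-sample least-squares loss, and every name and value is illustrative, not the authors' implementation.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def jt_vec(grad_fn, x, v, eps=1e-4):
    """J^T v: directional derivative of the gradient oracle along v (input space)."""
    xp = [a + eps * b for a, b in zip(x, v)]
    xm = [a - eps * b for a, b in zip(x, v)]
    return [(p - m) / (2 * eps) for p, m in zip(grad_fn(xp), grad_fn(xm))]

def j_vec(grad_fn, x, delta, eps=1e-4):
    """J delta: gradient w.r.t. x of the scalar grad_fn(x) . delta."""
    out = []
    for i in range(len(x)):
        xp = x[:]
        xp[i] += eps
        xm = x[:]
        xm[i] -= eps
        out.append((dot(grad_fn(xp), delta) - dot(grad_fn(xm), delta)) / (2 * eps))
    return out

def i2f_lower_bound(grad_fn, x0, delta, iters=200):
    """||J delta|| / lambda_max(J J^T), with lambda_max from power iteration."""
    v = [1.0 / math.sqrt(len(x0))] * len(x0)
    lam = 0.0
    for _ in range(iters):
        u = jt_vec(grad_fn, x0, v)   # u = J^T v
        lam = dot(u, u)              # Rayleigh quotient v^T (J J^T) v
        w = j_vec(grad_fn, x0, u)    # w = J J^T v
        nw = norm(w)
        v = [a / nw for a in w]
    return norm(j_vec(grad_fn, x0, delta)) / lam

# Hypothetical toy model: L(x, theta) = 0.5 * (theta . x - y)^2.
theta, y = [0.5, -1.0, 2.0], 1.0

def grad_fn(x):
    """Gradient of the toy loss with respect to theta, as a function of x."""
    r = dot(theta, x) - y
    return [r * xi for xi in x]

x0 = [1.0, 0.5, -0.25]          # hypothetical private input
delta = [0.01, -0.02, 0.03]     # hypothetical gradient perturbation
risk = i2f_lower_bound(grad_fn, x0, delta)
print(risk)
```

The cost is a fixed number of oracle calls per power-iteration step, which is the property that lets this kind of estimate scale where an explicit inverse of $JJ^\top$ would not; the coordinate-wise loop in `j_vec` is a toy substitute for a single reverse-mode product.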

Paper Structure (33 sections, 5 theorems, 30 equations, 19 figures, 1 table)

Introduction
Related Work
Inversion Influence Function
Perturbing the Private Gradient
Theoretic Validation
Empirical Validation and Extensions
When Does Privacy Leakage Happen?
Perturbations Directions Are Not Equivalent
Privacy Protection Could Be Unfair
Model Initialization Matters
Conclusion and Discussion
Acknowledgments
Method
Other Efficient Evaluation Techniques
Proofs
...and 18 more sections

Key Result

Theorem 3.1

If the assumptions ass:lip_jacobian (Lipschitz Jacobian) and ass:smooth (smoothness) hold, then the recovery error satisfies a bound expressed through $J=\nabla_x \nabla_\theta L(x_0, \theta)$.
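A short worked step, using only the two formulas quoted in the TL;DR, shows why ${\mathcal I}_{lb}$ sits below the first-order recovery error. This is a sketch under the assumption that $JJ^\top$ is invertible, with ${\mathcal I}(\boldsymbol{\delta}; x_0)$ written here for the norm of the first-order change in the recovered input; it is not the paper's proof of Theorem 3.1, which additionally relies on the two assumptions above.

$$
{\mathcal I}(\boldsymbol{\delta}; x_0)
= \left\| (JJ^\top)^{-1} J \boldsymbol{\delta} \right\|
\;\ge\; \lambda_{\min}\!\left((JJ^\top)^{-1}\right) \left\| J\boldsymbol{\delta} \right\|
= \frac{\| J\boldsymbol{\delta} \|}{\lambda_{\max}(JJ^\top)}
= {\mathcal I}_{lb}(\boldsymbol{\delta}; x_0).
$$

The inequality holds because $(JJ^\top)^{-1}$ is symmetric positive definite, so it shrinks any vector by at most its smallest eigenvalue, which equals $1/\lambda_{\max}(JJ^\top)$.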

Figures (19)

Figure 1: Value comparisons when attacking ResNet18 on MNIST with DGL, where the grey line indicates equal values and darker dots imply smaller Gaussian perturbation $\delta$. In (a), the y-axis is calculated as defined in the I$^2$F equation (eq:I2F), and ${\mathcal{I}}_{\text{lb}}$ is calculated as defined in the upper and lower bounds (eq:up_lw_bounds). The I$^2$F lower bound (${\mathcal{I}}_{\text{lb}}$) provides a good approximation to the exact value with matrix inversion and to the root mean square error (RMSE) of recovered images. In contrast, removing the denominator in ${\mathcal{I}}$ results in overestimated risks.
Figure 2: I$^2$F works under different settings: datasets, attacks, and models. The grey line indicates the equal values, and darker dots imply smaller Gaussian perturbation $\delta$.
Figure 3: Evaluation of I$^2$F on ResNet152 and ImageNet. (a): Darker color means larger noise variance. LPIPS is used to evaluate the semantic distance between the recovered and original images. $\mathcal{I}_{lb}$ is a good estimator of this semantic distance. (b): Original (top) and recovered (bottom) images with their corresponding I$^2$F and LPIPS. Images with a lower I$^2$F also have a smaller LPIPS, which implies a better reconstruction.
Figure 4: Evaluation of $\mathcal{I}_{lb}$ on BERT (a-b) and GPT-2 (c-d). A darker color means a larger noise variance. ROUGE-L and Google BLEU are used to evaluate the semantic similarity between the original text and the recovered text. $\mathcal{I}_{lb}$ is linearly correlated with the two semantic metrics, which means $\mathcal{I}_{lb}$ can be used to estimate the privacy risk of the private text.
Figure 5: The distribution of eigenvalues of $JJ^\top$ of two datasets on the LeNet model.
...and 14 more figures
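The eigenvalue spectrum of $JJ^\top$ shown in Figure 5 matters because the closed form $(JJ^\top)^{-1}J$ rescales each singular direction of $J$ by the inverse of its singular value. The sketch below illustrates this for the simplest case, a diagonal $J$, where the quantity reduces to dividing each perturbation component by the matching singular value. The singular values and perturbations are hypothetical, and the helper name is an illustration only.

```python
import math

def i2f_diagonal(singular_values, delta):
    """|| (J J^T)^{-1} J delta || for a diagonal J = diag(singular_values)."""
    return math.sqrt(sum((d / s) ** 2 for s, d in zip(singular_values, delta)))

sigmas = [4.0, 0.25]  # hypothetical singular values of J

# Unit-norm perturbations along the large and the small singular direction.
along_large = i2f_diagonal(sigmas, [1.0, 0.0])
along_small = i2f_diagonal(sigmas, [0.0, 1.0])
print(along_large, along_small)
```

Two perturbations of equal norm give estimated recovery errors that differ by a factor of 16 in this toy case, which is the sense in which perturbation directions are not equivalent.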

Theorems & Definitions (9)

Theorem 3.1
Lemma B.1
proof
Lemma B.2
proof
Lemma B.3
proof
Theorem B.1: Restated from Theorem 3.1
proof
