Understanding Deep Gradient Leakage via Inversion Influence Functions
Haobo Zhang, Junyuan Hong, Yuyang Deng, Mehrdad Mahdavi, Jiayu Zhou
TL;DR
This work addresses the privacy risk posed by Deep Gradient Leakage (DGL) in distributed learning and introduces the Inversion Influence Function (I$^2$F) to analyze it in a model-agnostic way. I$^2$F gives a closed-form link between the recovered image and the private gradient, $\partial G_r(g_0)/\partial g_0 = (JJ^\top)^{-1}J$, where $G_r$ maps a gradient to the recovered image, $g_0$ is the private gradient, and $J$ is the Jacobian of the gradient with respect to the input. This allows the risk to be estimated efficiently with Jacobian-vector products and yields a tractable lower bound ${\mathcal I}_{lb}(\boldsymbol{\delta}; x_0) = \|J\boldsymbol{\delta}\|/\lambda_{\max}(JJ^\top)$ for a gradient perturbation $\boldsymbol{\delta}$. Empirical validation across vision, language, and large-scale settings shows that I$^2$F accurately tracks privacy leakage trends, reveals that perturbation directions matter (a perturbation changes the recovered image more when it lies along Jacobian eigen-directions with small eigenvalues), and highlights sample- and initialization-dependent unfairness in protection. The tool enables fast privacy auditing and can guide the design of more targeted defenses, with potential applicability to large foundation models and to broader attack-defense analyses.
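One way to see where this closed form comes from is an implicit-differentiation sketch. It rests on assumptions not spelled out in this summary: that the attack recovers an input by gradient matching, $G_r(g) = \arg\min_x \tfrac{1}{2}\|\nabla_\theta L(x,\theta) - g\|^2$, that $J = \partial_x \nabla_\theta L(x_0,\theta)$ has shape $d_x \times d_\theta$, that $g_0 = \nabla_\theta L(x_0,\theta)$ so the matching residual vanishes at $x_0$, and that $JJ^\top$ is invertible.

$$
\begin{aligned}
0 &= J(x)\,\bigl(\nabla_\theta L(x,\theta) - g\bigr)
  && \text{(stationarity at } x = G_r(g)\text{)} \\
0 &= JJ^\top\,\mathrm{d}x - J\,\mathrm{d}g
  && \text{(differentiate at } (x_0, g_0)\text{, where the residual is zero)} \\
\frac{\partial G_r(g_0)}{\partial g_0} &= (JJ^\top)^{-1}J \\
G_r(g_0 + \boldsymbol{\delta}) - x_0 &\approx (JJ^\top)^{-1}J\,\boldsymbol{\delta}
  && \text{(first-order expansion in } \boldsymbol{\delta}\text{)} \\
\bigl\|(JJ^\top)^{-1}J\,\boldsymbol{\delta}\bigr\| &\ge \frac{\|J\boldsymbol{\delta}\|}{\lambda_{\max}(JJ^\top)} = {\mathcal I}_{lb}(\boldsymbol{\delta}; x_0)
\end{aligned}
$$

The last line uses only that $JJ^\top$ is symmetric positive definite, so its inverse shrinks any vector by at most a factor of $\lambda_{\max}(JJ^\top)$.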
Abstract
Deep Gradient Leakage (DGL) is a highly effective attack that recovers private training images from gradient vectors. This attack poses significant privacy challenges for distributed learning among clients with sensitive data, where clients are required to share gradients. Defending against such attacks requires an understanding of when and how privacy leakage happens, which is currently lacking, mostly because of the black-box nature of deep networks. In this paper, we propose a novel Inversion Influence Function (I$^2$F) that establishes a closed-form connection between the recovered images and the private gradients by implicitly solving the DGL problem. Compared to directly solving DGL, I$^2$F is scalable for analyzing deep networks, requiring only oracle access to gradients and Jacobian-vector products. We empirically demonstrate that I$^2$F effectively approximates DGL across different model architectures, datasets, modalities, attack implementations, and perturbation-based defenses. With this novel tool, we provide insights into effective gradient perturbation directions, the unfairness of privacy protection, and privacy-preferred model initialization. Our code is available at https://github.com/illidanlab/inversion-influence-function.
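To illustrate what "requiring only oracle access to Jacobian-vector products" can look like in practice, here is a minimal pure-Python sketch. It is not the authors' implementation (see the repository above for that). It assumes the influence of a perturbation $\boldsymbol{\delta}$ is measured as $\|(JJ^\top)^{-1}J\boldsymbol{\delta}\|$, and it uses a tiny hand-written matrix `J` as a stand-in for the Jacobian of a real network. The helper names (`J_dot`, `Jt_dot`, `influence`, `lower_bound`) are hypothetical. Conjugate gradient and power iteration are chosen here because both touch $JJ^\top$ only through matrix-vector products.

```python
import math

# Toy stand-in (assumption) for J = d(grad_theta L)/dx at x0:
# 2 input dimensions, 3 parameter dimensions. For a real network J is never
# materialized; the two products below would come from automatic differentiation.
J = [[2.0, 0.0, 1.0],
     [0.0, 0.5, 0.0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def J_dot(delta):
    """J @ delta: maps a gradient-space perturbation to input space."""
    return [dot(row, delta) for row in J]

def Jt_dot(u):
    """J^T @ u: maps an input-space vector to gradient space."""
    return [sum(J[i][k] * u[i] for i in range(len(J))) for k in range(len(J[0]))]

def JJt_dot(u):
    """(J J^T) @ u using only the two oracle products."""
    return J_dot(Jt_dot(u))

def conjugate_gradient(matvec, b, iters=50, tol=1e-12):
    """Solve A v = b for symmetric positive-definite A, given only v -> A v."""
    v = [0.0] * len(b)
    r = list(b)          # residual b - A v (v starts at zero)
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:     # squared residual norm small enough: converged
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        v = [vi + alpha * pi for vi, pi in zip(v, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return v

def power_iteration(matvec, dim, iters=200):
    """Estimate the largest eigenvalue of a symmetric PSD operator."""
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = matvec(u)
        n = norm(w)
        u = [wi / n for wi in w]
    return dot(u, matvec(u))   # Rayleigh quotient at the converged direction

lam_max = power_iteration(JJt_dot, len(J))

def influence(delta):
    """|| (J J^T)^{-1} J delta ||, computed without forming or inverting J J^T."""
    return norm(conjugate_gradient(JJt_dot, J_dot(delta)))

def lower_bound(delta):
    """|| J delta || / lambda_max(J J^T)."""
    return norm(J_dot(delta)) / lam_max

# Two unit-norm perturbations along different singular directions of the toy J.
delta_small = [0.0, 1.0, 0.0]                                  # small singular value
delta_large = [2.0 / math.sqrt(5.0), 0.0, 1.0 / math.sqrt(5.0)]  # large singular value
print(influence(delta_small), lower_bound(delta_small))
print(influence(delta_large), lower_bound(delta_large))
```

In this toy setting both perturbations have the same norm, yet the one aligned with the small singular direction moves the recovered input more, which matches the point that perturbation directions are not equivalent. For a real model, `J_dot` and `Jt_dot` would be replaced by automatic-differentiation calls, and a damping term may be needed if $JJ^\top$ is ill-conditioned. Both of these are assumptions beyond what the abstract states.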
