Understanding Data Reconstruction Leakage in Federated Learning from a Theoretical Perspective
Zifan Wang, Binghui Zhang, Meng Pang, Yuan Hong, Binghui Wang
TL;DR
This work introduces a theoretical framework that bounds data reconstruction leakage in federated learning by linking attack effectiveness to the Lipschitz constant of an attack's reconstruction function. It derives reconstruction-error bounds for FedAvg under both full and partial device participation, which makes it possible to compare optimization-based data reconstruction attacks (DRAs) by their intrinsic strength. The framework is instantiated and validated on MNIST, Fashion-MNIST, and CIFAR-10 using convex FL losses. The results show that attacks such as InvGrad and iDLG can have inherently tighter bounds than DLG, and that GGL achieves the smallest bounds by leveraging a learned data manifold. In practice, the results provide a tool for assessing attack strength and guiding privacy defenses; future work targets non-convex losses and provable defenses.
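Tying attack effectiveness to a Lipschitz constant rests on a standard inequality. Suppose R is an L-Lipschitz reconstruction function and the private data is exactly x = R(g*) for some ideal observation g*. If the attacker reconstructs from a perturbed observation g_hat, then ||R(g_hat) - x|| <= L * ||g_hat - g*||, so a smaller L means a tighter error bound. The following minimal sketch checks this numerically for linear maps, where L equals the spectral norm. The maps `R_attack_a` and `R_attack_b`, the dimension, and the noise level are hypothetical choices made only for illustration. This is not the paper's FedAvg bound, which the summary does not state.

```python
import numpy as np

def lipschitz_constant(R: np.ndarray) -> float:
    # For a linear map g -> R @ g, the smallest L2 Lipschitz constant is the
    # spectral norm (largest singular value) of R.
    return float(np.linalg.norm(R, ord=2))

def error_bound(R: np.ndarray, g_true: np.ndarray, g_obs: np.ndarray) -> tuple[float, float]:
    """Return (actual reconstruction error, Lipschitz upper bound).

    Assumes (hypothetically) that the private data is exactly x = R @ g_true,
    while the attacker only sees a perturbed observation g_obs.
    """
    x_true = R @ g_true
    x_hat = R @ g_obs
    actual = float(np.linalg.norm(x_hat - x_true))
    bound = lipschitz_constant(R) * float(np.linalg.norm(g_obs - g_true))
    return actual, bound

rng = np.random.default_rng(0)
dim = 16
g_true = rng.normal(size=dim)
g_obs = g_true + 0.05 * rng.normal(size=dim)  # noisy observation seen by the attacker

# Two hypothetical linear reconstruction maps with different sensitivity.
R_attack_a = rng.normal(size=(dim, dim))
R_attack_b = 0.25 * rng.normal(size=(dim, dim))

for name, R in [("attack_a", R_attack_a), ("attack_b", R_attack_b)]:
    actual, bound = error_bound(R, g_true, g_obs)
    print(f"{name}: L={lipschitz_constant(R):.2f} error={actual:.4f} bound={bound:.4f}")
```

For the same observation error, the map with the smaller L always gets the smaller bound. In this illustration, that is the sense in which a smaller Lipschitz constant signals a more effective attack.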
Abstract
Federated learning (FL) is an emerging collaborative learning paradigm that aims to protect data privacy. Unfortunately, recent works show that FL algorithms are vulnerable to serious data reconstruction attacks. However, existing works lack a theoretical foundation for how much of the devices' data can be reconstructed, and the effectiveness of these attacks cannot be compared fairly because their performance is unstable. To address this deficiency, we propose a theoretical framework for understanding data reconstruction attacks on FL. Our framework bounds the data reconstruction error, and an attack's error bound reflects its inherent effectiveness. Under this framework, we can theoretically compare the effectiveness of existing attacks. For instance, our results on multiple datasets validate that the iDLG attack inherently outperforms the DLG attack.
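DLG and iDLG are both optimization-based attacks. DLG jointly optimizes a dummy input and a dummy label so that their gradient matches the shared one. iDLG instead first extracts the true label analytically from the sign pattern of the gradient under cross-entropy loss, and then optimizes only the input. The following minimal sketch shows that label-extraction step. It assumes a hypothetical single-sample softmax-regression client, which has a convex loss in line with the paper's convex setting, with made-up weights `W`, `b` and a random stand-in input `x`.

Here the sign is read from the bias gradient, which equals p - y. iDLG reads it from the last-layer weight gradients, which share the same sign pattern when that layer's inputs are nonnegative. For this one-layer model, the input also follows in closed form from the weight and bias gradients. The sketch does not reproduce the paper's error bounds; it only shows the mechanism that separates iDLG from DLG.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def client_gradient(W: np.ndarray, b: np.ndarray, x: np.ndarray, label: int):
    """Cross-entropy gradients for softmax regression with logits = W @ x + b."""
    p = softmax(W @ x + b)
    y = np.zeros_like(p)
    y[label] = 1.0
    dz = p - y                   # dLoss/dlogits
    return np.outer(dz, x), dz   # (dLoss/dW, dLoss/db)

def infer_label(grad_b: np.ndarray) -> int:
    # iDLG-style observation: p - y is negative only at the true class
    # (p_c - 1 < 0) and positive at every other class (p_j > 0).
    return int(np.argmin(grad_b))

def recover_input(grad_W: np.ndarray, grad_b: np.ndarray) -> np.ndarray:
    # Row i of dLoss/dW equals dz_i * x, so dividing by dz_i = (dLoss/db)_i
    # recovers x; use the row with the largest |dz_i| to avoid dividing by ~0.
    i = int(np.argmax(np.abs(grad_b)))
    return grad_W[i] / grad_b[i]

rng = np.random.default_rng(1)
num_classes, dim = 10, 784
W = 0.01 * rng.normal(size=(num_classes, dim))  # hypothetical global model
b = np.zeros(num_classes)
x = rng.random(dim)  # stand-in for a flattened 28x28 image with values in [0, 1)
label = 3

grad_W, grad_b = client_gradient(W, b, x, label)
print(infer_label(grad_b))                            # → 3
print(np.allclose(recover_input(grad_W, grad_b), x))  # → True
```

Because the label is recovered exactly, an iDLG-style attack has fewer unknowns to optimize than DLG. This offers one intuition for why iDLG can be more reliable, while the paper makes its comparison through its error bounds.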
