Theoretical Analysis of Privacy Leakage in Trustworthy Federated Learning: A Perspective from Linear Algebra and Optimization Theory
Xiaojin Zhang, Wei Chen
TL;DR
This work analyzes privacy leakage in horizontal federated learning (FL) from two theoretical perspectives. The linear-algebra view shows that when the Jacobian of the batch data has rank smaller than the batch-data dimension ($\mathrm{rank}(\mathbf{J}) < Bp$), distinct batches can produce the same model update, so the update does not determine the batch uniquely; this leads to the sufficient condition $d < Bp$ for preventing unique data reconstruction. The optimization-theory view defines privacy leakage as $\epsilon_p^k = 1 - \mathbb{E}\left[\frac{1}{|\mathcal{D}^k|} \sum_i \frac{1}{T} \sum_t \frac{\|\tilde{x}_{t,i}^k - x_i^k\|}{D}\right]$ and, for distortion $\Delta^k$, derives the upper bound $\epsilon_p^k \le 1 + \sqrt{\frac{\ln 2 + \mathrm{poly}(B)}{2B}} - \frac{c_a}{2D} \Delta^k$ under bi-Lipschitz gradients and self-bounded regret. These results link privacy leakage to the batch size $B$ and the distortion $\Delta^k$, and can guide the design of privacy-preserving FL. Overall, the paper provides a theoretical foundation for understanding and mitigating privacy risks in FL during local training, and suggests directions for tighter bounds and extensions to other FL variants.
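To make the leakage definition concrete, the following is a minimal sketch of an empirical version of $\epsilon_p^k$. It rests on several assumptions that are not stated in the summary: the expectation is replaced by a single average over one set of reconstructions, $D$ is treated as a positive normalizing constant, and the function name and array layout are hypothetical.

```python
import numpy as np

def empirical_privacy_leakage(reconstructions, originals, D):
    """Empirical stand-in for epsilon_p^k (illustrative only).

    reconstructions: shape (T, N, p), the attacker's estimate of each of the
                     N samples at each of T rounds (assumed layout).
    originals:       shape (N, p), the true samples x_i.
    D:               positive normalizing constant (assumed).
    """
    reconstructions = np.asarray(reconstructions, dtype=float)
    originals = np.asarray(originals, dtype=float)
    # Euclidean distance between each reconstruction and its true sample: (T, N)
    dists = np.linalg.norm(reconstructions - originals[None, :, :], axis=-1)
    # Equal-weight mean over rounds t and samples i, normalized by D
    return 1.0 - dists.mean() / D

# Usage with random data: noisier reconstructions give lower leakage.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5))
close = x[None] + 0.01 * rng.standard_normal((3, 8, 5))
far = x[None] + 1.00 * rng.standard_normal((3, 8, 5))
leak_close = empirical_privacy_leakage(close, x, D=10.0)
leak_far = empirical_privacy_leakage(far, x, D=10.0)
```

Under this definition a perfect reconstruction gives leakage 1, and the value falls as the average reconstruction error grows relative to $D$.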
Abstract
Federated learning has emerged as a promising paradigm for collaborative model training while preserving data privacy. However, recent studies have shown that it is vulnerable to various privacy attacks, such as data reconstruction attacks. In this paper, we provide a theoretical analysis of privacy leakage in federated learning from two perspectives: linear algebra and optimization theory. From the linear algebra perspective, we prove that when the Jacobian matrix of the batch data is not full rank, there exist different batches of data that produce the same model update, thereby ensuring a level of privacy. We derive a sufficient condition on the batch size to prevent data reconstruction attacks. From the optimization theory perspective, we establish an upper bound on the privacy leakage in terms of the batch size, the distortion extent, and several other factors. Our analysis provides insights into the relationship between privacy leakage and various aspects of federated learning, offering a theoretical foundation for designing privacy-preserving federated learning algorithms.
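The rank argument can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the model update is an exactly linear function of the flattened batch, given by a random $d \times Bp$ matrix standing in for the Jacobian; the paper's setting is more general, and the function name and dimensions below are hypothetical.

```python
import numpy as np

def colliding_batches(B=4, p=3, d=5, seed=0):
    """Build two different batches with the same (linear) update.

    J is a random d x (B*p) matrix standing in for the Jacobian (assumption).
    With d < B*p its rank is at most d, so it has a non-trivial null space.
    """
    if d >= B * p:
        raise ValueError("this sketch needs d < B * p")
    rng = np.random.default_rng(seed)
    J = rng.standard_normal((d, B * p))
    batch_a = rng.standard_normal((B, p))
    # Full SVD: rows of vt with index >= rank(J) span the null space of J.
    _, _, vt = np.linalg.svd(J)
    null_dir = vt[-1]
    # Moving along a null-space direction changes the batch, not the update.
    batch_b = batch_a + null_dir.reshape(B, p)
    return J, batch_a, batch_b

J, batch_a, batch_b = colliding_batches()
update_a = J @ batch_a.reshape(-1)
update_b = J @ batch_b.reshape(-1)
same_update = np.allclose(update_a, update_b)
same_batch = np.allclose(batch_a, batch_b)
```

In this toy setting an observer who sees only the update cannot tell `batch_a` from `batch_b`, which is the non-uniqueness the rank condition describes.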
