Theoretical Analysis of Privacy Leakage in Trustworthy Federated Learning: A Perspective from Linear Algebra and Optimization Theory

Xiaojin Zhang; Wei Chen

TL;DR

This work analyzes privacy leakage in horizontal federated learning (FL) from two theoretical perspectives. The linear-algebra view shows that when the Jacobian $\mathbf{J}$ of the batch data has rank smaller than the batch-data dimension ($\mathrm{rank}(\mathbf{J}) < Bp$, where $B$ is the batch size and $p$ is the dimension of a single data point), different batches can produce the same model update. This gives a sufficient condition $d < Bp$, with $d$ the model parameter dimension, under which the batch data cannot be uniquely reconstructed. The optimization-theory view defines privacy leakage as $\epsilon_p^k = 1 - \mathbb{E}[\frac{1}{|\mathcal{D}^k|} \sum_i \frac{1}{T} \sum_t \frac{\|\tilde{x}_{t,i}^k - x_i^k\|}{D}]$ and, under bi-Lipschitz gradients and self-bounded regret, derives the upper bound $\epsilon_p^k \le 1 + \sqrt{\frac{\ln 2 + \mathrm{poly}(B)}{2B}} - \frac{c_a}{2D} \Delta^k$, where $\Delta^k$ is the distortion extent. These results link privacy leakage to batch size and distortion and can guide the design of privacy-preserving FL. Overall, the paper provides a theoretical foundation for understanding and mitigating privacy risks in FL during local training, and suggests directions for tighter bounds and extensions to other FL variants.
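The leakage quantity above can be estimated empirically from one run of a reconstruction attack. The following is a minimal sketch, not the paper's implementation: the function name `empirical_privacy_leakage`, the array layout, and the use of a single run in place of the expectation are all assumptions made for illustration.

```python
import numpy as np

def empirical_privacy_leakage(reconstructions, originals, diameter):
    """Single-run estimate of 1 - mean_i mean_t ||x~_{t,i} - x_i|| / D.

    reconstructions: array of shape (T, N, p), attacker's guess for each of
                     N samples at each of T rounds (hypothetical layout).
    originals:       array of shape (N, p), the private samples.
    diameter:        D > 0, assumed upper bound on the reconstruction distance.
    """
    reconstructions = np.asarray(reconstructions, dtype=float)
    originals = np.asarray(originals, dtype=float)
    # Euclidean distance for every (round, sample) pair; shape (T, N).
    dists = np.linalg.norm(reconstructions - originals[None, :, :], axis=-1)
    # Averaging over rounds and samples matches the nested means in the formula.
    return 1.0 - dists.mean() / diameter

# Two extreme cases with N = 2 samples, T = 2 rounds, p = 2 features.
x = np.zeros((2, 2))
perfect = np.zeros((2, 2, 2))           # attacker recovers every sample exactly
far = np.tile([3.0, 4.0], (2, 2, 1))    # every guess is at distance 5 = D
leak_perfect = empirical_privacy_leakage(perfect, x, diameter=5.0)
leak_far = empirical_privacy_leakage(far, x, diameter=5.0)
```

Under these assumptions, exact reconstruction gives the maximum leakage and guesses at distance $D$ give zero leakage, which matches the reading that larger reconstruction error means less leakage.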

Abstract

Federated learning has emerged as a promising paradigm for collaborative model training while preserving data privacy. However, recent studies have shown that it is vulnerable to various privacy attacks, such as data reconstruction attacks. In this paper, we provide a theoretical analysis of privacy leakage in federated learning from two perspectives: linear algebra and optimization theory. From the linear algebra perspective, we prove that when the Jacobian matrix of the batch data is not full rank, there exist different batches of data that produce the same model update, thereby ensuring a level of privacy. We derive a sufficient condition on the batch size to prevent data reconstruction attacks. From the optimization theory perspective, we establish an upper bound on the privacy leakage in terms of the batch size, the distortion extent, and several other factors. Our analysis provides insights into the relationship between privacy leakage and various aspects of federated learning, offering a theoretical foundation for designing privacy-preserving federated learning algorithms.
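The dimension-counting step behind the sufficient condition can be written out briefly. This sketch assumes $\mathbf{J}$ is the $d \times Bp$ Jacobian of the model update with respect to the flattened batch inputs, where $d$ is the model parameter dimension, $B$ the batch size, and $p$ the dimension of one data point; the paper's exact construction may differ.

$$
\mathbf{J} \in \mathbb{R}^{d \times Bp}
\quad\Longrightarrow\quad
\mathrm{rank}(\mathbf{J}) \le \min(d, Bp),
$$

$$
d < Bp
\quad\Longrightarrow\quad
\dim \ker \mathbf{J} = Bp - \mathrm{rank}(\mathbf{J}) \ge Bp - d > 0 .
$$

A nonzero direction in $\ker \mathbf{J}$ is a change to the batch that leaves the update unchanged to first order, so the update alone does not pin down the batch.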

Paper Structure (11 sections, 4 theorems, 28 equations, 1 table, 2 algorithms)

Introduction
Related Work
    Federated Learning
    Privacy Attacks in Federated Learning
    Defense Mechanisms
Preliminaries
    Federated Learning
    Privacy Attacks in Federated Learning
Theoretical Analysis from the Perspective of Linear Algebra
Theoretical Analysis from the Perspective of Optimization Theory
Conclusion

Key Result

Theorem 4.1

Let $d$ be the model parameter dimension and $p$ be the dimension of a single data point. Consider a batch of data $\{ \mathbf{x}_b^k, y_b^k \}_{b=1}^B$ with $\mathbf{x}_b^k \in \mathbb{R}^p$ and $y_b^k \in \mathbb{R}$. Let $\Delta \theta_{t}^k(\{ \mathbf{x}_b^k, y_b^k \}_{b=1}^B)$ represent the model update [...] This means that the private batch data $\{ \mathbf{x}_b^k, y_b^k \}_{b=1}^B$ cannot be uniquely determined [...]
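The non-uniqueness claim can be illustrated numerically. The sketch below is a toy setting chosen for illustration, not the paper's: a linear regression model with squared loss (so $d = p$), with the Jacobian taken with respect to the flattened batch inputs while labels are held fixed. The helper names `batch_gradient` and `gradient_jacobian` are hypothetical. Because the gradient is nonlinear in the inputs, the invariance shown holds only to first order in the perturbation size.

```python
import numpy as np

def batch_gradient(X, y, theta):
    """Mean squared-loss gradient (up to a constant factor) for a linear model."""
    residuals = X @ theta - y                     # shape (B,)
    return (residuals[:, None] * X).mean(axis=0)  # shape (p,)

def gradient_jacobian(X, y, theta):
    """Jacobian of batch_gradient w.r.t. the flattened inputs; shape (d, B*p)."""
    B, p = X.shape
    residuals = X @ theta - y
    # d(r_b * x_b)/d(x_b) = x_b theta^T + r_b * I, scaled by 1/B for the mean.
    blocks = [(np.outer(X[b], theta) + residuals[b] * np.eye(p)) / B
              for b in range(B)]
    return np.hstack(blocks)

rng = np.random.default_rng(0)
B, p = 4, 3
d = p                                   # linear model: one weight per feature
X = rng.normal(size=(B, p))
y = rng.normal(size=B)
theta = rng.normal(size=d)

J = gradient_jacobian(X, y, theta)
rank = np.linalg.matrix_rank(J)         # at most d, which is below B*p = 12

# The last right singular vector lies in the null space because rank <= d < B*p.
_, _, Vt = np.linalg.svd(J)
null_dir = Vt[-1]

eps = 1e-3
X_alt = X + eps * null_dir.reshape(B, p)   # a genuinely different batch
gap = np.linalg.norm(batch_gradient(X_alt, y, theta) - batch_gradient(X, y, theta))
```

Here the batch moves by `eps` in input space while `gap`, the change in the gradient, is of order `eps**2`, so an observer of the update cannot distinguish `X` from `X_alt` at first order.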

Theorems & Definitions (9)

Theorem 4.1
proof
Theorem 4.2
proof
Definition 5.1: Privacy Leakage
Definition 5.2: Distortion Extent
Lemma 5.1: Chernoff-Hoeffding Inequality
Theorem 5.2: Upper Bound for Privacy Leakage
proof
