Provable Privacy Advantages of Decentralized Federated Learning via Distributed Optimization
Wenrui Yu, Qiongxiu Li, Milan Lopuhaä-Zwakenberg, Mads Græsbøll Christensen, Richard Heusdens
TL;DR
This work analyzes privacy in centralized versus decentralized federated learning through both information-theoretic bounds and empirical privacy attacks. It introduces mutual-information-based bounds showing that privacy leakage in decentralized FL (DFL) under distributed optimization is never larger than in centralized FL (CFL), and shows how noise in auxiliary variables can further reduce leakage. The authors validate their theory with logistic regression and deep neural networks, demonstrating that DFL often offers lower privacy risk, especially for complex models and larger honest components, though simpler models may yield comparable leakage. Across experiments with gradient inversion and membership inference attacks, CFL generally leaks more private information than DFL, and the privacy gap narrows as more nodes become corrupt. The results emphasize the practical privacy benefits of distributed-optimization-based DFL, suggesting targeted deployments where decentralization improves privacy against iterative attacks while maintaining convergence performance.
Abstract
Federated learning (FL) emerged as a paradigm designed to improve data privacy by enabling data to reside at its source, thus embedding privacy as a core consideration in FL architectures, whether centralized or decentralized. Contrasting with recent findings by Pasquini et al., which suggest that decentralized FL does not empirically offer any additional privacy or security benefits over centralized models, our study provides compelling evidence to the contrary. We demonstrate that decentralized FL, when deploying distributed optimization, provides enhanced privacy protection, both theoretically and empirically, compared to centralized approaches. The challenge of quantifying privacy loss through iterative processes has traditionally constrained the theoretical exploration of FL protocols. We overcome this by conducting a pioneering in-depth information-theoretical privacy analysis for both frameworks. Our analysis, considering both eavesdropping and passive adversary models, successfully establishes bounds on privacy leakage. We show information-theoretically that the privacy loss in decentralized FL is upper bounded by the loss in centralized FL. Compared to the centralized case, where local gradients of individual participants are directly revealed, a key distinction of optimization-based decentralized FL is that the relevant information includes differences of local gradients over successive iterations and the aggregated sum of different nodes' gradients over the network. This information complicates the adversary's attempt to infer private data. To bridge our theoretical insights with practical applications, we present detailed case studies involving logistic regression and deep neural networks. These examples demonstrate that while privacy leakage remains comparable in simpler models, complex models like deep neural networks exhibit lower privacy risks under decentralized FL.
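The contrast drawn in the abstract between directly revealed local gradients and gradient differences plus a network-wide sum can be illustrated with a toy example. The sketch below is not the paper's protocol or proof. It assumes, purely for illustration, that a centralized adversary sees every local gradient and that a decentralized adversary sees only per-node differences between successive iterations and the per-iteration sum over all nodes. The function names and the integer toy gradients are hypothetical.

```python
def centralized_view(grads):
    """Assumed centralized observation: every local gradient.

    grads[t][i] is node i's gradient (a list of numbers) at iteration t.
    """
    return grads


def decentralized_view(grads):
    """Assumed decentralized observation (illustrative only):
    per-node gradient differences between successive iterations,
    plus the sum of all nodes' gradients at each iteration.
    """
    diffs = [
        [
            [b - a for a, b in zip(g_prev, g_next)]
            for g_prev, g_next in zip(grads[t], grads[t + 1])
        ]
        for t in range(len(grads) - 1)
    ]
    sums = [[sum(coords) for coords in zip(*grads[t])] for t in range(len(grads))]
    return diffs, sums


# Toy history: 2 iterations, 3 nodes, 2-dimensional integer gradients.
grads_a = [
    [[1, 2], [3, 4], [5, 6]],
    [[2, 2], [3, 5], [4, 6]],
]

# A second history: shift node 0 by +c and node 1 by -c at every iteration.
c = [10, -7]
grads_b = [
    [
        [x + d for x, d in zip(step[0], c)],
        [x - d for x, d in zip(step[1], c)],
        step[2],
    ]
    for step in grads_a
]

# The two histories differ, so the assumed centralized view tells them apart,
# yet they produce the identical assumed decentralized view.
same_decentralized = decentralized_view(grads_a) == decentralized_view(grads_b)
same_centralized = centralized_view(grads_a) == centralized_view(grads_b)
print(same_decentralized, same_centralized)
```

Under these assumptions the decentralized view is a deterministic function of the centralized one, and it is not invertible. This matches the intuition behind the stated bound: an adversary holding only differences and sums cannot learn more than one holding the raw gradients.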
