Privacy Attacks in Decentralized Learning
Abdellah El Mrini, Edwige Cyffers, Aurélien Bellet
TL;DR
This paper shows that decentralized learning via gossip protocols does not by itself guarantee privacy: honest-but-curious attackers can reconstruct the private data of non-neighboring nodes by exploiting the linear relationships in exchanged messages. The authors develop a reconstruction framework that builds a knowledge matrix $K_T$ from observed communications and solves the linear system $K_T X = Y_T$ to recover private inputs. An extension to Decentralized Gradient Descent (D-GD) first reconstructs gradients and then recovers data from them, using gradient inversion as a black box. The attacks are validated on synthetic and real graphs, showing substantial leakage even from a single attacker and stronger leakage with multiple attackers; graph topology, attacker position, and learning rate strongly influence success. The work argues that decentralization alone is insufficient for privacy and emphasizes the need for defenses such as differential privacy, secure aggregation, or graph-design strategies. It also lays a foundation for auditing the privacy risk of a given gossip matrix and network structure, guiding safer deployment of decentralized learning systems.
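As a rough illustration of the linear-system view, the following minimal Python sketch simulates plain gossip averaging on a small path graph and lets one node stack what it observes into a matrix playing the role of $K_T$. The graph, the 1/3 edge weights, the helper names (`path_gossip_matrix`, `reconstruct`), and the row-space test for recoverability are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def path_gossip_matrix(n):
    """Symmetric, doubly stochastic gossip matrix for a path graph.

    Assumption for this sketch: each edge has weight 1/3 and the
    remaining mass is placed on the diagonal.
    """
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0 / 3.0
    W += np.diag(1.0 - W.sum(axis=1))
    return W


def reconstruct(W, x0, attacker, T):
    """Toy reconstruction attack on gossip averaging x^{t+1} = W x^t.

    The attacker knows W, its own initial value, and the value each
    neighbor j holds at rounds t = 0..T-1, i.e. (W^t x0)_j.
    Each observation is one linear equation in the unknown x0.
    """
    n = W.shape[0]
    neighbors = [j for j in range(n) if j != attacker and W[attacker, j] > 0]

    rows = [np.eye(n)[attacker]]  # the attacker's own value
    obs = [x0[attacker]]

    Wt = np.eye(n)  # holds W^t
    x = x0.copy()   # holds x^t (simulated ground truth)
    for _ in range(T):
        for j in neighbors:
            rows.append(Wt[j].copy())  # row j of W^t
            obs.append(x[j])           # observed value of neighbor j
        Wt = W @ Wt
        x = W @ x

    K = np.vstack(rows)  # knowledge matrix (role of K_T)
    Y = np.array(obs)    # observations (role of Y_T)

    # Minimum-norm least-squares solution of K X = Y.
    x_hat, *_ = np.linalg.lstsq(K, Y, rcond=None)

    # Node i is exactly recoverable iff e_i lies in the row space of K,
    # i.e. the orthogonal projector K^+ K has a 1 at diagonal entry i.
    P = np.linalg.pinv(K) @ K
    recoverable = np.isclose(np.diag(P), 1.0, atol=1e-8)
    return x_hat, recoverable


if __name__ == "__main__":
    W = path_gossip_matrix(5)
    x0 = np.array([1.0, -2.0, 3.5, 0.25, 7.0])
    x_hat, recoverable = reconstruct(W, x0, attacker=0, T=4)
    print(recoverable, np.round(x_hat, 6))
```

In this toy setting, with the attacker at one end of a 5-node path, four rounds of observations make the stacked matrix full rank, so the values of non-neighboring nodes are recovered as well; with only two rounds, just the three closest nodes are recoverable.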
Abstract
Decentralized Gradient Descent (D-GD) allows a set of users to perform collaborative learning without sharing their data by iteratively averaging local model updates with their neighbors in a network graph. The absence of direct communication between non-neighbor nodes might lead to the belief that users cannot infer precise information about the data of others. In this work, we demonstrate the opposite, by proposing the first attack against D-GD that enables a user (or set of users) to reconstruct the private data of other users outside their immediate neighborhood. Our approach is based on a reconstruction attack against the gossip averaging protocol, which we then extend to handle the additional challenges raised by D-GD. We validate the effectiveness of our attack on real graphs and datasets, showing that the number of users compromised by a single or a handful of attackers is often surprisingly large. We empirically investigate some of the factors that affect the performance of the attack, namely the graph topology, the number of attackers, and their position in the graph.
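To make the averaging step concrete, here is a minimal Python sketch of one common form of D-GD, in which each node takes a local gradient step and then averages with its neighbors through a gossip matrix. The ring topology, the quadratic local losses $\frac{1}{2}(\theta_i - x_i)^2$, the step size, and the function names are assumptions chosen for illustration; the exact update order used in the paper may differ.

```python
import numpy as np


def ring_gossip_matrix(n):
    """Symmetric, doubly stochastic gossip matrix for a ring (n >= 3).

    Assumption for this sketch: weight 1/3 on self and on each neighbor.
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W


def dgd(W, x, lr=0.1, steps=300):
    """Run D-GD with local losses f_i(theta) = 0.5 * (theta - x_i)^2.

    Each node holds a private scalar x_i and a local model theta_i.
    Per round: local gradient step, then averaging with neighbors.
    """
    theta = np.zeros_like(x)
    for _ in range(steps):
        grads = theta - x                 # gradient of each local loss
        theta = W @ (theta - lr * grads)  # local step, then gossip averaging
    return theta


if __name__ == "__main__":
    x = np.array([1.0, 4.0, -2.0, 0.5, 3.0, 6.0])  # private data (toy)
    theta = dgd(ring_gossip_matrix(len(x)), x)
    print(theta, theta.mean(), x.mean())
```

Nodes only ever exchange model values with their ring neighbors, yet every exchanged value is a function of the private $x_i$ of all nodes, which is the dependence the attack exploits.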
