Exposing the Vulnerability of Decentralized Learning to Membership Inference Attacks Through the Lens of Graph Mixing

Ousmane Touat, Jezekael Brunon, Yacine Belal, Julien Nicolas, César Sabater, Mohamed Maouche, Sonia Ben Mokhtar

TL;DR

This work analyzes how Membership Inference Attacks (MIA) threaten decentralized, gossip-based learning and identifies two mixing-related factors, the local model mixing strategy and the global mixing properties of the communication graph, as core determinants of MIA vulnerability. It introduces SAMO, a Send-All-Merge-Once protocol, and shows that combining SAMO with dynamic, random peer sampling markedly improves the privacy-utility trade-off across multiple datasets, particularly when paired with differential privacy. The study provides both empirical and theoretical insights: dynamic topologies accelerate graph mixing (lowering the second eigenvalue of the mixing matrix) and reduce leakage, while non-i.i.d. data and early overfitting still pose significant privacy risks. The findings offer practical design guidance for privacy-aware decentralized systems: use dynamic topologies, strengthen mixing, and handle data heterogeneity carefully to achieve safer collaborative learning in distributed environments.
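To make the spectral argument concrete, the sketch below compares a static and a dynamic 2-regular topology with 150 nodes, loosely modeled on the configuration in Figure 3. It assumes uniform averaging weights W = (I + A)/3 and builds each 2-regular graph as a single random cycle; the paper's actual weighting and graph generator may differ, and all helper names are hypothetical. The static case reuses one mixing matrix every round, so its distance to exact averaging shrinks only as the second eigenvalue modulus (SLEM) raised to the number of rounds, while the dynamic case resamples peers every round.

```python
import numpy as np


def random_cycle_mixing_matrix(n: int, rng: np.random.Generator) -> np.ndarray:
    """Assumed mixing matrix W = (I + A) / 3 for a random 2-regular graph (one random cycle).

    Each node averages its own model with its two neighbours using equal weights,
    so W is symmetric and doubly stochastic.
    """
    order = rng.permutation(n)
    adj = np.zeros((n, n))
    for pos in range(n):
        u, v = order[pos], order[(pos + 1) % n]
        adj[u, v] = adj[v, u] = 1.0
    return (np.eye(n) + adj) / 3.0


def second_eigenvalue_modulus(w: np.ndarray) -> float:
    """Second-largest eigenvalue modulus (SLEM) of a symmetric mixing matrix."""
    moduli = np.sort(np.abs(np.linalg.eigvalsh(w)))[::-1]
    return float(moduli[1])


def consensus_distance(product: np.ndarray) -> float:
    """Spectral norm of (product - J/n): how far T rounds of mixing are from exact averaging."""
    n = product.shape[0]
    return float(np.linalg.norm(product - np.full((n, n), 1.0 / n), 2))


def compare_static_dynamic(n: int = 150, rounds: int = 20, seed: int = 0):
    rng = np.random.default_rng(seed)
    # Static topology: the same mixing matrix is applied in every round.
    w_static = random_cycle_mixing_matrix(n, rng)
    static_product = np.linalg.matrix_power(w_static, rounds)
    # Dynamic topology: a fresh random 2-regular graph is sampled every round.
    dynamic_product = np.eye(n)
    for _ in range(rounds):
        dynamic_product = random_cycle_mixing_matrix(n, rng) @ dynamic_product
    return (second_eigenvalue_modulus(w_static),
            consensus_distance(static_product),
            consensus_distance(dynamic_product))


if __name__ == "__main__":
    slem, static_gap, dynamic_gap = compare_static_dynamic()
    print(f"SLEM of static W: {slem:.5f}")
    print(f"distance to exact averaging after 20 rounds: static={static_gap:.4f}, dynamic={dynamic_gap:.2e}")
```

For a 150-node cycle the SLEM is (1 + 2cos(2π/150))/3, about 0.9994, so the static distance after 20 rounds is still about 0.988, whereas resampling peers each round drives the dynamic distance far lower.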

Abstract

The primary promise of decentralized learning is to allow users to train machine learning models collaboratively while keeping their data on their premises and without relying on any central entity. However, this paradigm requires peers to exchange model parameters or gradients. These exchanges can be exploited by privacy attacks (e.g., Membership Inference Attacks, MIA) to infer sensitive information about the training data. To devise effective defense mechanisms, it is important to understand the factors that increase or reduce the vulnerability of a given decentralized learning architecture to MIA. In this study, we extensively explore the vulnerability to MIA of various decentralized learning architectures by varying the graph structure (e.g., number of neighbors), the graph dynamics, and the aggregation strategy, across diverse datasets and data distributions. Our key finding, which to the best of our knowledge we are the first to report, is that the vulnerability to MIA is strongly correlated with (i) the local model mixing strategy performed by each node upon reception of models from neighboring nodes and (ii) the global mixing properties of the communication graph. We illustrate these results experimentally using four datasets and by theoretically analyzing the mixing properties of various decentralized architectures. We also show empirically that enhancing mixing properties is highly beneficial when combined with other privacy-preserving techniques such as Differential Privacy. Our paper draws a set of lessons learned for devising decentralized learning systems that reduce the vulnerability to MIA by design.
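The first factor, the local mixing strategy, can be illustrated by tracking how much weight each model receives in a node's aggregate. The sketch below assumes that base gossip merges each incoming model into the local one by pairwise averaging as it arrives, and that SAMO averages the local model with all received models in a single step; both functions are hypothetical simplifications of the two protocols contrasted in Figure 1, not the paper's implementation. Feeding one-hot vectors in place of real parameters makes the output equal to the mixing weights.

```python
from typing import Sequence

import numpy as np


def merge_on_receipt(local: np.ndarray, received: Sequence[np.ndarray]) -> np.ndarray:
    """Assumed base-gossip rule: average the current model with each incoming model as it arrives."""
    model = np.array(local, dtype=float)  # copy so the caller's array is untouched
    for incoming in received:
        model = 0.5 * (model + incoming)
    return model


def merge_once(local: np.ndarray, received: Sequence[np.ndarray]) -> np.ndarray:
    """Assumed SAMO rule: collect every neighbour's model, then average them with the local one once."""
    return np.mean(np.stack([local, *received]), axis=0)


if __name__ == "__main__":
    # One-hot "models": entry i of the result is the weight given to model i
    # (index 0 = the node's own model, indices 1..4 = neighbours in arrival order).
    basis = np.eye(5)
    print("merge on receipt:", merge_on_receipt(basis[0], list(basis[1:])))
    print("merge once:      ", merge_once(basis[0], list(basis[1:])))
```

With four neighbours, merge-on-receipt gives the last sender a weight of 0.5 and the node's own model only 0.0625, while merge-once assigns 0.2 to every model, so no single contribution dominates the aggregate.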

Paper Structure

This paper contains 37 sections, 13 equations, 10 figures, 2 tables, 2 algorithms.

Figures (10)

  • Figure 1: Model Update and Aggregation in Gossip Learning (left) vs in SAMO (right)
  • Figure 2: Trade-off between MIA Accuracy and Global Test Accuracy and between MIA TPR@1%FPR and Global Test Accuracy across different datasets, comparing Base Gossip and SAMO, on a 5-regular graph with 150 nodes (60 nodes on CIFAR100).
  • Figure 3: Trade-off between MIA Accuracy and Global Test Accuracy and between MIA TPR@1%FPR and Global Test Accuracy across different datasets, comparing static and dynamic topology setups, on a 2-regular graph with 150 nodes (60 nodes on CIFAR100).
  • Figure 4: Maximum MIA TPR@1%FPR values on the canary set over communication rounds across different datasets, comparing static and dynamic topology setups, on a 2-regular graph with 150 nodes (60 nodes on CIFAR100).
  • Figure 5: Comparison of maximum average MIA accuracy and TPR@1%FPR, with the corresponding global test accuracy, across different network view sizes and topology setups on the CIFAR-10 dataset using SAMO, on a regular graph with 150 nodes.
  • ...and 5 more figures
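Several figures report MIA TPR@1%FPR, the fraction of training members an attacker correctly flags while misflagging at most 1% of non-members. A minimal sketch of that metric for a threshold attack follows; it assumes scores are oriented so that higher means "more likely a member" (for a loss-based attack, one would pass negated losses), and the function name is hypothetical rather than the paper's evaluation code.

```python
import numpy as np


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr: float = 0.01) -> float:
    """Best TPR of a threshold attack whose FPR on non-members does not exceed target_fpr.

    A sample is predicted to be a member when its score is strictly above the threshold.
    """
    members = np.asarray(member_scores, dtype=float)
    nonmembers = np.asarray(nonmember_scores, dtype=float)
    # Candidate thresholds: every observed score, plus +inf (predict no one is a member).
    thresholds = np.append(np.unique(np.concatenate([members, nonmembers])), np.inf)
    best_tpr = 0.0
    for t in thresholds:
        fpr = np.mean(nonmembers > t)
        if fpr <= target_fpr:
            best_tpr = max(best_tpr, float(np.mean(members > t)))
    return best_tpr


if __name__ == "__main__":
    nonmembers = np.arange(100) / 100  # 100 non-member scores in [0, 0.99]
    members = [0.995, 0.5, 0.2, 0.999]
    print("TPR@1%FPR:", tpr_at_fpr(members, nonmembers))
```

Scanning every observed score as a threshold keeps the sketch exact for small evaluation sets; for large ones, an ROC-curve routine computes the same quantity more efficiently.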