Community Concealment from Unsupervised Graph Learning-Based Clustering

Dalyapraz Manatova, Pablo Moriano, L. Jean Camp

TL;DR

This work studies group-level privacy risks arising from GNN-based unsupervised community detection and proposes a defense against targeted community inference. It identifies two key factors that govern hidability: the inter/intra-edge ratio of the target community and the feature-space proximity to neighboring communities, and uses these insights to motivate a feature-guided defense. The authors introduce FCom-DICE, which extends the structural perturbations of DICE by adding feature-aware edge attachments and node-feature adjustments to disrupt GNN message passing. Across synthetic featurized LFR graphs and real networks, FCom-DICE yields median improvements of roughly 20-45% over DICE under identical budgets while preserving the overall community structure, underscoring the importance of jointly considering topology and attributes in privacy-aware graph learning.
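To make the first hidability factor concrete, the following minimal sketch counts boundary and internal edges for a target community and reports their ratio. The exact definition used in the paper is not given in this summary, so the plain count ratio, the function name `inter_intra_ratio`, and the toy edge list are all illustrative assumptions.

```python
def inter_intra_ratio(edges, community):
    """Ratio of boundary (inter) edges to internal (intra) edges.

    Assumption: the ratio is taken over raw edge counts of an
    undirected, unweighted graph; the paper may normalize differently.
    """
    members = set(community)
    intra = inter = 0
    for u, v in edges:
        in_u, in_v = u in members, v in members
        if in_u and in_v:
            intra += 1          # both endpoints inside the community
        elif in_u or in_v:
            inter += 1          # exactly one endpoint inside
    if intra == 0:
        # No internal edges: the ratio is undefined, so report infinity
        # when boundary edges exist and 0.0 for an isolated node set.
        return float("inf") if inter else 0.0
    return inter / intra


# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
ratio = inter_intra_ratio(edges, {0, 1, 2})
print(ratio)
```

A higher ratio means the community is less sharply separated from its surroundings, which is the structural side of the concealment argument; the feature-proximity factor is not covered by this sketch.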

Abstract

Graph neural networks (GNNs) are designed to use attributed graphs to learn representations. Such representations are beneficial in the unsupervised learning of clusters and community detection. Nonetheless, such inference may reveal sensitive groups, clustered systems, or collective behaviors, raising concerns regarding group-level privacy. Community attribution in social and critical infrastructure networks, for example, can expose coordinated asset groups, operational hierarchies, and system dependencies that could be used for profiling or intelligence gathering. We study a defensive setting in which a data publisher (defender) seeks to conceal a community of interest while making limited, utility-aware changes in the network. Our analysis indicates that community concealment is strongly influenced by two quantifiable factors: connectivity at the community boundary and feature similarity between the protected community and adjacent communities. Informed by these findings, we present a perturbation strategy that rewires a set of selected edges and modifies node features to reduce the distinctiveness leveraged by GNN message passing. The proposed method outperforms DICE in our experiments on synthetic benchmarks and real network graphs under identical perturbation budgets. Overall, it achieves median relative concealment improvements of approximately 20-45% across the evaluated settings. These findings demonstrate a mitigation strategy against GNN-based community learning and highlight group-level privacy risks intrinsic to graph learning.
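As a point of reference for the baseline, here is a minimal sketch of a DICE-style (Disconnect Internally, Connect Externally) structural perturbation under a fixed budget. It is not the proposed method: the feature-aware attachments and node-feature changes are omitted, and the function name `dice_perturb`, the uniform random edge selection, the `p` split between deletions and additions, and the toy graph are assumptions made for illustration.

```python
import random


def dice_perturb(edges, community, budget, p=0.5, seed=0):
    """Delete internal edges and add boundary edges for `community`.

    Assumptions: undirected simple graph without self-loops, sortable
    node ids, a fraction `p` of the budget spent on deletions, and
    uniform random choice among candidate edges.
    """
    rng = random.Random(seed)
    members = set(community)
    nodes = {n for e in edges for n in e}
    outsiders = sorted(nodes - members)
    edge_set = {frozenset(e) for e in edges}

    # Disconnect internally: remove edges with both endpoints inside.
    internal = sorted(tuple(sorted(e)) for e in edge_set if e <= members)
    n_delete_target = round(p * budget)
    n_delete = min(n_delete_target, len(internal))
    for e in rng.sample(internal, n_delete):
        edge_set.remove(frozenset(e))

    # Connect externally: add member-to-outsider edges not yet present.
    candidates = [(u, v) for u in sorted(members) for v in outsiders
                  if frozenset((u, v)) not in edge_set]
    n_add = min(budget - n_delete_target, len(candidates))
    for e in rng.sample(candidates, n_add):
        edge_set.add(frozenset(e))

    return sorted(tuple(sorted(e)) for e in edge_set)


# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
target = {0, 1, 2}
perturbed = dice_perturb(edges, target, budget=4, p=0.5)
print(perturbed)
```

With a budget of 4 and `p=0.5`, the sketch removes two of the target community's internal edges and adds two new boundary edges, which weakens the structural signal that a clustering model can exploit.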

Paper Structure (35 sections, 16 equations, 11 figures, 4 tables, 2 algorithms)

This paper contains 35 sections, 16 equations, 11 figures, 4 tables, 2 algorithms.

Figures (11)

  • Figure 1: Adversarial community detection scenario. Left: the original network operated by a defender, containing several communities, including one target community that must remain concealed. Right: the output of a GNN used by an adversary to infer community structure. The defender’s goal is to modify the graph slightly so that the GNN run by the adversary fails to correctly recover the target community.
  • Figure 2: DICE performance for different $\sigma_c$, $\mu$, and perturbation budgets $\beta_b$ with $p=0.5$ (the budget $b$ is split evenly between deleting and adding edges), averaged over all realizations. Shaded bands around lines denote $\pm 1$ s.d. across all runs.
  • Figure 3: Average rate of change of $M_1$ and $M_2$ vs. $\mu$ for each $\sigma_c$.
  • Figure 4: Average rate of change of $M_1$ and $M_2$ vs. $\sigma_c$, averaged over $\mu$.
  • Figure 5: Heatmaps of $M_1$ and $M_2$ for DICE across combinations of $\sigma_c$, $\mu$, and $p$, averaged over all realizations, community labels, and all $\beta_b$.
  • ...and 6 more figures