Evading Community Detection via Counterfactual Neighborhood Search
Andrea Bernini, Fabrizio Silvestri, Gabriele Tolomei
TL;DR
This work tackles the privacy risks of community detection by introducing community membership hiding, in which a target node strategically rewires its neighborhood, within a fixed budget of edge modifications, so that a given detection algorithm $f(\cdot)$ no longer assigns it to its original community. The problem is formulated as a constrained counterfactual graph objective and cast as a Markov decision process (MDP). It is solved with deep reinforcement learning: an advantage actor-critic (A2C) agent, built on a graph neural network encoder, learns which feasible edge additions and deletions the target node should make. The contributions are a formal problem definition, the DRL-based solution with a GNN encoder, and validation on five real-world datasets. The method achieves a better balance than the baselines between hiding success rate (SR) and preservation of the original graph structure (measured by NMI), and it transfers to detection algorithms not seen during training. The findings have practical implications for privacy-preserving options on social platforms, raise considerations about ethical use and potential misuse, and motivate further work on larger-scale graphs and multi-node hiding scenarios.
Abstract
Community detection techniques are useful for social media platforms to discover tightly connected groups of users who share common interests. However, this functionality often comes at the expense of potentially exposing individuals to privacy breaches by inadvertently revealing their tastes or preferences. Therefore, some users may wish to preserve their anonymity and opt out of community detection for various reasons, such as affiliation with political or religious organizations, without leaving the platform. In this study, we address the challenge of community membership hiding, which involves strategically altering the structural properties of a network graph to prevent one or more nodes from being identified by a given community detection algorithm. We tackle this problem by formulating it as a constrained counterfactual graph objective, and we solve it via deep reinforcement learning. Extensive experiments demonstrate that our method outperforms existing baselines, striking the best balance between accuracy and cost.
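To make the problem setting concrete, the following is a minimal sketch of budgeted membership hiding, not the authors' method. It replaces the reinforcement learning agent with plain random search, uses NetworkX's greedy modularity algorithm as a stand-in for $f(\cdot)$, and assumes a success criterion based on Jaccard similarity falling below a threshold. The names `hide_node`, `community_of`, `jaccard`, `threshold`, and `trials` are hypothetical and chosen only for illustration.

```python
import random

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def community_of(graph, node):
    """Return the detected community (as a set) that contains `node`."""
    for community in greedy_modularity_communities(graph):
        if node in community:
            return set(community)
    raise ValueError(f"node {node!r} not found in any community")


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def hide_node(graph, target, budget, threshold=0.5, trials=200, seed=0):
    """Random-search baseline (an assumption, not the paper's DRL agent).

    Each trial toggles `budget` distinct edges incident to `target` on a
    copy of the graph, reruns detection, and accepts the copy if the
    target's new community is sufficiently dissimilar to the original one.
    Returns the modified graph, or None if no trial succeeds.
    """
    rng = random.Random(seed)
    original = community_of(graph, target) - {target}
    others = [n for n in graph.nodes if n != target]
    for _ in range(trials):
        candidate = graph.copy()
        for other in rng.sample(others, k=min(budget, len(others))):
            if candidate.has_edge(target, other):
                candidate.remove_edge(target, other)  # drop an existing tie
            else:
                candidate.add_edge(target, other)  # create a new tie
        new = community_of(candidate, target) - {target}
        if jaccard(original, new) < threshold:
            return candidate
    return None


if __name__ == "__main__":
    g = nx.karate_club_graph()
    hidden = hide_node(g, target=0, budget=3)
    print("hidden" if hidden is not None else "not hidden within budget")
```

The sketch only restricts modifications to edges incident to the target node and caps their number at the budget. A learned policy, as described in the abstract, would instead choose which edges to modify rather than sampling them uniformly.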
