The anonymization problem in social networks
Rachel G. de Jong, Mark P. J. van der Loo, Frank W. Takes
TL;DR
This work defines a general anonymization problem for social networks with three variants (full, partial, and budgeted), each aiming to maximize the number of $k$-anonymous nodes under a chosen anonymity measure. It introduces ANO-NET, a measure-agnostic framework, and four new heuristic algorithms, two structure-based and two uniqueness-based, and identifies edge deletion as the most effective graph alteration operation. Empirical results on graph models and real-world networks show that the uniqueness-based UA algorithm consistently improves anonymity while preserving data utility: compared to the baseline, it achieves $4.8\times$ more anonymous nodes in the budgeted setting and retains on average about $13.9\times$ more edges in full anonymization. The study also shows that the choice of anonymity measure strongly affects both the initial anonymity and the difficulty of anonymization, positions UA as a robust method for balancing privacy and utility, and lays groundwork for future algorithmic improvements.
Abstract
In this paper, we introduce a general version of the anonymization problem in social networks, in which the goal is to maximize the number of anonymous nodes by altering a given graph. We define three variants of this optimization problem: full, partial, and budgeted anonymization. In each, the objective is to maximize the number of $k$-anonymous nodes, i.e., nodes for which there are at least $k-1$ other equivalent nodes according to a particular anonymity measure of structural node equivalence. We propose four new heuristic algorithms for solving the anonymization problem, which we implement in a reusable computational framework. As a baseline, we use an edge sampling method introduced in previous work. Experiments on both graph models and 23 real-world network datasets yield three empirical findings. First, we demonstrate that edge deletion is the most effective graph alteration operation. Second, we compare four commonly used anonymity measures from the literature and highlight how the choice of anonymity measure has a tremendous effect on both the initial anonymity and the difficulty of solving the anonymization problem. Third, we find that the proposed algorithm that preferentially deletes edges with a larger effect on nodes at a structurally unique position consistently outperforms heuristics based solely on network structure. Our best-performing algorithm retains on average 14 times more edges in full anonymization and overall ensures a better trade-off between anonymity and data utility. In the budgeted variant, it achieves 4.8 times more anonymous nodes than the baseline. This work lays foundations for the future development of algorithms for anonymizing social networks.
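To make the notion of $k$-anonymity and the edge deletion operation concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's framework or any of its four algorithms: node degree is used as a stand-in anonymity measure, and the function names (`degree_signature`, `k_anonymous_nodes`, `best_single_deletion`) and the toy graph are hypothetical. The sketch counts the nodes that share their signature with at least $k-1$ other nodes, then tries each single edge deletion to see which one yields the most $k$-anonymous nodes.

```python
from collections import Counter


def degree_signature(edges, nodes):
    """Map each node to its degree (assumed stand-in for an anonymity measure)."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def k_anonymous_nodes(edges, nodes, k):
    """Return the nodes whose signature is shared by at least k-1 other nodes."""
    signature = degree_signature(edges, nodes)
    class_size = Counter(signature.values())
    return {v for v in nodes if class_size[signature[v]] >= k}


def best_single_deletion(edges, nodes, k):
    """Try deleting each edge once; return the first edge giving the most
    k-anonymous nodes, or None if no deletion improves on the original graph."""
    best_edge = None
    best_count = len(k_anonymous_nodes(edges, nodes, k))
    for e in edges:
        remaining = [f for f in edges if f != e]
        count = len(k_anonymous_nodes(remaining, nodes, k))
        if count > best_count:
            best_edge, best_count = e, count
    return best_edge, best_count


# Hypothetical toy graph: path 0-1-2-3 with an extra leaf 4 attached to node 1.
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (1, 4)]

before = k_anonymous_nodes(edges, nodes, k=2)
edge, after_count = best_single_deletion(edges, nodes, k=2)
print(sorted(before), edge, after_count)
```

In this toy graph, nodes 1 and 2 have unique degrees, so only the three leaves are 2-anonymous. Deleting a single edge can merge a unique node into a larger equivalence class, which corresponds to a budgeted setting with a budget of one edge. Stricter anonymity measures would replace `degree_signature` with a richer structural signature.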
