The anonymization problem in social networks

Rachel G. de Jong, Mark P. J. van der Loo, Frank W. Takes

TL;DR

This work defines a general anonymization problem for social networks with three variants (full, partial, and budgeted), each aiming to maximize the number of $k$-anonymous nodes under a chosen anonymity measure. It introduces ANO-NET, a measure-agnostic framework, and four new heuristic algorithms (two structure-based and two uniqueness-based), and identifies edge deletion as the most effective graph alteration operation. Empirical results across graph models and real networks show that the uniqueness-based UA algorithm consistently improves anonymity while preserving data utility: compared to the baseline, it anonymizes up to $4.8\times$ more nodes in the budgeted setting and preserves up to $13.9\times$ more edges in full anonymization. The study also shows that the choice of anonymity measure strongly affects both the initial anonymity and the difficulty of anonymization, positions UA as a robust method for balancing privacy and utility, and lays groundwork for future algorithmic enhancements.

Abstract

In this paper we introduce a general version of the anonymization problem in social networks, in which the goal is to maximize the number of anonymous nodes by altering a given graph. We define three variants of this optimization problem: full, partial and budgeted anonymization. In each, the objective is to maximize the number of $k$-anonymous nodes, i.e., nodes for which there are at least $k-1$ equivalent nodes, according to a particular anonymity measure of structural node equivalence. We propose four new heuristic algorithms for solving the anonymization problem, which we implement in a reusable computational framework. As a baseline, we use an edge sampling method introduced in previous work. Experiments on both graph models and 23 real-world network datasets yield three empirical findings. First, we demonstrate that edge deletion is the most effective graph alteration operation. Second, we compare four commonly used anonymity measures from the literature and highlight how the choice of anonymity measure has a tremendous effect on both the initial anonymity and the difficulty of solving the anonymization problem. Third, we find that the proposed algorithm that preferentially deletes edges with a larger effect on nodes at a structurally unique position consistently outperforms heuristics based solely on network structure. Our best performing algorithm retains on average 14 times more edges in full anonymization, and overall ensures a better trade-off between anonymity and data utility. In the budgeted variant, it achieves 4.8 times more anonymous nodes than the baseline. This work lays foundations for future development of algorithms for anonymizing social networks.
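
To make the notion of $k$-anonymity and the budgeted variant concrete, the following is a minimal Python sketch, not the implementation from the paper or from ANO-NET. It assumes node degree as a stand-in anonymity measure, so two nodes count as equivalent when they have the same degree. All function names (`k_anonymous_nodes`, `random_edge_deletion`, `uniqueness_biased_deletion`) are hypothetical. The last function only illustrates the general idea of preferring edges that touch unique nodes; it is a deliberate simplification and should not be read as the proposed algorithm.

```python
import random
from collections import Counter


def degrees(nodes, edges):
    """Return a dict mapping each node to its degree."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def k_anonymous_nodes(nodes, edges, k=2):
    """Nodes that share their signature with at least k-1 other nodes.

    Assumption: the signature is the node degree (one possible measure).
    """
    sig = degrees(nodes, edges)
    class_size = Counter(sig.values())
    return {v for v in nodes if class_size[sig[v]] >= k}


def random_edge_deletion(nodes, edges, budget_fraction, k=2, seed=0):
    """Baseline-style sketch: delete a uniformly random sample of edges."""
    rng = random.Random(seed)
    budget = int(budget_fraction * len(edges))
    removed = set(rng.sample(sorted(edges), budget))
    kept = [e for e in edges if e not in removed]
    return kept, k_anonymous_nodes(nodes, kept, k)


def uniqueness_biased_deletion(nodes, edges, budget_fraction, k=2, seed=0):
    """Simplified illustration: prefer edges incident to a unique node."""
    rng = random.Random(seed)
    kept = sorted(edges)
    budget = int(budget_fraction * len(kept))
    for _ in range(budget):
        unique = set(nodes) - k_anonymous_nodes(nodes, kept, k)
        if not unique:
            break  # every node is already k-anonymous
        # Fall back to all remaining edges if no edge touches a unique node.
        candidates = [e for e in kept if e[0] in unique or e[1] in unique] or kept
        kept.remove(rng.choice(candidates))
    return kept, k_anonymous_nodes(nodes, kept, k)


# Toy graph: a star centered on node 0 with one leaf (3) extended to node 4.
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]

before = k_anonymous_nodes(nodes, edges, k=2)
after = k_anonymous_nodes(nodes, [(0, 1), (0, 2), (3, 4)], k=2)
```

In the toy graph, nodes 0 and 3 are the only nodes with degree 3 and 2 respectively, so they are unique under the degree measure for $k = 2$. Deleting the edge between them leaves node 0 as the only node of degree 2 but makes node 3 equivalent to the other degree-1 nodes, which is the kind of effect an anonymization algorithm tries to achieve with as few deletions as possible.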

Paper Structure

This paper contains 32 sections, 8 equations, 19 figures, 3 tables, 2 algorithms.

Figures (19)

  • Figure 1: Random edge addition (green), deletion (red), and rewiring (grey) applied to anonymize ER (top left), BA (top right) and WS (bottom left) graphs using the count measure, with $|V| = 500$ and average degree $\in \{2, 4, 16, 64\}$.
  • Figure 2: Uniqueness when using four different anonymity measures (color) and three anonymization algorithms (line style) on real-world networks. For each network we show the fraction of deleted edges (horizontal axis) and the attained uniqueness (vertical axis).
  • Figure 3: Results for full and partial anonymization (top), where for each network the bar height indicates the fraction of edges that can be preserved when making all nodes anonymous, and the horizontal line indicates the best result obtained for partial anonymization. For budgeted anonymization (bottom), each bar indicates the fraction of unique nodes anonymized with a budget of 5% of the edges. Color indicates the anonymization algorithm used.
  • Figure 4: For each of the three variants of the anonymization problem (columns) and each of the five anonymization algorithms (horizontal axis), the color and number indicate the fraction of networks for which the network property or performance on a downstream task (vertical axis) is preserved, i.e., the change $\pm$ one standard deviation is less than 5% of the original value.
  • Figure 5: Pareto optimal solutions in terms of uniqueness (horizontal axis) and performance on downstream tasks (vertical axis). Each dot represents a solution on the Pareto front found by an anonymization algorithm (color). Filled dots represent solutions for which less than 5% of the edges are deleted.
  • ...and 14 more figures

Theorems & Definitions (1)

  • Definition 1: Anonymization problem.