The effect of distant connections on node anonymity in complex networks

Rachel G. de Jong, Mark P. J. van der Loo, Frank W. Takes

TL;DR

This work proposes the use of d-k-anonymity, a novel measure that takes knowledge up to distance d of a considered node into account, and introduces anonymity-cascade, which exploits the so-called infectiousness of uniqueness: mere information about being connected to another unique node can make a given node uniquely identifiable.

Abstract

Ensuring privacy of individuals is of paramount importance to social network analysis research. Previous work assessed anonymity in a network based on the non-uniqueness of a node's ego network. In this work, we show that this approach does not adequately account for the strong de-anonymizing effect of distant connections. We first propose the use of d-k-anonymity, a novel measure that takes knowledge up to distance d of a considered node into account. Second, we introduce anonymity-cascade, which exploits the so-called infectiousness of uniqueness: mere information about being connected to another unique node can make a given node uniquely identifiable. These two approaches, together with relevant "twin node" processing steps in the underlying graph structure, offer practitioners flexible solutions, tunable in precision and computation time. This enables the assessment of anonymity in large-scale networks with up to millions of nodes and edges. Experiments on graph models and a wide range of real-world networks show drastic decreases in anonymity when connections at distance 2 are considered. Moreover, extending the knowledge beyond the ego network with just one extra link often already decreases overall anonymity by over 50%. These findings have important implications for privacy-aware sharing of sensitive network data.
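The idea behind $d$-$k$-anonymity can be illustrated with a small sketch. It assumes that two nodes count as equivalent when their $d$-neighborhoods (the subgraphs induced by all nodes within distance $d$) are isomorphic with the two nodes mapped onto each other, and that a node is unique when no other node is equivalent to it. This reading of the measure, the helper names, and the brute-force isomorphism test are assumptions made for illustration; the brute-force test is only practical for toy graphs and is not the authors' algorithm.

```python
from collections import deque
from itertools import permutations


def d_neighborhood(adj, root, d):
    """Return (nodes, edges) of the subgraph induced by nodes within distance d of root."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not expand beyond distance d
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    nodes = set(dist)
    edges = {frozenset((u, w)) for u in nodes for w in adj[u] if w in nodes}
    return nodes, edges


def rooted_isomorphic(nodes1, edges1, root1, nodes2, edges2, root2):
    """Brute-force test for an edge-preserving bijection that maps root1 to root2."""
    if len(nodes1) != len(nodes2) or len(edges1) != len(edges2):
        return False
    rest1 = list(nodes1 - {root1})
    rest2 = list(nodes2 - {root2})
    for perm in permutations(rest2):
        mapping = dict(zip(rest1, perm))
        mapping[root1] = root2
        if all(frozenset(mapping[x] for x in e) in edges2 for e in edges1):
            return True
    return False


def equivalence_classes(adj, d):
    """Group nodes whose d-neighborhoods are isomorphic (assumed equivalence notion)."""
    hoods = {v: d_neighborhood(adj, v, d) for v in adj}
    classes = []
    for v in adj:
        for cls in classes:
            rep = cls[0]
            if rooted_isomorphic(*hoods[v], v, *hoods[rep], rep):
                cls.append(v)
                break
        else:
            classes.append([v])
    return classes


def unique_nodes(adj, d):
    """Nodes that are alone in their equivalence class, i.e. 1-anonymous."""
    return {cls[0] for cls in equivalence_classes(adj, d) if len(cls) == 1}


# Toy graph: a path 0 - 1 - 2 - 3 - 4, stored as an adjacency dictionary.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

print(unique_nodes(adj, 1))  # set(): every ego network has a twin
print(unique_nodes(adj, 2))  # {2}: the center becomes unique at distance 2
```

On this path graph, no node is unique when only the ego network is known, while knowledge of the 2-neighborhood singles out the center node, which mirrors the loss of anonymity described in the abstract on a very small scale.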

Paper Structure

This paper contains 28 sections, 2 theorems, 15 figures, and 2 tables.

Key Result

Theorem 1

In a graph $G = (V, E)$, a node $v$ that is unique using anonymity-cascade with $\ell = 1$ ($C_1$) is also unique using $2$-$k$-anonymity.
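The cascade levels mentioned in the theorem ($C_1$ for one level, further levels until nothing changes) can be made concrete with a minimal sketch. It assumes the following cascading rule, which is an interpretation rather than the paper's stated definition: a node becomes unique when it is adjacent to an already unique node and is the only neighbor of that node with its ego-network signature. To stay short, the sketch replaces a full ego-network isomorphism test with a cheap invariant (degree, number of edges among neighbors), so it can only under-report uniqueness. The function names and the toy graph are hypothetical.

```python
from collections import Counter


def build_adjacency(edges):
    """Build an undirected adjacency dictionary from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def ego_signature(adj, v):
    """Cheap ego-network invariant: (degree, number of edges among the neighbors)."""
    nbrs = adj[v]
    links = sum(1 for a in nbrs for b in adj[a] if b in nbrs) // 2
    return (len(nbrs), links)


def cascade_levels(adj):
    """Return cumulative sets of unique nodes: index 0 is the start, index i is level i."""
    sig = {v: ego_signature(adj, v) for v in adj}
    counts = Counter(sig.values())
    unique = {v for v in adj if counts[sig[v]] == 1}  # unique by signature alone
    levels = [set(unique)]
    while True:
        newly_unique = set()
        for u in unique:
            # How often does each signature occur among the neighbors of u?
            local = Counter(sig[w] for w in adj[u])
            for v in adj[u]:
                if v not in unique and local[sig[v]] == 1:
                    newly_unique.add(v)  # v is distinguishable via its link to u
        if not newly_unique:
            return levels
        unique |= newly_unique
        levels.append(set(unique))


# Toy graph (hypothetical): a triangle 0-1-2 with a tail 2-3-4-5-6-7.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
adj = build_adjacency(edges)
levels = cascade_levels(adj)

for i, nodes in enumerate(levels):
    print(i, sorted(nodes))
# 0 [2, 7]
# 1 [2, 3, 6, 7]
# 2 [2, 3, 4, 5, 6, 7]
```

In this toy graph, uniqueness spreads inward along the tail from both ends, one hop per level. Nodes 0 and 1 are structurally interchangeable and are never identified.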

Figures (15)

  • Figure 1: Four approaches for assessing node anonymity: ego network uniqueness [romanini2020privacy] (A), followed by the three techniques discussed in this paper: $d$-$k$-anonymity [dejong2023algorithms] (B), anonymity-cascade (C) and anonymity-cascade with twin nodes (D). For each approach, the top row shows the uniquely identified nodes in the giant component of the Copnet calls network [sapiezynski2019copenhagen]. Red nodes are unique using $1$-$k$-anonymity (i.e., ego network uniqueness), black nodes with $2$-$k$-anonymity (subfigure B only). Pink nodes can be identified using one cascading step ($C_1$), orange nodes with multiple steps ($C_\mathit{max-\ell}$). Grey nodes are not uniquely identified using the considered approach. The bottom row illustrates an example of a $d$-neighborhood, detailing which knowledge is taken into account by each approach (edge and node outline color). Subfigures C and D show the paths traversed by anonymity-cascade to identify the pink and orange nodes, given that the red center node is unique for $1$-$k$-anonymity.
  • Figure 2: Uniqueness maps using $d$-$k$-anonymity. Maps show network uniqueness, indicated by color, when using information of the 1-neighborhood (top row) and 2-neighborhood (bottom row). Results are shown for the Erdős–Rényi (left), Barabási–Albert (middle) and Watts–Strogatz (right) model with different sizes (horizontal axis) and average degree or $m$, an equivalent thereof (vertical axis).
  • Figure 3: $d$-$k$-Anonymity in real-world networks. Results are shown for the 29 real-world networks in Table \ref{tab:data} for which $d$-$k$-anonymity with $d=5$ could be computed within three hours. Each cell denotes the fraction (ranging from 0.0 (white) to 1.0 (dark blue)) of nodes that are $\leq k$-anonymous when accounting for knowledge of the $d$-neighborhood.
  • Figure 4: Uniqueness maps using anonymity-cascade. Maps show network uniqueness, indicated by color, when using one level of cascading (top row) and up to the final level of cascading (bottom row). Results are shown for the Erdős–Rényi (left), Barabási–Albert (middle) and Watts–Strogatz (right) model with different sizes (horizontal axis) and average degree or $m$, an equivalent thereof (vertical axis).
  • Figure 5: Uniqueness in real-world networks. Fraction of unique nodes (vertical axis) on different datasets (horizontal axis) when accounting for different levels of information: 1-neighborhood (red), 2-neighborhood (black line with triangle), one level of cascading (pink) and cascading up to the final level (yellow).
  • ...and 10 more figures

Theorems & Definitions (7)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Proof 1
  • Definition 3
  • Theorem 2
  • Proof 2