Differentially Private Data Release on Graphs: Inefficiencies and Unfairness

Ferdinando Fioretto, Diptangshu Sen, Juba Ziani

TL;DR

The paper analyzes how differential privacy applied to graph edge weights affects downstream routing tasks, focusing on bias and fairness in shortest-path decisions. It introduces a DP graph-release model using Gaussian noise with non-negativity clipping and develops a theoretical framework identifying two main fairness mechanisms: effective relative noise and path cardinality. Key analytical results provide bounds on the probability of bias and its high-probability interpretations, complemented by extensive experiments on grid, wheel, and scale-free graphs that reveal topology-dependent robustness and disparities. The work highlights the practical impact of topology on privacy-induced errors and offers guidance for designing privacy-aware networks and routing systems in transportation and related domains.

Abstract

Networks are crucial components of many sectors, including telecommunications, healthcare, finance, energy, and transportation. The information carried in such networks often contains sensitive user data, like location data for commuters and packet data for online users. Therefore, when considering data release for networks, one must ensure that data release mechanisms do not leak information about individuals, quantified in a precise mathematical sense. Differential Privacy (DP) is the widely accepted, formal, state-of-the-art technique, which has found use in a variety of real-life settings including the 2020 U.S. Census, Apple users' device data, and Google's location data. Yet the use of DP comes with new challenges: the noise added for privacy introduces inaccuracies or biases, and DP techniques can also distribute these biases disproportionately across different populations, inducing fairness issues. The goal of this paper is to characterize the impact of DP on bias and unfairness in the context of releasing information about networks, departing from previous work, which has studied these effects in the context of private population count releases (such as in the U.S. Census). To this end, we consider a network release problem where the network structure is known to all, but the weights on edges must be released privately. We consider the impact of this private release on a simple downstream decision-making task run by a third party, which is to find the shortest path between any pair of nodes and recommend the best route to users. This setting is of high practical relevance, mirroring scenarios in transportation networks, where preserving privacy while providing accurate routing information is crucial. Our work provides theoretical foundations and empirical evidence on the bias and unfairness arising due to privacy in these networked decision problems.

Paper Structure

This paper contains 32 sections, 6 theorems, 33 equations, and 11 figures.

Key Result

Lemma 1

The Gaussian mechanism, defined as $\mathcal{M}(f,x,\varepsilon) = f(x) + Z$ where $Z \sim \mathcal{N}(0, \sigma^2)$ with standard deviation $\sigma = \sqrt{2 \ln(1.25/\delta)} \cdot \Delta f / \varepsilon$, is $(\varepsilon,\delta)$-differentially private.
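
As a rough illustration of Lemma 1 applied to edge weights, the sketch below computes the noise scale $\sqrt{2 \ln(1.25/\delta)} \cdot \Delta f / \varepsilon$, treats it as a standard deviation, adds independent Gaussian noise to each edge weight, and clips the result at zero. The function names, the toy edge dictionary, and the use of `max(0, ·)` for the non-negativity clipping are assumptions made for illustration, not the paper's implementation; the values $(\varepsilon, \delta) = (1, 0.01)$ and $\Delta f = 1$ are borrowed from the Figure 3 caption.

```python
import math
import random


def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise standard deviation from Lemma 1: sqrt(2 ln(1.25/delta)) * Delta f / epsilon."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def privatize_weights(weights, epsilon, delta, sensitivity, rng):
    """Add independent zero-mean Gaussian noise to every edge weight.

    Clipping at zero with max(0, .) is an assumed form of the
    non-negativity clipping mentioned in the summary.
    """
    sigma = gaussian_sigma(epsilon, delta, sensitivity)
    return {edge: max(0.0, w + rng.gauss(0.0, sigma)) for edge, w in weights.items()}


# Hypothetical toy graph: three edges with made-up ground-truth weights.
rng = random.Random(0)
weights = {("a", "b"): 1.0, ("b", "c"): 2.5, ("a", "c"): 4.0}

sigma = gaussian_sigma(epsilon=1.0, delta=0.01, sensitivity=1.0)
noisy = privatize_weights(weights, epsilon=1.0, delta=0.01, sensitivity=1.0, rng=rng)

print(f"sigma = {sigma:.4f}")
for edge, w in noisy.items():
    print(edge, round(w, 3))
```

With these parameters the noise standard deviation is roughly 3.11, which is large relative to the toy weights; this is the kind of regime in which noisy weights can reorder paths.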

Figures (11)

  • Figure 1: Schematic of privacy model: The network administrator privatizes graph $G$ by adding calibrated noise to each edge weight $w_e$ and publishes the privatized graph $\widetilde{G}$ with perturbed edge weights $\tilde{w}_e$. Users then use $\widetilde{G}$ to run downstream optimization tasks, such as shortest path computations.
  • Figure 2: Evaluation Framework: Given any node pair ($i,j$) and privatized graph $\widetilde{G}$, a user computes the shortest path between ($i,j$) on the set $\mathcal{P}_{ij}$. The computation returns path $\tilde{P}_{ij}$ as the perceived shortest path on $\widetilde{G}$ which the user commits to. Her decision is then evaluated on the original graph $G$ incurring a cost of $w_G(\tilde{P}_{ij})$ and realizing bias $B_{ij} = w_G(\tilde{P}_{ij})-w_G(P_{ij}^*)$.
  • Figure 3: Variation of probability $q$ as a function of gap $\alpha_{P',P^*}$ in (a) and sensitivity $\Delta f$ in (b) for different values of $|S_{P',P^*}|$. We set $(\varepsilon, \delta) = (1, 0.01)$. Additionally, for (a), we fix $\Delta f = 1$ and for (b), we fix $\alpha_{P',P^*} = 15$.
  • Figure 4: Evolution of the upper bound on $q_{\beta}$ as a function of $\beta$ for a wheel graph with $N = 21$. All ground-truth edge weights are drawn independently from $U[0, 1]$. We plot results for two types of source-destination pairs: blue corresponds to a pair of nodes that lie on diametrically opposite sides of the wheel graph, and red corresponds to a pair consisting of the central node and a circumference node. The noise is sampled from a zero-mean Gaussian distribution with standard deviation $\sigma = 0.3$. For very small values of $\beta$, the bound is vacuous. However, once the bound becomes non-trivial, it decreases rapidly and can be expected to approximate $q_{\beta}$ very accurately.
  • Figure 5: In (a), we show how the z-scores change with the cardinality of $\mathcal{P}_{ij}$. Higher values of $|\mathcal{P}_{ij}|$ lead to higher z-scores. For all cases, we use $\gamma = 0.05$, i.e., we desire $95\%$ coverage. In (b), we illustrate how the bounds on bias $B_{ij}$ calculated in Corollary \ref{corr:bound_beta} vary with $S$ and $|\mathcal{P}_{ij}|$.
  • ...and 6 more figures
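
The evaluation framework described in the Figure 2 caption can be sketched as follows: pick the shortest path on the privatized weights, then price that path on the original weights and subtract the true optimum, giving $B_{ij} = w_G(\tilde{P}_{ij}) - w_G(P_{ij}^*)$. This is a minimal sketch under stated assumptions: the graph is undirected and connected, edges are keyed by `frozenset` pairs, Dijkstra's algorithm stands in for whatever shortest-path routine the paper uses, and the toy weights are invented for illustration.

```python
import heapq


def shortest_path(adj, weights, source, target):
    """Dijkstra on an undirected, connected graph; returns the node sequence."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v in adj[u]:
            nd = d + weights[frozenset((u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk predecessors back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def path_cost(path, weights):
    """Total weight of a path under the given edge weights."""
    return sum(weights[frozenset(edge)] for edge in zip(path, path[1:]))


def bias(adj, true_w, noisy_w, i, j):
    """B_ij: true cost of the path chosen on noisy weights minus the true optimum."""
    chosen = shortest_path(adj, noisy_w, i, j)
    best = shortest_path(adj, true_w, i, j)
    return path_cost(chosen, true_w) - path_cost(best, true_w)


# Hypothetical triangle graph with made-up weights.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
true_w = {frozenset(("a", "b")): 1.0, frozenset(("b", "c")): 1.0, frozenset(("a", "c")): 3.0}
noisy_w = {frozenset(("a", "b")): 2.0, frozenset(("b", "c")): 1.5, frozenset(("a", "c")): 3.0}

chosen_path = shortest_path(adj, noisy_w, "a", "c")
b_ac = bias(adj, true_w, noisy_w, "a", "c")
print(chosen_path, b_ac)
```

In this toy case the noise makes the direct edge look cheaper than the two-hop route, so the user commits to a path that costs 3 on the true graph instead of 2. The bias is never negative, since the committed path is always evaluated against the true optimum.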

Theorems & Definitions (21)

  • Definition 3.1: Neighboring datasets
  • Definition 3.2: $(\varepsilon,\delta)$-differential privacy
  • Lemma: The Gaussian mechanism
  • Theorem: dwork2014algorithmic
  • Remark 4.1: Motivating Example
  • Claim 5.1
  • proof
  • Definition 5.1
  • Conjecture 5.2
  • Lemma 5.3
  • ...and 11 more