Data Sharing with Endogenous Choices over Differential Privacy Levels

Raef Bassily; Kate Donahue; Diptangshu Sen; Annuo Zhao; Juba Ziani

TL;DR

This paper models decentralized data sharing under differential privacy, where agents incur heterogeneous privacy costs and choose both participation and privacy levels endogenously. It introduces two stability notions (Nash and robust), compares fully decentralized outcomes to a centralized social-optimum benchmark for which it derives closed-form solutions, and analyzes equilibrium existence and structure across privacy-cost regimes parameterized by $\alpha$. The results show that decentralization can yield nontrivial accuracy improvements only when privacy costs decrease sufficiently fast with coalition size ($\alpha < -\tfrac{1}{2}$), and even then efficiency losses persist because of overly conservative privacy choices and stability constraints. The work provides explicit bounds on the Price of Stability for both social cost and estimator variance, and it highlights fundamental trade-offs among decentralization, privacy, coalition size, and estimator accuracy, with implications for data cooperatives and privacy-aware data-sharing platforms.
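For reference, the Price of Stability is conventionally the ratio between the cost of the best stable outcome and the cost of the social optimum. Writing $\mathrm{SC}(S)$ for a hypothetical social cost of coalition $S$ (evaluated at its chosen privacy levels) and $\mathcal{E}$ for the nonempty set of stable coalitions under the chosen notion (Nash or robust), the standard form is shown below; the paper's exact normalization, and its estimator-variance analogue, may differ from this sketch.

$$\mathrm{PoS} \;=\; \frac{\min_{S \in \mathcal{E}} \mathrm{SC}(S)}{\min_{S} \mathrm{SC}(S)} \;\ge\; 1 .$$

A variance-based version would replace $\mathrm{SC}$ with the estimator variance and compare the best stable coalition against the centralized benchmark in the same way.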

Abstract

We study coalition formation for data sharing under differential privacy when agents have heterogeneous privacy costs. Each agent holds a sensitive data point and decides whether to participate in a data-sharing coalition and how much noise to add to their data. Privacy choices induce a fundamental trade-off: higher privacy reduces individual data-sharing costs but degrades data utility and statistical accuracy for the coalition. These choices generate externalities across agents, making both participation and privacy levels strategic. Our goal is to understand which coalitions are stable, how privacy choices shape equilibrium outcomes, and how decentralized data sharing compares to a centralized, socially optimal benchmark. We provide a comprehensive equilibrium analysis across a broad range of privacy-cost regimes, from decreasing costs (e.g., privacy amplification from pooling data) to increasing costs (e.g., greater exposure to privacy attacks in larger coalitions). We first characterize Nash equilibrium coalitions with endogenous privacy levels and show that equilibria may fail to exist and can be non-monotonic in problem parameters. We then introduce a weaker equilibrium notion, robust equilibrium, which exists more widely because it gives players already in the coalition the power to block outside players from joining, and we fully characterize such equilibria. Finally, for both Nash and robust equilibria, we analyze efficiency relative to the social optimum in terms of social welfare and estimator accuracy. We derive bounds that depend sharply on the number of players, on properties of the cost profile, and on how privacy costs scale with coalition size.
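The accuracy externality described above can be made concrete under a simple hypothetical model, which is an assumption here and not necessarily the paper's exact setup: each member $i$ of a coalition $S$ holds an independent data point with variance $\sigma^2$, releases it with local Laplace noise of scale $\Delta/\epsilon_i$, and the coalition estimates the mean by averaging the released points. The estimator variance is then $\sigma^2/|S| + \frac{1}{|S|^2}\sum_{i \in S} 2(\Delta/\epsilon_i)^2$, so a single agent choosing a small $\epsilon_i$ (strong privacy) can raise everyone's error. The function name and parameters below are illustrative.

```python
def coalition_mean_variance(sigma2, epsilons, sensitivity=1.0):
    """Variance of the coalition's sample mean of locally privatized points.

    Hypothetical model (not taken from the paper): member i holds an
    independent point with variance sigma2 and reports it plus
    Laplace(0, sensitivity / eps_i) noise; the coalition averages the reports.
    """
    n = len(epsilons)
    if n == 0:
        raise ValueError("coalition must be non-empty")
    # Laplace(0, b) noise has variance 2 * b**2; noise terms are independent.
    noise_var = sum(2.0 * (sensitivity / eps) ** 2 for eps in epsilons)
    # Var(mean) = (n * sigma2 + sum of noise variances) / n**2
    return sigma2 / n + noise_var / n**2


if __name__ == "__main__":
    two_agents = coalition_mean_variance(1.0, [1.0, 1.0])
    with_noisy_agent = coalition_mean_variance(1.0, [1.0, 1.0, 0.1])
    print(two_agents, with_noisy_agent)
```

With $\sigma^2 = 1$, two agents at $\epsilon = 1$ give variance 1.5, while adding a third agent at $\epsilon = 0.1$ raises it to roughly 23: under this model, admitting a very private agent hurts the coalition's accuracy.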

Paper Structure

This paper contains 63 sections, 11 theorems, 69 equations, and 1 figure.

Introduction
Summary of contributions
Related work
Markets for Data
Decentralized coalition formation and data cooperatives
Data transactions under privacy constraints
Novelty of our work
Differential Privacy Preliminaries
Neighboring datasets
Differential privacy
Sensitivity and DP primitives
Privacy and accuracy
Post-processing immunity
Model
Players
...and 48 more sections

Key Result

Theorem 1

If a mechanism $\mathcal{M}$ is $\epsilon$-differentially private with respect to agent $i$, then for any (possibly randomized) function $f$ independent of the data, the composed mechanism $f(\mathcal{M}(\cdot))$ is also $\epsilon$-differentially private with respect to agent $i$.
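Theorem 1 is what lets a coalition run any data-independent analysis on privatized outputs without further privacy loss. The Python sketch below pairs a Laplace mechanism, assumed here to take the standard form (noise of scale $\Delta/\epsilon$, as in the Laplace Mechanism definition listed below), with a hypothetical post-processing function `threshold`. It then checks numerically, using exact Laplace tail probabilities rather than sampling, that for two neighboring query values the probability of each thresholded output differs by at most a factor $e^{\epsilon}$. All function names and parameters are illustrative, not from the paper.

```python
import math
import random


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release value + Laplace(0, sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. Exponential variables with mean `scale`
    # is Laplace(0, scale) distributed.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


def threshold(y, t):
    """Hypothetical data-independent post-processing f: report whether y >= t."""
    return 1 if y >= t else 0


def laplace_sf(x, loc, scale):
    """Survival function P[Y >= x] for Y ~ Laplace(loc, scale)."""
    z = (x - loc) / scale
    return 0.5 * math.exp(-z) if z >= 0 else 1.0 - 0.5 * math.exp(z)


def worst_case_ratio(q, q_prime, sensitivity, epsilon, thresholds):
    """Largest ratio of P[threshold(M(D), t) = 1] between neighboring query
    values q and q_prime (in either direction), over a grid of thresholds t."""
    scale = sensitivity / epsilon
    worst = 0.0
    for t in thresholds:
        p = laplace_sf(t, q, scale)
        p_prime = laplace_sf(t, q_prime, scale)
        worst = max(worst, p / p_prime, p_prime / p)
    return worst


if __name__ == "__main__":
    eps, delta = 0.5, 1.0
    grid = [i / 10 for i in range(-50, 51)]
    ratio = worst_case_ratio(0.0, delta, delta, eps, grid)
    # The epsilon guarantee survives post-processing by `threshold`.
    print(ratio <= math.exp(eps) * (1 + 1e-9))
    rng = random.Random(0)
    print(threshold(laplace_mechanism(0.3, delta, eps, rng), 0.5))
```

Replacing `threshold` with any other data-independent function leaves the guarantee unchanged, which is what Theorem 1 asserts; the numeric check only illustrates one such $f$.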

Figures (1)

Figure 1: We plot the maximum size of an equilibrium coalition under the Nash definition (left) and the robust definition (right) as a function of $\sigma$. Parameters of the problem instance: cost profile $\vec{c} = [2.2\times 10^{-4},\ 5.4\times 10^{-4},\ 7.0\times 10^{-4},\ 11\times 10^{-4},\ 30\times 10^{-4},\ 33\times 10^{-4},\ 34\times 10^{-4},\ 36\times 10^{-4},\ 38\times 10^{-4}]$ and $\alpha = 1$. Shaded regions mark the values of $\sigma$ for which no equilibrium exists under that stability definition. The Nash definition is non-monotonic even in whether an equilibrium exists, whereas the robust definition is monotonic both in existence and in maximum equilibrium size.

Theorems & Definitions (37)

Definition 1: Neighboring Datasets
Definition 2: Differential Privacy
Definition 3: Sensitivity
Definition 4: Laplace Mechanism
Theorem: Post-processing (Dwork and Roth, 2014)
Definition 5: Player's burden
Definition 6
Definition 7: Nash-stable coalition
Definition 8: Equilibrium coalition robust to valid entry
Claim 1
...and 27 more