Data Sharing with Endogenous Choices over Differential Privacy Levels
Raef Bassily, Kate Donahue, Diptangshu Sen, Annuo Zhao, Juba Ziani
TL;DR
This paper models decentralized data sharing under differential privacy, where agents incur heterogeneous privacy costs and endogenously choose both whether to participate and their privacy level. It introduces two stability notions (Nash and robust equilibrium) and compares fully decentralized outcomes to a centralized social-optimum benchmark, deriving closed-form solutions for the centralized problem and analyzing equilibrium existence and structure across privacy-cost regimes parameterized by $\alpha$. The results show that decentralization can yield nontrivial accuracy improvements only when privacy costs decrease sufficiently with coalition size ($\alpha < -\tfrac{1}{2}$), and that even then efficiency losses persist because of overly conservative privacy choices and stability constraints. The work provides explicit bounds on the Price of Stability for social cost and estimator variance, and highlights fundamental trade-offs among decentralization, privacy, coalition size, and estimator accuracy, with implications for data cooperatives and privacy-aware data-sharing platforms.
Abstract
We study coalition formation for data sharing under differential privacy when agents have heterogeneous privacy costs. Each agent holds a sensitive data point and decides whether to participate in a data-sharing coalition and how much noise to add to their data. Privacy choices induce a fundamental trade-off: higher privacy reduces individual data-sharing costs but degrades data utility and statistical accuracy for the coalition. These choices generate externalities across agents, making both participation and privacy levels strategic. Our goal is to understand which coalitions are stable, how privacy choices shape equilibrium outcomes, and how decentralized data sharing compares to a centralized, socially optimal benchmark. We provide a comprehensive equilibrium analysis across a broad range of privacy-cost regimes, from decreasing costs (e.g., privacy amplification from pooling data) to increasing costs (e.g., greater exposure to privacy attacks in larger coalitions). We first characterize Nash equilibrium coalitions with endogenous privacy levels and show that equilibria may fail to exist and can be non-monotonic in problem parameters. We then introduce a weaker notion, robust equilibrium, which exists under broader conditions because it gives players already in the coalition the power to veto external players from joining, and we fully characterize such equilibria. Finally, for both Nash and robust equilibria, we analyze efficiency relative to the social optimum in terms of social welfare and estimator accuracy. We derive bounds that depend sharply on the number of players, properties of the cost profile, and how privacy costs scale with coalition size.
