A Refreshment Stirred, Not Shaken (III): Can Swapping Be Differentially Private?
James Bailie, Ruobin Gong, Xiao-Li Meng
TL;DR
The paper develops a unified framework that describes a differential privacy (DP) specification through five building blocks: the domain $\mathcal{X}$, the multiverse $\mathscr{D}$, the input premetric $d_{\mathcal{X}}$, the output premetric $D_{\Pr}$, and the protection loss budget $\varepsilon_{\mathcal{D}}$. A single Lipschitz-type condition, $D_{\Pr}(\mathsf P_{\bm x},\mathsf P_{\bm x'}) \le \varepsilon_{\mathcal{D}}\, d_{\mathcal{X}}(\bm x, \bm x')$, unifies the various flavors of DP. The paper applies this framework to the US Census, contrasting the data swapping used in 2010 with the TopDown Algorithm (TDA) used in 2020. It shows that swapping can be DP once its invariants are accounted for, and that the 2010 and 2020 methods satisfy different DP specifications, with distinct protection units and invariants. The paper argues that DP and traditional statistical disclosure control (SDC) can be reconciled so that a data custodian benefits from the strengths of both, while highlighting the risks and tradeoffs introduced by invariants, transparency, and epistemic uncertainty. It also discusses practical strategies for mitigating invariant-induced risks, including probabilistic matching and pre- and post-swap perturbations, and considers extending DP to accommodate epistemic uncertainty via imprecise probabilities. Overall, the work offers a rigorous, compositional lens on privacy-utility tradeoffs in large-scale releases such as censuses, and it clarifies when swapping can be considered DP within an expanded specification framework.
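To make the Lipschitz-type condition concrete, here is a minimal sketch that checks it numerically for a toy mechanism. Everything in it is an illustrative assumption rather than the paper's method: the mechanism is per-record randomized response (not swapping or the TDA), the domain is $\{0,1\}^n$, the multiverse is taken to be a single universe with no invariants, $d_{\mathcal{X}}$ is the Hamming distance, $D_{\Pr}$ is the multiplicative distance familiar from pure $\varepsilon$-DP, and all function names are hypothetical.

```python
import itertools
import math


def hamming(x, y):
    """Input premetric (assumed): number of records on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def rr_pmf(x, mech_eps):
    """Output distribution P_x of per-record randomized response.

    Each bit of x is flipped independently with probability 1 / (1 + e^eps).
    """
    p_flip = 1.0 / (1.0 + math.exp(mech_eps))
    n = len(x)
    pmf = {}
    for y in itertools.product((0, 1), repeat=n):
        k = hamming(x, y)  # number of flipped bits needed to output y
        pmf[y] = (p_flip ** k) * ((1.0 - p_flip) ** (n - k))
    return pmf


def mult_distance(p, q):
    """Output premetric (assumed): max absolute log-ratio over outcomes.

    For discrete distributions with full support, the supremum over events
    is attained at a single outcome, so a max over outcomes suffices.
    """
    return max(abs(math.log(p[y]) - math.log(q[y])) for y in p)


def satisfies_condition(n, mech_eps, budget, tol=1e-9):
    """Check D_Pr(P_x, P_x') <= budget * d_X(x, x') for all pairs of datasets."""
    datasets = list(itertools.product((0, 1), repeat=n))
    pmfs = {x: rr_pmf(x, mech_eps) for x in datasets}
    return all(
        mult_distance(pmfs[x], pmfs[xp]) <= budget * hamming(x, xp) + tol
        for x in datasets
        for xp in datasets
    )


if __name__ == "__main__":
    print(satisfies_condition(n=3, mech_eps=1.0, budget=1.0))
    print(satisfies_condition(n=3, mech_eps=1.0, budget=0.5))
```

Under these assumptions the log-ratio between two output distributions equals the mechanism's $\varepsilon$ times the Hamming distance between the inputs, so the condition holds when the budget is at least that $\varepsilon$ and fails for any smaller budget.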
Abstract
The quest for a precise and contextually grounded answer to the question in the present paper's title resulted in this stirred-not-shaken triptych, a phrase that reflects our desire to deepen the theoretical basis, broaden the practical applicability, and reduce the misperception of differential privacy (DP), all without shaking its core foundations. Indeed, given the existence of more than 200 formulations of DP (and counting), before even attempting to answer the titular question one must first precisely specify what it actually means to be DP. Motivated by this observation, a theoretical investigation into DP's fundamental essence resulted in Part I of this trio, which introduces a five-building-block system explicating the who, where, what, how and how much aspects of DP. Instantiating this system in the context of the United States Decennial Census, Part II then demonstrates the broader applicability and relevance of DP by comparing a swapping strategy like that used in 2010 with the TopDown Algorithm, a DP method adopted in the 2020 Census. This paper provides nontechnical summaries of the preceding two parts as well as new discussion, for example on how greater awareness of the five building blocks can thwart privacy theatrics; how our results bridging traditional statistical disclosure control (SDC) and DP allow a data custodian to reap the benefits of both these fields; how invariants impact disclosure risk; and how removing the implicit reliance on aleatoric uncertainty could lead to new generalizations of DP.
