A Refreshment Stirred, Not Shaken (II): Invariant-Preserving Deployments of Differential Privacy for the US Decennial Census

James Bailie; Ruobin Gong; Xiao-Li Meng

TL;DR

The paper develops an invariant-aware framework for differential privacy (DP) deployments in census data, analyzing two statistical disclosure control (SDC) methods: the Permutation Swapping Algorithm (PSA), a traditional swapping method similar to the 2010 disclosure avoidance system (DAS), and the TopDown Algorithm (TDA), which was used in the 2020 DAS. It builds on the unified system of DP specifications from Part I, with its building blocks and multiverse invariants, to give rigorous DP guarantees for invariant-preserving mechanisms. The PSA is shown to satisfy ε-DP subject to its invariants, with the privacy loss budget depending on the swap rate and the size of the largest stratum. The TDA is shown to satisfy a zCDP-based DP specification subject to its invariants. Together, these results illustrate that invariants critically shape actual privacy protection. Through a numerical demonstration on the 1940 Census full count data and a counterfactual comparison with the 2020 DAS, the work clarifies how invariants affect both privacy guarantees and data utility, and stresses the need for careful interpretation when translating theoretical DP guarantees into practical privacy protection. Overall, the paper provides a principled lens, built from multiple building blocks, for comparing traditional SDC methods with modern DP deployments, and emphasizes the nuanced role of invariants in determining true privacy protection.
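To make the notion of an invariant concrete, here is a toy sketch under loudly stated simplifications. It perturbs hypothetical county counts with Gaussian noise, then post-processes them so that their sum matches the confidential total exactly. That total is therefore a statistic of the confidential data left unaltered by the release, which is exactly what an invariant is. This is not the TopDown Algorithm. The function name and data are hypothetical, continuous rather than discrete Gaussian noise is used, and the sketch omits the nonnegativity, integrality and geographic-hierarchy constraints that the TDA's post-processing enforces.

```python
import numpy as np

def noisy_release_with_invariant_total(counts, sigma, rng):
    """Hypothetical toy (not the TDA): add Gaussian noise to every cell, then
    project onto {x : sum(x) = confidential total} so the total is an invariant."""
    counts = np.asarray(counts, dtype=float)
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    # The Euclidean projection onto the hyperplane sum(x) = T shifts every cell
    # by the same amount, namely (T - sum(noisy)) / n.
    shift = (counts.sum() - noisy.sum()) / counts.size
    return noisy + shift

rng = np.random.default_rng(2020)
county_counts = np.array([120, 45, 300, 8])  # hypothetical county populations
released = noisy_release_with_invariant_total(county_counts, sigma=5.0, rng=rng)
print(released.round(1), released.sum())
```

The final step uses the confidential total itself, so the released total carries no noise at all. Standard DP cannot accommodate this kind of exact disclosure, which is why guarantees for such mechanisms must be stated subject to their invariants.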

Abstract

Through the lens of the system of differential privacy specifications developed in Part I of a trio of articles, this second paper examines two statistical disclosure control (SDC) methods for the United States Decennial Census: the Permutation Swapping Algorithm (PSA), which is similar to the 2010 Census's disclosure avoidance system (DAS), and the TopDown Algorithm (TDA), which was used in the 2020 DAS. To varying degrees, both methods leave some statistics of the confidential data unaltered (these are called the method's invariants), and hence neither can be readily reconciled with differential privacy (DP), at least as it was originally conceived. Nevertheless, we establish that the PSA satisfies $\varepsilon$-DP subject to the invariants it necessarily induces, thereby showing that this traditional SDC method can in fact still be understood within our more general system of DP specifications. By a similar modification to $\rho$-zero concentrated DP, we also provide a DP specification for the TDA. Finally, as a point of comparison, we consider the counterfactual scenario in which the PSA was adopted for the 2020 Census, resulting in a reduction in the nominal privacy loss, but at the cost of releasing many more invariants. Therefore, while our results explicate the mathematical guarantees of SDC provided by the PSA, the TDA and the 2020 DAS in general, care must be taken in their translation to actual privacy protection, just as is the case for any DP deployment.
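The following sketch may help readers less familiar with $\rho$-zero concentrated DP (zCDP). It shows the standard bookkeeping for ordinary zCDP, without invariants. The $\rho$ parameters add under composition. (Discrete) Gaussian noise of scale $\sigma$ on a query with $L_2$ sensitivity $\Delta$ gives $\rho = \Delta^2 / (2\sigma^2)$. And $\rho$-zCDP implies $(\varepsilon, \delta)$-DP with $\varepsilon = \rho + 2\sqrt{\rho \log(1/\delta)}$ (Bun and Steinke, 2016). The function names and the example noise scale are hypothetical and are not the 2020 DAS settings. The invariant-subject specification the paper gives for the TDA is a modification of this plain definition, and that modification is not modelled here.

```python
import math

def compose_zcdp(rhos):
    """Composition under zCDP: the rho parameters of the composed mechanisms add."""
    return float(sum(rhos))

def gaussian_rho(l2_sensitivity, sigma):
    """rho for (discrete) Gaussian noise of scale sigma on a query with the
    given L2 sensitivity: rho = sensitivity^2 / (2 * sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return l2_sensitivity ** 2 / (2.0 * sigma ** 2)

def zcdp_to_eps(rho, delta):
    """Standard conversion: rho-zCDP implies (eps, delta)-DP with
    eps = rho + 2 * sqrt(rho * log(1 / delta))."""
    if rho < 0 or not 0 < delta < 1:
        raise ValueError("require rho >= 0 and 0 < delta < 1")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

# Hypothetical example: two count queries, each with L2 sensitivity 1,
# answered with Gaussian noise of scale sigma = 2.
total_rho = compose_zcdp([gaussian_rho(1, 2.0), gaussian_rho(1, 2.0)])
eps = zcdp_to_eps(total_rho, delta=1e-10)
print(f"rho = {total_rho:.3f}, eps(delta=1e-10) = {eps:.2f}")
```

Smaller values of $\delta$ yield larger $\varepsilon$ for the same $\rho$. This is one reason why a single nominal privacy loss figure needs context before it can be interpreted.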

Paper Structure (30 sections, 17 theorems, 67 equations, 2 figures, 5 tables, 2 algorithms)

Data Privacy with Invariant Constraints
A System of DP Specifications
Paper Contributions and Organization
A DP Analysis of Data Swapping
Data Swapping
What Invariants Does Swapping Preserve?
Permutation Swapping Satisfies ε-DP Subject to Its Invariants
A Numerical Demonstration: The 1940 Census Full Count Data
Estimating the DP Specification of the 2010 DAS
A DP Analysis of the TopDown Algorithm
Comparisons between the PSA and the 2020 DAS
Explanatory Notes to the Comparison Table (PSA vs. 2020 DAS)
Overview of the 2020 DAS
What if the 2020 Census Used Swapping?
The Protection Units for the 2020 DAS and for the PSA
...and 15 more sections

Key Result

Proposition 1

Suppose that $\bm V_{\mathrm{Hold}} \setminus \bm V_{\mathrm{Match}}$ and $\bm V_{\mathrm{Swap}}$ are non-empty. Then, without loss of generality, we may assume that each of $\bm V_{\mathrm{Match}}$, $\bm V_{\mathrm{Swap}}$ and $\bm V_{\mathrm{Hold}} \setminus \bm V_{\mathrm{Match}}$ is univariate. D…
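Proposition 1 concerns matching variables $\bm V_{\mathrm{Match}}$, swapping variables $\bm V_{\mathrm{Swap}}$ and held variables $\bm V_{\mathrm{Hold}}$, each of which may be taken to be univariate. The sketch below shows why swapping within strata necessarily induces invariants. It is a simplified stand-in, not the paper's PSA. It groups hypothetical records by a single matching variable, selects each record with a hypothetical swap rate `p`, and randomly permutes the swap values among the selected records of each stratum. Whatever the random draws, two tables are unchanged: the table of (match, swap) pairs and the table of (match, held) pairs. The record layout, the function names and the use of a uniform shuffle (which may leave a selected record with its own value) are all assumptions.

```python
import random
from collections import Counter, defaultdict

def swap_within_strata(records, p, rng):
    """Hypothetical simplified swap: records are (match, swap, hold) tuples.
    Within each stratum of equal `match` values, each record is selected with
    probability p, and the `swap` values of the selected records are shuffled."""
    strata = defaultdict(list)
    for i, (m, _, _) in enumerate(records):
        strata[m].append(i)
    out = list(records)
    for idx in strata.values():
        chosen = [i for i in idx if rng.random() < p]
        new_swaps = [records[i][1] for i in chosen]
        rng.shuffle(new_swaps)  # permute swap values within this stratum only
        for i, s in zip(chosen, new_swaps):
            m, _, h = records[i]
            out[i] = (m, s, h)  # match and hold values stay with the record
    return out

def invariant_tables(records):
    """Tables that within-stratum swapping leaves unchanged."""
    match_by_swap = Counter((m, s) for m, s, _ in records)
    match_by_hold = Counter((m, h) for m, _, h in records)
    return match_by_swap, match_by_hold

rng = random.Random(1940)
records = [(rng.choice([1, 2, 3, 4]),          # match, e.g. household size
            rng.choice(["A", "B", "C"]),        # swap, e.g. county
            rng.choice(["owned", "rented"]))    # hold, e.g. tenure
           for _ in range(1000)]
swapped = swap_within_strata(records, p=0.2, rng=rng)
print(invariant_tables(swapped) == invariant_tables(records))
# The (swap, hold) table, e.g. tenure by county, is what gets perturbed.
print(Counter((s, h) for _, s, h in records))
print(Counter((s, h) for _, s, h in swapped))
```

The protection comes entirely from scrambling the association between $\bm V_{\mathrm{Swap}}$ and $\bm V_{\mathrm{Hold}} \setminus \bm V_{\mathrm{Match}}$. Meanwhile, the two invariant tables are disclosed exactly.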

Figures (2)

Figure 1: Mean absolute percentage error (MAPE) in the two-way tabulation of dwelling ownership by county induced by the PSA applied to the 1940 Census full count data of Massachusetts, at different swap rates from $1\%$ to $50\%$. Each boxplot reflects $20$ independent runs of the PSA at that swap rate.
Figure 2: Conversion between the nominal privacy loss budget ($\varepsilon$) and the swap rate ($p$) for the PSA. Color and line type encode different values of $b$, the size of the largest stratum delineated by $\bm V_{\mathrm{Match}}$ (from $2$ to $1$ million). Outlined diamonds indicate the smallest $\varepsilon$ attainable for each $b$. Grey dotted horizontal lines correspond to swap rates of 5% and 50%, respectively. The $\varepsilon$ values are nominal in that the privacy guarantee they afford should be understood in the context of $\bm c_{\mathrm{Swap}}$ (and hence the values of $\varepsilon$ across different values of $b$ are not immediately comparable).
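The sketch below illustrates the error metric behind Figure 1. It computes the mean absolute percentage error (MAPE) between a confidential two-way table and a perturbed one. It assumes that cells with a true count of zero are skipped to avoid division by zero. The paper's exact convention may differ, and the function name and tables are hypothetical. In the setting of Figure 1, one would compare the county-by-ownership tabulation before and after each PSA run. Repeating this over independent runs gives the spread summarized by each boxplot.

```python
import numpy as np

def mape(true_table, perturbed_table):
    """Mean absolute percentage error over cells with a nonzero true count
    (assumption: zero cells are skipped to avoid division by zero)."""
    true_table = np.asarray(true_table, dtype=float)
    perturbed_table = np.asarray(perturbed_table, dtype=float)
    mask = true_table > 0
    rel_err = np.abs(perturbed_table[mask] - true_table[mask]) / true_table[mask]
    return 100.0 * rel_err.mean()

# Hypothetical county-by-ownership table (rows: counties; columns: owned, rented)
true_tab = [[10, 20], [0, 40]]
swapped_tab = [[12, 18], [1, 40]]
print(mape(true_tab, swapped_tab))
```

Skipping zero cells is only one option. Another is to add a small constant to the denominator. The choice affects how strongly sparse counties drive the reported error.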

Theorems & Definitions (39)

Definition 1 (restatement of a definition from Part I)
Example 1
Proposition 1
Proof
Definition 2
Example 2
Theorem 1
Remark 1
Theorem 2
Remark 2
...and 29 more