Understanding and Mitigating the Impacts of Differentially Private Census Data on State Level Redistricting

Christian Cianfarani, Aloni Cohen

Abstract

Data from the Decennial Census is published only after applying a disclosure avoidance system (DAS). Data users were shaken by the adoption of differential privacy in the 2020 DAS, a radical departure from past methods. The change raises the question of whether redistricting law permits, forbids, or requires taking account of the effect of disclosure avoidance. Such uncertainty creates legal risks for redistricters, as Alabama argued in a lawsuit seeking to prevent the 2020 DAS's deployment. We consider two redistricting settings in which a data user might be concerned about the impacts of privacy-preserving noise: drawing equal-population districts and litigating voting rights cases. What discrepancies arise if the user does nothing to account for disclosure avoidance? How might the user adapt her analyses to mitigate those discrepancies? We study these questions by comparing the official 2010 Redistricting Data to the 2010 Demonstration Data -- created using the 2020 DAS -- in an analysis of millions of algorithmically generated state legislative redistricting plans. In both settings, we observe that an analyst may come to incorrect conclusions if they do not account for noise. With minor adaptations, though, the underlying policy goals remain achievable: tweaking selection criteria enables a redistricter to draw balanced plans, and illustrative plans can still be used as evidence of the maximum number of majority-minority districts that are possible in a geography. At least for state legislatures, Alabama's claim that differential privacy ``inhibits a State's right to draw fair lines'' appears unfounded.

Paper Structure (36 sections, 1 equation, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Discrepancy with offsets in the Louisiana state senate. We plot the fraction of plans exceeding the intended population tolerance limit $\tau$, with various offsets $\Delta$ for the Louisiana state senate (i.e., discrepancy at $\tau$ with offset $\Delta$). Dots (solid lines) are computed using the DEMO and SWAP datasets with ensembles of 100,000 plans for each $\tau$ and $\Delta$ (see Sec. \ref{sec:offsets}). Squares (dashed lines) are computed from 100,000 samples from the statistical model of district populations and disclosure avoidance noise described in Sec. \ref{sec:why-offsets-work}.
  • Figure 2: The two components of population deviation in districts. Histograms of apparent population deviation in DEMO ($\mathsf{dev}_\mathsf{demo}(D) = (\mathsf{pop}_\mathsf{demo}(D) - \overline{p})/\overline{p}$) and the additional error from disclosure avoidance ($\mathsf{err}_{\mathsf{das}}(D) = (\mathsf{pop}_\mathsf{swap}(D) - \mathsf{pop}_\mathsf{demo}(D))/\overline{p}$) for each unique district included in an ensemble of Louisiana state senate plans sampled with a 5% population tolerance on the DEMO data. Note the very different scales on the horizontal axis. Together, these terms make up a state-legislative district's population deviation. The dashed vertical lines behind the left histogram represent the maximum and minimum error values observed in the right histogram.
  • Figure 3: MMD discrepancies in the Georgia state house. We plot the number of majority-Black districts in plans sampled in our short bursts ensemble, which is designed to produce plans with many MMDs. Plans are grouped according to the number of majority-Black districts as measured using DEMO. Bars are colored to indicate the fraction of those plans which have the corresponding MMD discrepancy. Each ensemble contains 100,000 plans sampled with a 5% population deviation limit using the DEMO data.
  • Figure 4: MMD discrepancy rate by BVAP margin for the Georgia state house. We group the distinct districts in the short bursts ensemble according to the Black voting-age population margin. The height of each bar depicts the district-level MMD discrepancy rate for the group, namely the fraction of districts measured as majority Black according to one of DEMO and SWAP, but not both. Districts with small Black majorities in DEMO are much more likely to experience an MMD discrepancy than those with small non-Black majorities.
  • Figure 5: Histograms of % BVAP for districts in two Georgia state house ensembles. The data include all distinct districts in our base (top) and short bursts (bottom) ensembles. The short bursts ensemble, which is designed to produce plans with many MMDs, samples districts with small Black majorities much more often than the base ensemble, which ignores race.
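
The quantities in the captions for Figures 1 and 2 can be illustrated with a short sketch. It computes the two deviation terms for a district, $(\mathsf{pop}_\mathsf{demo}(D) - \overline{p})/\overline{p}$ and $(\mathsf{pop}_\mathsf{swap}(D) - \mathsf{pop}_\mathsf{demo}(D))/\overline{p}$, whose sum is the district's deviation as measured in SWAP. It then computes a plan-level discrepancy rate. The function names are hypothetical, the toy populations are invented for illustration, and the reading of the offset (plans are accepted at the tighter tolerance $\tau - \Delta$ on DEMO and then checked against $\tau$ on SWAP) is an assumption about the captions, not the paper's code.

```python
from typing import Sequence, Tuple


def deviations(pop_demo: float, pop_swap: float, ideal: float) -> Tuple[float, float, float]:
    """Return (dev_demo, err_das, dev_swap) for one district.

    dev_swap is a hypothetical name for the deviation measured on SWAP;
    algebraically it equals dev_demo + err_das.
    """
    dev_demo = (pop_demo - ideal) / ideal
    err_das = (pop_swap - pop_demo) / ideal
    return dev_demo, err_das, dev_demo + err_das


def exceeds_tolerance(plan_pops: Sequence[float], ideal: float, tol: float) -> bool:
    """True if any district in the plan deviates from the ideal by more than tol."""
    return any(abs(p - ideal) / ideal > tol for p in plan_pops)


def discrepancy_rate(
    plans_demo: Sequence[Sequence[float]],
    plans_swap: Sequence[Sequence[float]],
    ideal: float,
    tau: float,
    delta: float = 0.0,
) -> float:
    """Fraction of accepted plans that exceed tau on SWAP.

    ASSUMPTION: a plan is "accepted" when it meets the tighter tolerance
    tau - delta on the DEMO populations.
    """
    accepted = [
        swap
        for demo, swap in zip(plans_demo, plans_swap)
        if not exceeds_tolerance(demo, ideal, tau - delta)
    ]
    if not accepted:
        return 0.0
    bad = sum(exceeds_tolerance(swap, ideal, tau) for swap in accepted)
    return bad / len(accepted)


# Toy data (invented): three two-district plans, ideal population 1000.
toy_demo = [[1049, 951], [1030, 970], [1000, 1000]]
toy_swap = [[1052, 948], [1031, 969], [1001, 999]]

print(discrepancy_rate(toy_demo, toy_swap, ideal=1000, tau=0.05))
print(discrepancy_rate(toy_demo, toy_swap, ideal=1000, tau=0.05, delta=0.01))
```

In the toy data, the first plan sits just inside the 5% limit on DEMO but just outside it on SWAP. With no offset it is accepted and counted as a discrepancy; with an offset of one percentage point it is never accepted, so the rate drops. This is the kind of tweak to selection criteria that the abstract describes.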