Privacy Violations in Election Results

Shiro Kuriwaki, Jeffrey B. Lewis, Michael Morse

TL;DR

This paper defines vote revelation as linking a vote on an anonymous ballot to a voter's name in the public voter file and empirically assesses privacy costs in granular election reporting. Using Maricopa County's 2020 general election as a case study, it quantifies revelation under progressively granular reporting: 0.0009% at precinct level, 0.05% at precinct×method, and 0.17% when releasing individual ballots, with most risk concentrated in provisional and federal-only ballots. The authors develop a formal framework based on $\ell$-diversity to characterize public and local revelation and discuss ex-post remedies (redaction, extending redaction, noising) and ex-ante design strategies (districting, reporting-unit adjustments, and voter-file limitations) to mitigate risk. They conclude that while greater transparency offers benefits for fraud detection and trust, it inevitably creates privacy costs, which can be mitigated through careful reporting design and selective data treatment, though no perfect solution exists within current methods. The work provides a rigorous framework and empirical benchmarks to inform policy debates on how granular election results should be reported to balance transparency with the secret ballot.

Abstract

After an election, should election officials release a copy of each anonymous ballot? Some policymakers have championed public disclosure to counter distrust, but others worry that it might undermine ballot secrecy. We introduce the term vote revelation to refer to the linkage of a vote on an anonymous ballot to the voter's name in the public voter file, and detail how such revelation could theoretically occur. Using the 2020 election in Maricopa County, Arizona, as a case study, we show that the release of individual ballot records would lead to no revelation of any vote choice for 99.83% of voters as compared to 99.95% under Maricopa's current practice of reporting aggregate results by precinct and method of voting. Further, revelation is overwhelmingly concentrated among the few voters who cast provisional ballots or federal-only ballots. We discuss the potential benefits of transparency, compare remedies to reduce or eliminate privacy violations, and highlight the privacy-transparency tradeoff inherent in all election reporting.
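
As a minimal sketch of how such revelation could arise, suppose a vote is publicly revealed when every ballot in a reporting unit records the same choice in a contest, so that anyone listed in the voter file as belonging to that unit is known to have voted that way. The function name `revealed_units`, the tuple layout, and the toy data below are illustrative assumptions, not the authors' code or data.

```python
from collections import defaultdict

def revealed_units(ballots):
    """Return {(unit, contest): choice} for unanimous reporting units.

    `ballots` is an iterable of (unit, contest, choice) tuples, where
    `unit` is a hypothetical reporting-unit label such as
    precinct x vote method. A unit is flagged when only one distinct
    choice appears among its ballots for a contest.
    """
    choices = defaultdict(set)
    for unit, contest, choice in ballots:
        choices[(unit, contest)].add(choice)
    return {
        key: next(iter(seen))
        for key, seen in choices.items()
        if len(seen) == 1
    }

# Toy data (hypothetical): the early-voting unit is split,
# the two provisional ballots are unanimous.
ballots = [
    ("P1-early", "President", "A"),
    ("P1-early", "President", "B"),
    ("P1-provisional", "President", "A"),
    ("P1-provisional", "President", "A"),
]
revealed = revealed_units(ballots)
```

In this toy example only the small provisional unit is flagged, which mirrors the abstract's observation that revelation concentrates in small categories of ballots; finer reporting units produce more small groups and therefore more chances of unanimity.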

Paper Structure (64 sections, 4 equations, 12 figures, 8 tables)

Figures (12)

  • Figure 1: Schematic of Potential Privacy Violation by Vote Revelation. A Venn diagram of how quasi-identifiers could link vote choice to personally identifying information.
  • Figure 2: How Unanimous Election Results Reveal Votes. In this example, 30 voters vote in two contests (President and a tax referendum) and the reporting units for results are defined by precinct $\times$ vote method $\times$ ballot style.
  • Figure 3: Contests with More Revelation. A boxplot showing the fraction of public revelations in the ballot-level reporting regime, excluding federal-only ballots. Each point represents a contest. The solid bars indicate the median, the boxes span the first and third quartiles, and the whiskers extend to 1.5 times the interquartile range.
  • Figure 4: Revelation of unpopular vote choices. For each vote choice revealed at the precinct level or ballot level, we display the percentage of the revealed voter's neighbors who share that vote choice, where neighbors are defined by geographical distance. Solid lines show average agreement by revealed candidate.
  • Figure 5: Tradeoffs from redaction. Number of affected ballots from a hypothetical policy of redacting ballots from reporting units with $k$ or fewer voters. $k = 0$ indicates the status quo of no redaction.
  • ...and 7 more figures
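
The redaction policy described in the Figure 5 caption can be sketched as a simple count: withhold every reporting unit with $k$ or fewer voters and tally how many ballots are affected. This is a hedged illustration only; the function name `redaction_cost` and the toy unit sizes are assumptions and do not reproduce the paper's data or implementation.

```python
from collections import Counter

def redaction_cost(unit_of_ballot, k):
    """Count ballots in reporting units that have k or fewer voters.

    `unit_of_ballot` lists the (hypothetical) reporting-unit label of
    each ballot. With k = 0 no unit qualifies, matching the status quo
    of no redaction.
    """
    sizes = Counter(unit_of_ballot)
    return sum(n for n in sizes.values() if n <= k)

# Toy data (hypothetical): three units with 5, 2, and 1 voters.
units = ["A"] * 5 + ["B"] * 2 + ["C"]
costs = {k: redaction_cost(units, k) for k in (0, 1, 2, 5)}
```

Raising $k$ protects more voters in small units but withholds more ballots from public results, which is the tradeoff the figure caption refers to.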