States of Disarray: Cleaning Data for Gerrymandering Analysis

Ananya Agarwal, Fnu Alusi, Arbie Hsu, Arif Syraj, Ellen Veomett

TL;DR

The paper tackles the data preparation challenge for gerrymandering analysis using ensemble methods. It describes a data cleaning pipeline that gathers census, election, and district data from the Redistricting Data Hub and VEST and coalesces them with the maup library into precinct-level JSON files suitable for gerrychain and ReCom MCMC. The authors publicly release datasets for 22 states, along with reproducibility artifacts, so that researchers can generate and analyze ensembles of redistricting maps following the 2020 census. They also discuss limitations, such as only partial automation across states and potential population deviations, and propose future directions such as block-level JSON files for higher-fidelity analyses.

Abstract

The mathematics of redistricting is an area of study that has exploded in recent years. In particular, many different research groups and expert witnesses in court cases have used outlier analysis to argue that a proposed map is a gerrymander. This outlier analysis relies on having an ensemble of potential redistricting maps against which the proposed map is compared. Arguably the most widely accepted method of creating such an ensemble is to use a Markov Chain Monte Carlo (MCMC) process. This process requires that various pieces of data be gathered, cleaned, and coalesced into a single file that can be used as the seed of the MCMC process. In this article, we describe how we have begun this cleaning process for each state and made the resulting data available to the public at https://github.com/eveomett-states. At the time of submission, we have data for 22 states available for researchers, students, and the general public to easily access and analyze. We will continue the data cleaning process for each state, and we hope that the availability of these datasets will both further research in this area and increase the public's interest in and understanding of modern techniques to detect gerrymandering.

Paper Structure

This paper contains 7 sections and 3 figures.

Figures (3)

  • Figure 1: Histogram of the number of Democratic districts won, using a gerrychain ReCom ensemble with 20,000 steps. State is Pennsylvania, and election data is from the 2014 Gubernatorial election (election data used as a proxy for party preference). The red bar corresponds to the number of districts the Democratic party would have won using the congressional map enacted in 2012.
  • Figure 2: Fake state with four districts (blue, red, green, and yellow), and blocks within each district. Dual graph of all blocks is overlaid, nodes color-coded by district.
  • Figure 3: Green and blue districts from Figure 2 merged, and spanning tree constructed. Black and red edges together are the spanning tree, with the red edge being the chosen cut edge. The resulting new districting map is on the right.
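
The merge-and-split step described in the captions for Figures 2 and 3 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not gerrychain's implementation and not the authors' code: the "state" is a hypothetical 4x4 grid of blocks with population 1 each, the four districts are its quadrants, and the helper names (`grid_graph`, `random_spanning_tree`, `balanced_side`, `recom_step`) are invented for this example. The spanning tree is drawn by running Kruskal's algorithm on shuffled edges, which gives a random but not uniformly random tree.

```python
import random
from collections import defaultdict


def grid_graph(rows, cols):
    """Dual graph of a rows x cols grid of blocks (shared-edge adjacency)."""
    adj = defaultdict(set)
    for r in range(rows):
        for c in range(cols):
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols:
                    adj[(r, c)].add((rr, cc))
                    adj[(rr, cc)].add((r, c))
    return adj


def random_spanning_tree(nodes, adj, rng):
    """Kruskal on shuffled edges: a random (not uniform) spanning tree."""
    nodes = set(nodes)
    edges = sorted((u, v) for u in nodes for v in adj[u] if v in nodes and u < v)
    rng.shuffle(edges)
    root = {n: n for n in nodes}  # union-find forest

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    tree = defaultdict(set)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so keep it
            root[ru] = rv
            tree[u].add(v)
            tree[v].add(u)
    return tree


def balanced_side(tree, nodes, target, tol, rng):
    """Pick a tree edge whose removal leaves both parts within tol of target.

    Returns the node set on one side of the cut, or None if no edge qualifies.
    """
    start = min(nodes)
    parent, order, stack = {start: None}, [], [start]
    while stack:  # depth-first traversal records each node's tree parent
        u = stack.pop()
        order.append(u)
        for v in tree[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    size = {n: 1 for n in order}  # assumption: every block has population 1
    for u in reversed(order):  # children are processed before their parents
        if parent[u] is not None:
            size[parent[u]] += size[u]
    total = len(order)
    cuts = [u for u in order if parent[u] is not None
            and abs(size[u] - target) <= tol
            and abs(total - size[u] - target) <= tol]
    if not cuts:
        return None
    top = rng.choice(cuts)  # cut the edge between top and parent[top]
    side, stack = {top}, [top]
    while stack:  # collect the subtree hanging below the cut edge
        x = stack.pop()
        for y in tree[x]:
            if y != parent[x] and y not in side:
                side.add(y)
                stack.append(y)
    return side


def recom_step(plan, adj, a, b, tol, rng, max_tries=1000):
    """Merge adjacent districts a and b, then split them along a tree edge."""
    merged = [n for n, d in plan.items() if d in (a, b)]
    for _ in range(max_tries):
        tree = random_spanning_tree(merged, adj, rng)
        side = balanced_side(tree, merged, len(merged) / 2, tol, rng)
        if side is not None:
            new_plan = dict(plan)
            for n in merged:
                new_plan[n] = a if n in side else b
            return new_plan
    return dict(plan)  # no balanced cut found: keep the current plan


rng = random.Random(0)
adj = grid_graph(4, 4)
# Hypothetical seed plan: four 2x2 quadrant districts labelled 0..3.
plan = {(r, c): 2 * (r // 2) + c // 2 for r in range(4) for c in range(4)}
new_plan = recom_step(plan, adj, a=0, b=1, tol=0, rng=rng)
for r in range(4):
    print(" ".join(str(new_plan[(r, c)]) for c in range(4)))
```

Each side of the cut edge is a connected piece of the tree, so both new districts are contiguous by construction. On real data, the unit block populations would be replaced by census counts and the tolerance set relative to the ideal district population.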