Repetition effects in a Sequential Monte Carlo sampler

Sarah Cannon; Daryl DeFord; Moon Duchin

TL;DR

This work analyzes the prevalence of sample repetition in an SMC sampler for redistricting, revealing how descendancy diagrams and Markov-chain dynamics drive ancestor collisions. It shows that, under uniform descendancy, repetitions are bounded and computable via recursive sequences, while nonuniform weights and graph bottlenecks significantly increase redundancy. A weak CLT is established for SMC estimators, and a Controlled Repetition Sampler is proposed to understand and mitigate repetition, albeit with limits. The findings warn that real-world SMC ensembles can exhibit substantial duplication, especially for large numbers of districts, which undermines the reliability of frequency claims and visual summaries unless very large samples or multiple runs are used. The work highlights practical implications for legal contexts, emphasizes the need for large, diverse samples, and suggests cautious interpretation of SMC outputs, along with accessible tooling in the Redist package.

Abstract

We investigate the prevalence of sample repetition in a Sequential Monte Carlo (SMC) method recently introduced for political redistricting.

Paper Structure (9 sections, 6 theorems, 24 equations, 6 figures, 3 tables)

Introduction
Motivating questions
Structure of descendancy diagrams
Setup
Limiting behavior
Non-uniform weights
Convergence guarantees and diagnostics
Discussion
Weak CLT for controlled repetition sampler

Key Result

Lemma 2.1

If a given generation $i$ has $1\leq t\leq S$ active nodes, then the expected number of ancestors in the generation immediately above (generation $i+1$) is $S-S(1-\frac{1}{S})^t$. The probability that there are exactly $v$ activated nodes in generation $i+1$ when there are $t$ activated nodes in gen
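
The expectation in Lemma 2.1 can be checked numerically. The sketch below assumes the uniform-weight case, in which each of the $t$ active nodes independently picks its parent uniformly at random among the $S$ nodes of the next generation; the function names are illustrative and do not come from the paper or the Redist package.

```python
import random


def expected_parents(S: int, t: int) -> float:
    """Closed form from Lemma 2.1: S - S * (1 - 1/S)**t."""
    return S - S * (1 - 1 / S) ** t


def simulate_parents(S: int, t: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean number of distinct parents.

    Assumption (uniform weights): each of the t active nodes chooses
    one of the S candidate parents independently and uniformly.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Distinct parents chosen in this trial.
        total += len({rng.randrange(S) for _ in range(t)})
    return total / trials


if __name__ == "__main__":
    S, t = 12, 12
    print("closed form:", expected_parents(S, t))
    print("simulation: ", simulate_parents(S, t))
```

With $t=1$ the closed form gives exactly one parent, as it should, and the simulated mean should agree with the closed form up to sampling noise.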

Figures (6)

Figure 1: Simple example of partitioning a $4\times 4$ grid into four districts. The adjacency pattern of the grid is the graph $G$, the number of districts is $k=4$, and the size of the sample is $S=4$. At right, the process is abstracted into a descendancy diagram. The district marked last (green) does not have a row of the diagram, because it is made up of area left over after the third district (orange) is marked.
Figure 2: These two figures show structures we call descendancy diagrams. The bottom row is labeled as generation 1 in each case, increasing in index with each layer until generation $k-1$ at the top. Each of these two diagrams has $A(D)=2$, meaning that there are two top-level ancestors from which all members of the bottom generation are descended.
Figure 3: The $S=12,k=11$ example is repeated, now with the diagram nodes decorated by their number of final-generation descendants. High numbers appearing on low levels are markers of extreme redundancy.
Figure 4: If the distribution of weights is uniform, these plots show the expected number of surviving ancestors (districts drawn in the initial generation that appear in the final sample of plans) as $k$ grows, for $S=5,20,50$. The horizontal axis is $k$ in each plot and the vertical axis is the expected number of surviving ancestors. Green stars are precise outputs from the Markov chain expression, compared to the $a_k S$ values in orange and the $b_k S$ values in purple (each interpolated by a curve). In these small experiments, it is always true that $b_k S \le A(S,k)\le a_k S$, and that $a_k S\approx A(S,k)$ is a very good approximation.
Figure 5: Truncation of a long-tailed histogram of weights in a descendancy diagram on state Senate districts in New Mexico ($k=42, i=10, S=5000$, default settings). If weights were uniform, the distribution of weights would be concentrated at the red line. Instead, when drawing the 33rd district in this SMC process, some 32-district partial plans are over 100 times likelier than others to be chosen.
...and 1 more figure

Theorems & Definitions (14)

Lemma 2.1: One-step probabilities
proof
Lemma 2.2
proof
Proposition 2.3
proof
Remark 2.4
Remark 2.5
Lemma 2.6: Uniform descendancy minimizes ancestor collapse
proof
...and 4 more
