Repetition effects in a Sequential Monte Carlo sampler
Sarah Cannon, Daryl DeFord, Moon Duchin
TL;DR
This work analyzes the prevalence of sample repetition in an SMC sampler for redistricting, revealing how descendancy diagrams and Markov-chain dynamics drive ancestor collisions. It shows that, under uniform descendancy, repetitions are bounded and computable via recursive sequences, while nonuniform weights and graph bottlenecks significantly increase redundancy. A weak CLT is established for SMC estimators, and a Controlled Repetition Sampler is proposed to understand and mitigate repetition, though with limits. The findings warn that real-world SMC ensembles can exhibit substantial duplication, especially for large numbers of districts, which undermines the reliability of frequency claims and visual summaries unless very large samples or multiple runs are used. The work highlights practical implications for legal contexts, emphasizes the need for large, diverse samples, and suggests cautious interpretation of SMC outputs, along with accessible tooling in the redist package.
Abstract
We investigate the prevalence of sample repetition in a Sequential Monte Carlo (SMC) method recently introduced for political redistricting.
