Distributionally balanced sampling designs via minimum tactical configurations

Anton Grafström, Wilmer Prentius

Abstract

Distributionally balanced sampling designs are low-discrepancy probability designs obtained by minimizing the expected discrepancy between the auxiliary-variable distribution of a random sample and the target population distribution. Existing constructions rely on circular population sequences, which restrict the design space by forcing samples to be contiguous blocks of a sequence. We propose a new construction based on minimum tactical configurations that removes this topological constraint. The resulting designs are fixed-size, have equal inclusion probabilities, and belong to the class with minimum feasible configuration size. We develop both a simple initialization valid for arbitrary population and sample sizes and a spatial initialization that yields a lower initial expected discrepancy, together with a simulated annealing algorithm for optimization within this class. In simulations and empirical examples, the proposed method outperforms state-of-the-art alternatives in terms of distributional fit, balance, and spatial spread.

Paper Structure

This paper contains 17 sections, 2 theorems, 15 equations, 3 figures, 3 tables, 2 algorithms.

Key Result

Proposition 1

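Let $g = \gcd(N,n)$, where $N$ is the population size and $n$ is the sample size. A tactical configuration must contain at least $M = N/g$ samples; in a configuration of this minimum size, each unit appears in exactly $c = n/g$ samples.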
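The bound in Proposition 1 can be checked numerically. The sketch below computes $M$ and $c$ and builds one configuration of that size by cutting the sequence $0, 1, \dots, N-1$, repeated $c$ times, into $M$ consecutive blocks of length $n$. The function names and the cyclic construction are illustrative assumptions chosen for simplicity; they are not necessarily the initialization used in the paper.

```python
from collections import Counter
from math import gcd


def minimum_configuration_size(N, n):
    """Return (M, c) from Proposition 1: M = N/g samples, each unit used c = n/g times."""
    if not 0 < n <= N:
        raise ValueError("require 0 < n <= N")
    g = gcd(N, n)
    return N // g, n // g


def cyclic_configuration(N, n):
    """Illustrative construction (an assumption, not the paper's method).

    Block k holds the n consecutive units k*n, k*n + 1, ..., taken modulo N.
    The M blocks together cover positions 0 .. M*n - 1 = N*c - 1, so every
    unit is used exactly c times, and n <= N keeps units distinct in a block.
    """
    M, _ = minimum_configuration_size(N, n)
    return [[(k * n + t) % N for t in range(n)] for k in range(M)]


def is_tactical(blocks, N, n, c):
    """Check fixed sample size n, no repeated unit in a sample, and c uses per unit."""
    counts = Counter(u for b in blocks for u in b)
    sizes_ok = all(len(b) == n and len(set(b)) == n for b in blocks)
    return sizes_ok and all(counts[u] == c for u in range(N))
```

For example, `minimum_configuration_size(10, 4)` gives `M = 5` and `c = 2`, and `cyclic_configuration(10, 4)` returns five samples of size four in which every unit appears twice, so each unit has inclusion probability $c/M = n/N$ when one of the $M$ samples is drawn uniformly at random.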

Figures (3)

  • Figure 1: Illustration of an admissible swap between a unit $i$ in $\bm{d}_k$ ($d_{ik}=1$) and a unit $j$ in $\bm{d}_l$ ($d_{jl}=1$). The swap is admissible as $d_{il}=0$ and $d_{jk}=0$ (before the swap).
  • Figure 2: Expected energy distance of circular DBD (solid gray) and DBD-TC (dashed green) over up to $10^7$ iterations, $\pm 2$ standard deviations, for a fixed-size sample of $n=50$ (i.e., $M=20$ samples) and $p=5$ auxiliary variables.
  • Figure 3: Distributions of the different metrics under the compared designs with sample size $n=50$. Colors represent the designs: green is DBD-TC, gray is circular DBD ($10^7$ iterations), orange is LCube, blue is LPM. First row: energy distance. Second row: the local balance measure. Third row: spatial balance. Fourth row: balance deviation. Columns: number of auxiliary variables.
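The admissibility condition in the Figure 1 caption can be made concrete with a small sketch. It assumes the configuration is stored as an $N \times M$ indicator array `d` with `d[i][k] == 1` when unit $i$ belongs to sample $\bm{d}_k$; this storage layout and the function names are illustrative assumptions, not the paper's implementation.

```python
def is_admissible(d, i, k, j, l):
    """True if unit i (in sample k) and unit j (in sample l) may trade places.

    Mirrors the Figure 1 condition: d[i][k] = 1 and d[j][l] = 1, while
    d[i][l] = 0 and d[j][k] = 0, so neither unit already sits in the
    sample it would move into.
    """
    return d[i][k] == 1 and d[j][l] == 1 and d[i][l] == 0 and d[j][k] == 0


def apply_swap(d, i, k, j, l):
    """Move unit i from sample k to sample l and unit j from l to k, in place."""
    if not is_admissible(d, i, k, j, l):
        raise ValueError("swap is not admissible")
    d[i][k], d[i][l] = 0, 1
    d[j][l], d[j][k] = 0, 1


# Toy configuration with N = 3, n = 2, hence M = 3 and c = 2.
# Samples (columns): {0, 1}, {0, 2}, {1, 2}.
d = [
    [1, 1, 0],  # unit 0
    [1, 0, 1],  # unit 1
    [0, 1, 1],  # unit 2
]
```

In this toy configuration, swapping unit 0 in sample 0 with unit 2 in sample 1 is not admissible because unit 0 is already in sample 1, whereas swapping unit 1 in sample 0 with unit 2 in sample 1 is. An admissible swap leaves every row sum (the number of samples containing a unit) and every column sum (the sample size) unchanged, so the swapped configuration keeps the same fixed sample size and the same number of appearances per unit.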

Theorems & Definitions (12)

  • Definition 1: Tactical configuration
  • Definition 2: Tactical configuration sampling design
  • Definition 3: Minimum tactical configuration
  • Proposition 1: Theoretical bound on configuration size
  • Proof
  • Lemma 1
  • Proof
  • Example 1: Constructive initialization
  • Example 2: Decay of the expected energy distance
  • Example 3: Comparisons with some existing designs
  • ...and 2 more