Spatially Randomized Designs Can Enhance Policy Evaluation

Ying Yang; Chengchun Shi; Fang Yao; Shouyang Wang; Hongtu Zhu

TL;DR

The paper tackles policy evaluation under spatial interference by introducing spatially randomized designs (global, individual, cluster) and providing parametric, semiparametric, and dynamic estimation frameworks. It derives MSE and power results, showing that spatial randomization can substantially improve estimator efficiency and testing power, with the optimal cluster size scaling as $c^*\asymp r$. The work introduces both traditional and doubly robust methods, including a dynamic DRL approach with mean-field approximations to mitigate the high dimensionality of spatio-temporal data. Empirical validation through simulations and a real ride-hailing dataset demonstrates consistent performance gains over global designs, supporting practical adoption in large-scale online experiments. The findings offer actionable guidance for designing efficient A/B tests in networks with interference, with implications for ride-sharing, e-commerce, and digital advertising platforms.

Abstract

This article studies the benefits of using spatially randomized experimental designs which partition the experimental area into distinct, non-overlapping units with treatments assigned randomly. Such designs offer improved policy evaluation in online experiments by providing more precise policy value estimators and more effective A/B testing algorithms than traditional global designs, which apply the same treatment across all units simultaneously. We examine both parametric and nonparametric methods for estimating and inferring policy values based on this randomized approach. Our analysis includes evaluating the mean squared error of the treatment effect estimator and the statistical power of the associated tests. Additionally, we extend our findings to experiments with spatio-temporal dependencies, where treatments are allocated sequentially over time, and account for potential temporal carryover effects. Our theoretical insights are supported by comprehensive numerical experiments.
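The three designs contrasted in the abstract can be illustrated with a minimal sketch. It assumes Bernoulli assignment with probability `p` and a hypothetical 6x6 grid of units split into four 3x3 clusters; the function name `assign_treatments` and the grid layout are illustrative choices, not taken from the paper.

```python
import numpy as np

def assign_treatments(cluster_of_unit, design, p=0.5, rng=None):
    """Return a 0/1 treatment vector with one entry per spatial unit.

    cluster_of_unit: integer cluster label for each unit (hypothetical input).
    design: "global", "individual", or "cluster".
    """
    rng = np.random.default_rng() if rng is None else rng
    cluster_of_unit = np.asarray(cluster_of_unit)
    n_units = cluster_of_unit.shape[0]
    if design == "global":
        # One coin flip shared by every unit.
        return np.full(n_units, rng.binomial(1, p), dtype=int)
    if design == "individual":
        # Independent coin flip for each unit.
        return rng.binomial(1, p, size=n_units).astype(int)
    if design == "cluster":
        # One coin flip per cluster, broadcast to the cluster's units.
        labels, inverse = np.unique(cluster_of_unit, return_inverse=True)
        draws = rng.binomial(1, p, size=labels.shape[0])
        return draws[inverse].astype(int)
    raise ValueError(f"unknown design: {design!r}")

# Assumed layout: 36 units on a 6x6 grid, grouped into four 3x3 clusters.
rows, cols = np.divmod(np.arange(36), 6)
cluster_of_unit = (rows // 3) * 2 + (cols // 3)

rng = np.random.default_rng(0)
z_global = assign_treatments(cluster_of_unit, "global", rng=rng)
z_individual = assign_treatments(cluster_of_unit, "individual", rng=rng)
z_cluster = assign_treatments(cluster_of_unit, "cluster", rng=rng)
```

Under the global design every unit receives the same treatment, whereas the individual and cluster designs produce treated and control units within the same experimental period.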

Paper Structure (17 sections, 7 theorems, 35 equations, 6 figures, 3 tables, 2 algorithms)

Introduction
Related work
Outline of the paper
Nondynamic Setting
Problem formulation
Parametric and semiparametric learning
Estimation accuracy in the nondynamic setting
Testing power in the nondynamic setting
Dynamic Setting
Parametric and semiparametric modeling
Estimation accuracy in the dynamic setting
Numerical Experiments
Simulation of the nondynamic setting
Simulation of the dynamic setting
Real data based simulation
...and 2 more sections

Key Result

Theorem 2.2

Suppose that CA holds. Set $p=p^{(j)}=p_\iota=0.5$ for all $1\le \iota\le R$, $1\le j\le m$. Let $r=\max_{\iota} n_{\iota}$ and $\nu=\sum_{\iota=1}^R \sum_{\iota'=1}^R \mathbb{V}_{\iota \iota'}/\sum_{\iota=1}^R \mathbb{V}_{\iota \iota}$. Then it holds that [...], where $a_N\lesssim b_N$ means $a_N/b_N=1+o(1)$. Further suppose that Assumption assump:omega holds and let $\mathcal{N}_{\mathcal{C}_j}=\cup_{\iota
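The ratio $\nu$ in the theorem statement is the sum of all entries of $\mathbb{V}$ divided by its trace. The sketch below computes it under the assumption that $\mathbb{V}$ is an $R\times R$ covariance matrix across units; the excerpt does not define $\mathbb{V}$, so this reading, the helper names, and the exchangeable correlation structure are all hypothetical. For that structure the ratio has the closed form $1+(R-1)\rho$.

```python
import numpy as np

def variance_ratio(V):
    """nu = (sum of all entries of V) / (sum of diagonal entries of V)."""
    V = np.asarray(V, dtype=float)
    return V.sum() / np.trace(V)

def exchangeable_cov(R, rho, sigma2=1.0):
    """Hypothetical covariance: variance sigma2, common correlation rho."""
    return sigma2 * ((1.0 - rho) * np.eye(R) + rho * np.ones((R, R)))

R, rho = 36, 0.3
nu = variance_ratio(exchangeable_cov(R, rho))

# Closed form for the exchangeable case: 1 + (R - 1) * rho.
nu_closed_form = 1.0 + (R - 1) * rho

# With uncorrelated units V is diagonal and the ratio equals 1.
nu_independent = variance_ratio(np.eye(R))
```

Larger values of `nu` correspond to stronger positive dependence across units; the ratio is 1 when the off-diagonal entries are all zero.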

Figures (6)

Figure 1: Business metrics from a city over 40 days, including drivers’ total income, the number of requests, and drivers’ total online time. Each curve represents data for a single day, with the horizontal axis corresponding to 24 hours. The values are scaled to preserve privacy.
Figure 2: Panels (a), (b), and (c) illustrate examples of clusters whose interference neighbors are the adjacent units, each with different values of $c$ and $r$. Specifically, the panels present clusters with $c=9$ and $r=3$, $4$, and $6$, respectively. In each panel, the central cluster is highlighted in yellow, with its adjacent units outlined by bold, darker edges. The values of $\omega$ and the cardinalities $|\mathcal{N}_{\mathcal{C}_j}|$ are $\omega=1, 3, 3$ and $|\mathcal{N}_{\mathcal{C}_j}|=9, 12, 14$ for panels (a), (b), and (c), respectively.
Figure 3: Simulation layout showing the unit patterns. Each row illustrates configurations with a given maximum number of neighbors per unit, namely $r=3$, $4$, and $6$. The columns correspond to different total numbers of units, $R=36$, $81$, and $144$. Within each panel, distinct colors denote separate clusters, giving cluster counts of $m=4$, $9$, and $16$.
Figure 4: Rejection probabilities in the parametric and semiparametric models in the nondynamic setting. The horizontal axis represents the treatment's relative improvement. The red lines depict the individual-randomized design, the blue lines represent the cluster-randomized design, and the black lines denote the global-randomized design. The line styles (solid, dashed, and dotted) correspond to the number of units $R=144$, $81$, and $36$, respectively. The figure is organized into three rows and three columns of panels, representing different values of $r$ (6, 4, and 3) and $\rho$ (0.9, 0.6, and 0.3), respectively.
Figure 5: Rejection probabilities in the parametric and nonparametric models in the dynamic setting with $M=12$, under different relative improvements of the new policy. The red, blue, and black lines correspond to the individual-, cluster-, and global-randomized designs, with $R=144$, $81$, and $36$ plotted as solid, dashed, and dotted lines, respectively. The three rows of panels correspond to $r=6, 4, 3$, and the three columns correspond to $\rho=0.9, 0.6, 0.3$, respectively.
...and 1 more figures

Theorems & Definitions (9)

Theorem 2.2
Theorem 2.4
Remark 1
Remark 2
Theorem 2.5
Corollary 2.6
Proposition 3.1
Theorem 3.2
Theorem 3.4
