
Design-based finite-sample analysis for regression adjustment

Dogyoon Song

TL;DR

This paper develops a design-based, non-asymptotic framework for regression adjustment in randomized experiments. It yields finite-sample confidence intervals for the average treatment effect that remain informative even when the number of covariates $p$ is large relative to the sample size $n$. The approach uses a variance-adaptive swap-sensitivity analysis: a Doob martingale with Freedman's inequality controls stochastic fluctuations, and Stein's method of exchangeable pairs on the Johnson graph bounds design bias. Specializing to OLS with intercept (OLS-RA) yields exact paired deletion–insertion identities. Through rank-one updates, it also gives a covariate-geometry view (leverages and cross-leverages) of concentration and bias. The paper provides oracle (population-level) quantities $(V^\bullet, R^\bullet, B^\bullet)$ from which instance-adaptive CIs can be computed, and outlines computable data-driven envelopes that approximate them from observed data. Simulations show that the finite-sample CIs are valid but conservative, and that RA can improve precision depending on the covariate geometry. The paper also discusses practical ways to tighten the bounds and to extend the framework to broader designs and estimators.
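The leverage and rank-one-update ideas can be illustrated with standard OLS facts. The sketch below assumes the usual hat-matrix definitions of leverage $h_{ii}$ and cross-leverage $h_{ij}$ for the design $Z = [\mathbf{1}, X]$. The paper's exact definitions may differ, for example in centering or normalization. The sketch computes these quantities and checks the textbook Sherman–Morrison deletion identity. That identity shows how removing one unit shifts the OLS fit by an amount scaled by $1/(1 - h_{ii})$. It is not the paper's paired deletion–insertion identity, and the helper names are hypothetical.

```python
import numpy as np

def design_with_intercept(X):
    """Z = [1, X]: the OLS-with-intercept design (hypothetical helper)."""
    return np.column_stack([np.ones(X.shape[0]), X])

def leverage_matrix(X):
    """Hat matrix H = Z (Z^T Z)^{-1} Z^T.

    Diagonal entries are leverages h_ii, off-diagonal entries are
    cross-leverages h_ij. Assumes Z has full column rank (p + 1 <= n).
    """
    Z = design_with_intercept(X)
    G_inv = np.linalg.inv(Z.T @ Z)
    return Z @ G_inv @ Z.T

def ols_with_intercept(X, y):
    """Direct OLS fit; coef[0] is the intercept, coef[1:] are the slopes."""
    coef, *_ = np.linalg.lstsq(design_with_intercept(X), y, rcond=None)
    return coef

def delete_one_update(X, y, i):
    """Leave-one-out coefficients from the full fit via a rank-one update:

    coef_(-i) = coef - (Z^T Z)^{-1} z_i * e_i / (1 - h_ii),
    where e_i = y_i - z_i^T coef is the full-fit residual of unit i.
    """
    Z = design_with_intercept(X)
    G_inv = np.linalg.inv(Z.T @ Z)
    coef = G_inv @ Z.T @ y
    z_i = Z[i]
    h_ii = z_i @ G_inv @ z_i   # leverage of unit i
    e_i = y[i] - z_i @ coef    # residual of unit i
    return coef - (G_inv @ z_i) * e_i / (1.0 - h_ii)

# Usage: leverages for a small random design.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
H = leverage_matrix(X)
print("leverages:", np.round(np.diag(H), 3))
```

Because the intercept lies in the column space of $Z$, each row of $H$ sums to one and every leverage is at least $1/n$. A unit with $h_{ii}$ close to one is one whose deletion can move the fit a lot.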

Abstract

In randomized experiments, regression adjustment leverages covariates to improve the precision of average treatment effect (ATE) estimation without requiring a correctly specified outcome model. Although well understood in low-dimensional settings, its behavior in high-dimensional regimes -- where the number of covariates $p$ may exceed the number of observations $n$ -- remains underexplored. Furthermore, existing theory is largely asymptotic, providing limited guidance for finite-sample inference. We develop a design-based, non-asymptotic analysis of the regression-adjusted ATE estimator under complete randomization. Specifically, we derive finite-sample-valid confidence intervals with explicit, instance-adaptive widths that remain informative even when $p > n$. These intervals rely on oracle (population-level) quantities, and we also outline data-driven envelopes that are computable from observed data. Our approach hinges on a refined swap sensitivity analysis: stochastic fluctuation is controlled via a variance-adaptive Doob martingale and Freedman's inequality, while design bias is bounded using Stein's method of exchangeable pairs. The analysis suggests how covariate geometry governs concentration and bias through leverages and cross-leverages, shedding light on when and how regression adjustment improves on the difference-in-means baseline.
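The sketch below shows one common form of the regression-adjusted estimator alongside the difference-in-means baseline under complete randomization. This form fits a separate OLS with intercept in each arm and evaluates each arm's fit at the full-sample covariate mean. It is an assumption that this matches the estimator analyzed in the paper. When $p$ exceeds an arm's size, OLS is not unique, and the sketch takes the minimum-norm slope returned by `np.linalg.lstsq`. That choice is also an assumption. All function names are hypothetical.

```python
import numpy as np

def ols_slope(X, y):
    """Slope of an OLS-with-intercept fit (centered data).

    If the centered design is rank-deficient (e.g. p >= n), lstsq returns
    the minimum-norm least-squares solution.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return beta

def ra_estimate(X, y, z):
    """Regression-adjusted ATE: per-arm OLS fits, each evaluated at the
    full-sample covariate mean, then differenced (treated minus control)."""
    xbar = X.mean(axis=0)
    est = 0.0
    for arm, sign in ((1, 1.0), (0, -1.0)):
        Xa, ya = X[z == arm], y[z == arm]
        beta = ols_slope(Xa, ya)
        est += sign * (ya.mean() - (Xa.mean(axis=0) - xbar) @ beta)
    return est

def dim_estimate(y, z):
    """Difference-in-means baseline."""
    return y[z == 1].mean() - y[z == 0].mean()

def complete_randomization(n, n1, rng):
    """Assign exactly n1 of n units to treatment, uniformly at random."""
    z = np.zeros(n, dtype=int)
    z[rng.choice(n, size=n1, replace=False)] = 1
    return z

# Usage: repeated randomizations over fixed potential outcomes (design-based view).
rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.normal(size=(n, p))
y0 = X @ rng.normal(size=p) + rng.normal(size=n)
y1 = y0 + 1.0                      # constant unit-level effect, so the ATE is 1
draws = []
for _ in range(2000):
    z = complete_randomization(n, n // 2, rng)
    y = np.where(z == 1, y1, y0)   # observed outcomes
    draws.append((dim_estimate(y, z), ra_estimate(X, y, z)))
draws = np.array(draws)
print("sd of DiM, sd of RA:", draws.std(axis=0))
```

In this setup the potential outcomes are strongly predicted by $X$, so the RA draws are expected to spread less than the DiM draws. With weakly predictive covariates or $p$ large relative to $n$, that advantage need not hold. This is consistent with the abstract's point that covariate geometry determines when RA helps.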


Paper Structure

This paper contains 95 sections, 16 theorems, 154 equations, 4 tables, 2 algorithms.

Key Result

Proposition 1

Let $X \in \mathbb{R}^{n \times p}$ and $y \in \mathbb{R}^n$, and let $(\hat{\mu}, \hat{\beta}) = \mathsf{OLS}(X, y)$. Then

Theorems & Definitions (53)

  • Remark 2.1
  • Definition 3.1
  • Remark 3.1
  • Proposition 1
  • Remark 3.2
  • Definition 3.2
  • Example 1
  • Remark 3.3
  • Proposition 2
  • Remark 3.4
  • ...and 43 more