Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect
Ojash Neopane, Aaditya Ramdas, Aarti Singh
TL;DR
This work addresses adaptive estimation of the Average Treatment Effect (ATE) in the finite-sample (nonasymptotic) regime by leveraging the asymptotically optimal Augmented Inverse Probability Weighting (AIPW) estimator. The authors introduce Optimistic Policy Tracking (OPTrack), a bandit-inspired algorithm that uses confidence sequences to steer treatment allocations toward the Neyman-optimal allocation while maintaining sufficient exploration. They prove a logarithmic Neyman regret bound and demonstrate empirical gains over prior methods, particularly in the small-sample regimes relevant to clinical trials and A/B testing. The results connect optimism in sequential decision-making with adaptive causal inference, offering both theoretical guarantees and practical improvements. The work lays groundwork for extending adaptive strategies to covariate-rich settings, multiple arms, and interaction protocols closer to reinforcement learning.
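The "Neyman-optimal" target refers to the classical Neyman allocation: without covariates, the asymptotic variance of the AIPW estimator under a fixed treatment probability π is proportional to σ₁²/π + σ₀²/(1 − π), which is minimized at π* = σ₁/(σ₁ + σ₀), where σₐ is the outcome standard deviation in arm a. The sketch below is not OPTrack; it is a hypothetical illustration of how optimism could enter a plug-in rule. Each arm's estimated standard deviation is inflated by a crude bonus of order sqrt(log t / nₐ), so an under-sampled arm looks more variable and receives more allocation, and the result is clipped to [eps, 1 − eps] to keep propensities away from 0 and 1. The function names, the bonus form, and eps = 0.05 are assumptions, not the paper's confidence sequences.

```python
import math


def neyman_allocation(sigma1: float, sigma0: float) -> float:
    """Treatment probability minimizing sigma1^2/pi + sigma0^2/(1 - pi).

    Assumes sigma1 + sigma0 > 0.
    """
    return sigma1 / (sigma1 + sigma0)


def optimistic_allocation(sd1_hat: float, sd0_hat: float,
                          n1: int, n0: int, t: int,
                          eps: float = 0.05) -> float:
    """Hypothetical optimistic plug-in rule (an illustration, NOT OPTrack).

    Each arm's estimated SD is inflated by a crude bonus sqrt(log t / n_a);
    the arm with fewer samples gets a larger bonus and hence more allocation.
    The output is clipped to [eps, 1 - eps].
    """
    log_t = math.log(max(t, 2))            # max(t, 2) keeps log_t > 0
    bonus1 = math.sqrt(log_t / max(n1, 1))  # max(n, 1) avoids division by zero
    bonus0 = math.sqrt(log_t / max(n0, 1))
    pi = neyman_allocation(sd1_hat + bonus1, sd0_hat + bonus0)
    return min(max(pi, eps), 1 - eps)
```

With equal estimates and equal sample sizes the rule returns 0.5; if arm 1 has only 2 samples against 100 for arm 0, it shifts probability toward arm 1 even though the estimated standard deviations are equal.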
Abstract
Estimation and inference for the Average Treatment Effect (ATE) is a cornerstone of causal inference and often serves as the foundation for developing procedures for more complicated settings. Although traditionally analyzed in a batch setting, recent advances in martingale theory have paved the way for adaptive methods that can enhance the power of downstream inference. Despite these advances, progress in understanding and developing adaptive algorithms remains in its early stages. Existing work either focuses on asymptotic analyses that overlook exploration-exploitation tradeoffs relevant in finite-sample regimes, or relies on simpler but suboptimal estimators. In this work, we address these limitations by studying adaptive sampling procedures that take advantage of the asymptotically optimal Augmented Inverse Probability Weighting (AIPW) estimator. Our analysis uncovers challenges obscured by asymptotic approaches and introduces a novel algorithmic design principle reminiscent of optimism in multi-armed bandits. This principled approach enables our algorithm to achieve significant theoretical and empirical gains compared to prior methods. Our findings mark a step forward in advancing adaptive causal inference methods in theory and practice.
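For reference, here is a minimal sketch of an AIPW estimator in an adaptive two-arm experiment. It assumes no covariates, that the experimenter records the treatment probability π_t actually used at round t, and that the outcome models are running arm means computed only from earlier rounds. Because these models and π_t depend only on the past, each pseudo-outcome is conditionally unbiased for the ATE given the history, which is the martingale structure that adaptive analyses rely on. The name `aipw_ate` and the 0.0 initialization for an arm with no data yet are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def aipw_ate(actions: Sequence[int],
             outcomes: Sequence[float],
             propensities: Sequence[float]) -> float:
    """AIPW estimate of the ATE from an adaptively collected sequence.

    actions[t] is 1 (treatment) or 0 (control); propensities[t] is the
    probability of treatment used at round t, assumed to lie strictly in (0, 1).
    """
    sums = [0.0, 0.0]   # running outcome sums per arm (index 0 = control)
    counts = [0, 0]     # running sample counts per arm
    total = 0.0
    for a, y, pi in zip(actions, outcomes, propensities):
        # Predictable outcome models: arm means from rounds before t only
        # (hypothetical 0.0 initialization when an arm has no data yet).
        mu1 = sums[1] / counts[1] if counts[1] else 0.0
        mu0 = sums[0] / counts[0] if counts[0] else 0.0
        # AIPW pseudo-outcome: model difference plus inverse-weighted residual.
        phi = mu1 - mu0
        if a == 1:
            phi += (y - mu1) / pi
        else:
            phi -= (y - mu0) / (1 - pi)
        total += phi
        # Update the models only after using them at round t.
        sums[a] += y
        counts[a] += 1
    return total / len(actions)
```

On the toy sequence actions [1, 0, 1, 0], outcomes [3.0, 1.0, 3.0, 1.0] with π_t = 0.5, this returns 2.75 rather than the true difference of 2. The estimator is unbiased in expectation, but on such a short run the first rounds, where the arm means are still at 0.0, contribute large inverse-weighted terms.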
