Adaptive Principal Component Regression with Applications to Panel Data
Anish Agarwal, Keegan Harris, Justin Whitehouse, Zhiwei Steven Wu
TL;DR
The paper develops time-uniform finite-sample guarantees for online principal component regression under adaptively collected, noisy covariates, improving prior fixed-design results by leveraging martingale concentration and self-normalized techniques. By introducing an empirical signal-to-noise ratio and a data-geometry measure, it derives bounds showing that the PCR estimator error scales as $\widetilde{O}\left(\frac{1}{\mathrm{snr}_n(a)^2}\kappa(\mathbf{X}_n(a))^2\right)$, without relying on $\ell_1$ sparsity. The authors apply these results to panel-data causal inference, proposing adaptive synthetic control and learning-to-treat algorithms with regret guarantees and demonstrating empirical gains over baselines that ignore measurement noise. This work enables reliable counterfactual estimation and adaptive intervention design in sequential, noisy environments, with potential impact on econometrics, online experimentation, and privacy-aware analyses.
Abstract
Principal component regression (PCR) is a popular technique for fixed-design error-in-variables regression, a generalization of the linear regression setting in which the observed covariates are corrupted with random noise. We provide the first time-uniform finite-sample guarantees for (regularized) PCR when data is collected adaptively. Since the proof techniques for analyzing PCR in the fixed-design setting do not readily extend to the online setting, our results rely on adapting tools from modern martingale concentration to the error-in-variables setting. We demonstrate the usefulness of our bounds by applying them to the domain of panel data, a ubiquitous setting in econometrics and statistics. As our first application, we provide a framework for experiment design in panel data settings when interventions are assigned adaptively. Our framework may be thought of as a generalization of the synthetic control and synthetic interventions frameworks, where data is collected via an adaptive intervention assignment policy. Our second application is a procedure for learning such an intervention assignment policy in a setting where units arrive sequentially to be treated. In addition to providing theoretical performance guarantees (as measured by regret), we show that our method empirically outperforms a baseline that does not leverage error-in-variables regression.
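For readers unfamiliar with PCR, the basic offline procedure can be sketched as follows: project the noisy covariate matrix onto its top singular directions, then run least squares on the projected covariates. This is only an illustrative sketch of the standard, unregularized, fixed-rank variant, not the adaptive or regularized estimator analyzed in the paper. The function name `pcr_fit`, the synthetic data, the noise level, and the assumption that the true rank is known are all choices made for this example.

```python
import numpy as np

def pcr_fit(Z, y, rank):
    """Basic PCR sketch: keep the top `rank` singular directions of the
    noisy covariates Z, then solve least squares in that subspace."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank]
    # Minimum-norm least-squares solution on the rank-truncated covariates.
    return Vt_r.T @ ((U_r.T @ y) / s_r)

# Synthetic error-in-variables data (all values here are illustrative).
rng = np.random.default_rng(0)
n, d, r = 500, 20, 3
basis = np.linalg.qr(rng.normal(size=(d, r)))[0]   # d x r orthonormal basis
X = rng.normal(size=(n, r)) @ basis.T              # true low-rank covariates
theta = basis @ rng.normal(size=r)                 # parameter in the row space of X
Z = X + 0.1 * rng.normal(size=(n, d))              # observed noisy covariates
y = X @ theta + 0.1 * rng.normal(size=n)           # noisy responses

theta_hat = pcr_fit(Z, y, r)
err = np.linalg.norm(theta_hat - theta)
print(f"l2 estimation error: {err:.4f}")
```

In this sketch the rank is passed in directly; in practice it would have to be chosen from the data, for example by thresholding the singular values of `Z`.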
