Combining Experimental and Historical Data for Policy Evaluation
Ting Li, Chengchun Shi, Qianglin Wen, Yang Sui, Yongli Qin, Chunbo Lai, Hongtu Zhu
TL;DR
The paper tackles policy evaluation when multiple data sources are available, notably an experimental dataset with two arms and a historical dataset collected under the control arm alone. It introduces two linearly weighted estimators that combine base estimators built from the experimental and historical data: one chooses the weights to minimize the mean squared error (MSE) of the combined estimator, and a pessimistic variant gains robustness to reward shifts between the datasets. The authors establish non-asymptotic MSE bounds, derive oracle, efficiency, and robustness properties across a spectrum of reward shift regimes, extend the approach to sequential decision making, and report superior empirical performance on simulated and real ridesharing data. The contribution advances data integration for causal learning by accommodating distributional shifts and offering practical guidance on which estimator to use in each regime, with implications for offline policy evaluation and sequential reinforcement learning.
Abstract
This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly integrate base policy value estimators constructed based on the experimental and historical data, with weights optimized to minimize the mean square error (MSE) of the resulting combined estimator. We further apply the pessimistic principle to obtain more robust estimators, and extend these developments to sequential decision making. Theoretically, we establish non-asymptotic error bounds for the MSEs of our proposed estimators, and derive their oracle, efficiency and robustness properties across a broad spectrum of reward shift scenarios. Numerical experiments and real-data-based analyses from a ridesharing company demonstrate the superior performance of the proposed estimators.
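The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator. It is a simplified stand-in that assumes the two control samples are independent, uses plain sample means as base estimators, and plugs in the squared difference of the control means as the squared-bias estimate. Under those assumptions, the combined control estimate w * mu_exp + (1 - w) * mu_hist has MSE w^2 * var_exp + (1 - w)^2 * (var_hist + bias^2), which is minimized at w = (var_hist + bias^2) / (var_exp + var_hist + bias^2). The function names and the `bias_margin` parameter are hypothetical. `bias_margin` inflates the bias estimate as a crude stand-in for the pessimistic principle.

```python
from statistics import fmean, variance


def mse_optimal_weight(var_exp, var_hist, bias_sq):
    """Weight on the experimental control estimator that minimizes
    w^2 * var_exp + (1 - w)^2 * (var_hist + bias_sq)."""
    denom = var_exp + var_hist + bias_sq
    if denom == 0:
        return 1.0  # degenerate case: fall back to experimental data only
    return (var_hist + bias_sq) / denom


def combined_ate(treated, control_exp, control_hist, bias_margin=0.0):
    """Illustrative treatment-effect estimate using a weighted control mean.

    bias_margin (hypothetical) inflates the estimated reward shift, which
    moves weight toward the experimental control arm.
    """
    mu_t = fmean(treated)
    mu_e = fmean(control_exp)
    mu_h = fmean(control_hist)
    # Variances of the sample means (not of individual observations).
    var_e = variance(control_exp) / len(control_exp)
    var_h = variance(control_hist) / len(control_hist)
    # Plug-in estimate of the squared reward shift; an assumption of this sketch.
    bias_sq = (abs(mu_h - mu_e) + bias_margin) ** 2
    w = mse_optimal_weight(var_e, var_h, bias_sq)
    control = w * mu_e + (1 - w) * mu_h
    return mu_t - control, w


if __name__ == "__main__":
    ate, w = combined_ate([3.0, 5.0], [1.0, 3.0], [2.0, 4.0])
    print(f"ATE estimate: {ate:.3f}, weight on experimental control: {w:.3f}")
```

When the estimated shift is zero and both control samples are equally precise, the weight is 0.5. As the estimated shift or the margin grows, the weight approaches 1 and the estimate reverts to the experimental data alone.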
