Online Statistical Inference in Decision-Making with Matrix Context
Qiyu Han, Will Wei Sun, Yichen Zhang
TL;DR
This work tackles online decision-making with matrix-valued contexts where the true parameter matrices are low-rank. It introduces a fully online framework that jointly performs online low-rank estimation via rank-$r$ SGD and online debiasing to enable valid inference under adaptive data collection, including entrywise parameter inference and optimal policy value inference. The key contributions are finite-sample SGD guarantees, asymptotic normality of debiased estimators, online variance estimation for confidence intervals, and a doubly robust online estimator for the optimal policy value with corresponding asymptotic theory. The results yield valid, data-efficient uncertainty quantification for both model parameters and policy quality in high-dimensional, matrix-context settings, with practical implications for healthcare, recommendations, and autonomous systems where timely statistical guarantees are crucial.
Abstract
Online decision-making problems that leverage contextual information have drawn notable attention because of their significant applications in fields ranging from healthcare to autonomous systems. In modern applications, contextual information can be rich and is often represented as a matrix. Moreover, while existing online decision algorithms mainly focus on reward maximization, less attention has been devoted to statistical inference. To address these gaps, we consider an online decision-making problem with a matrix context where the true model parameters have a low-rank structure. We propose a fully online procedure to conduct statistical inference with adaptively collected data. The low-rank structure of the model parameter and the adaptive nature of the data collection process make this difficult: standard low-rank estimators are biased and cannot be obtained sequentially, while existing inference approaches for sequential decision-making algorithms fail to account for the low-rank structure and are also biased. To overcome these challenges, we introduce a new online debiasing procedure that handles both sources of bias simultaneously. Our inference framework encompasses both parameter inference and optimal policy value inference. In theory, we establish the asymptotic normality of the proposed online debiased estimators and prove the validity of the constructed confidence intervals for both inference tasks. Our inference results are built upon a newly developed low-rank stochastic gradient descent estimator and its convergence result, which are also of independent interest.
