Online Statistical Inference in Decision-Making with Matrix Context

Qiyu Han; Will Wei Sun; Yichen Zhang

TL;DR

This work tackles online decision-making with matrix-valued contexts where the true parameter matrices are low-rank. It introduces a fully online framework that jointly performs online low-rank estimation via rank-$r$ SGD and online debiasing to enable valid inference under adaptive data collection, including entrywise parameter inference and optimal policy value inference. The key contributions are finite-sample SGD guarantees, asymptotic normality of debiased estimators, online variance estimation for confidence intervals, and a doubly robust online estimator for the optimal policy value with corresponding asymptotic theory. The results yield valid, data-efficient uncertainty quantification for both model parameters and policy quality in high-dimensional, matrix-context settings, with practical implications for healthcare, recommendations, and autonomous systems where timely statistical guarantees are crucial.

Abstract

The study of online decision-making problems that leverage contextual information has drawn notable attention due to their significant applications in fields ranging from healthcare to autonomous systems. In modern applications, contextual information can be rich and is often represented as a matrix. Moreover, while existing online decision algorithms mainly focus on reward maximization, less attention has been devoted to statistical inference. To address these gaps, in this work, we consider an online decision-making problem with a matrix context where the true model parameters have a low-rank structure. We propose a fully online procedure to conduct statistical inference with adaptively collected data. The low-rank structure of the model parameter and the adaptive nature of the data collection process make this difficult: standard low-rank estimators are biased and cannot be obtained in a sequential manner while existing inference approaches in sequential decision-making algorithms fail to account for the low-rankness and are also biased. To overcome these challenges, we introduce a new online debiasing procedure to simultaneously handle both sources of bias. Our inference framework encompasses both parameter inference and optimal policy value inference. In theory, we establish the asymptotic normality of the proposed online debiased estimators and prove the validity of the constructed confidence intervals for both inference tasks. Our inference results are built upon a newly developed low-rank stochastic gradient descent estimator and its convergence result, which are also of independent interest.
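The setting described above, in which matrix contexts arrive one at a time, an action is chosen adaptively, and a low-rank parameter estimate is updated by SGD, can be illustrated with a toy loop. The sketch below is not the paper's algorithm. It assumes two actions, Gaussian contexts, an epsilon-greedy policy, and a factored estimate $UV^\top$ updated by inverse-propensity-weighted SGD on the squared loss, and it omits the debiasing and inference steps entirely. The function name, parameters, and default values are hypothetical choices made for illustration.

```python
import numpy as np


def run_online_lowrank_sgd(n_steps=8000, d1=8, d2=8, rank=2,
                           eps=0.5, lr=0.003, noise_sd=0.1, seed=0):
    """Toy loop: epsilon-greedy decisions plus factored SGD on matrix contexts."""
    rng = np.random.default_rng(seed)

    # Hypothetical ground truth: one rank-`rank` matrix per action, built from
    # orthonormal factors so that each matrix has Frobenius norm 1.
    truth = []
    for _ in range(2):
        q_left = np.linalg.qr(rng.normal(size=(d1, rank)))[0]
        q_right = np.linalg.qr(rng.normal(size=(d2, rank)))[0]
        truth.append(q_left @ q_right.T / np.sqrt(rank))

    # Factored estimates M_hat[a] = U[a] @ V[a].T with a small random start.
    U = [0.1 * rng.normal(size=(d1, rank)) for _ in range(2)]
    V = [0.1 * rng.normal(size=(d2, rank)) for _ in range(2)]

    for _ in range(n_steps):
        X = rng.normal(size=(d1, d2))  # matrix-valued context
        preds = [float(np.sum((U[a] @ V[a].T) * X)) for a in range(2)]

        # Epsilon-greedy policy: the greedy action gets probability 1 - eps/2.
        greedy = int(np.argmax(preds))
        probs = np.full(2, eps / 2.0)
        probs[greedy] += 1.0 - eps
        a = 0 if rng.random() < probs[0] else 1

        # Reward from the assumed linear model <M_a, X> + noise.
        y = float(np.sum(truth[a] * X)) + noise_sd * rng.normal()

        # SGD step on 0.5 * (<U V^T, X> - y)^2 for the chosen action only,
        # weighted by the inverse of the probability of choosing it.
        scale = lr * (preds[a] - y) / probs[a]
        grad_U = X @ V[a]
        grad_V = X.T @ U[a]
        U[a] = U[a] - scale * grad_U
        V[a] = V[a] - scale * grad_V

    # Frobenius error per action (equal to relative error, since ||M_a||_F = 1).
    return [float(np.linalg.norm(U[a] @ V[a].T - truth[a])) for a in range(2)]
```

Calling `run_online_lowrank_sgd()` returns one estimation error per action. Only the chosen action's factors are updated at each step, and the inverse-propensity weight is one common way to compensate for that selective sampling. Whether the paper's rank-$r$ SGD uses exactly this weighting is an assumption here.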

Paper Structure (44 sections, 23 theorems, 255 equations, 8 figures, 3 tables, 7 algorithms)

Introduction
Related Literature
Notations and Organization
Online Decision Making and Low-Rank Estimation
Sequential Decision Making
Online Low-Rank Estimation via SGD
Explanation of the Form of Stochastic Gradient
Convergence Analysis of Low-Rank Estimation
Parameter Inference
Online Debiasing Procedure
Asymptotic normality of Lg
Parameter Inference
Inference for Optimal Policy Value
Estimator for Optimal Policy Value
Asymptotic Normality
...and 29 more sections

Key Result

Lemma 2.1

The updating rules given by Eq. (practice update) and Eq. (normalize update) are equivalent in the sense that, at any time $t$, the updates $\mathcal{U}_{i,t}$ and $\mathcal{V}_{i,t}$ from Eq. (practice update) and the updates $\mathcal{U}'_{i,t}$ and $\mathcal{V}'_{i,t}$ from Eq. (normalize update) satisfy the relation

Figures (8)

Figure 1: An illustration of our online decision-making framework with matrix context.
Figure 2: The flow chart of the proposed sequential procedure for a total of $n$ iterations.
Figure 3: The empirical distributions of two biased estimators and our debiased method. The center of each empirical distribution is marked by a blue dashed line, and the standard normal curve is shown in red.
Figure 4: Empirical distribution of $\sqrt{n}(\widehat{m}^{(1)}_T - m^{(1)}_T)/\hat{\sigma}_1\hat{S}_1$ based on $5000$ independent trials for $T = e_1e_1^\top$. The red curve is the standard normal density.
Figure 5: Empirical distribution of $\sqrt{n}(\widehat{m}^{(1)}_T - m^{(1)}_T)/\hat{\sigma}_1\hat{S}_1$ based on $5000$ independent trials for ranks $r=3,~5,~7$ and $T = e_1e_1^\top$.
...and 3 more figures

Theorems & Definitions (24)

Lemma 2.1: jin2016provable
Theorem 2.2
Remark 1
Theorem 3.1
Corollary 3.2
Theorem 3.3
Theorem 4.1
Theorem 4.2
Corollary A.1
Theorem A.2
...and 14 more
