Bayesian Inference of Contextual Bandit Policies via Empirical Likelihood
Jiangrong Ouyang, Mingming Gong, Howard Bondell
TL;DR
The paper addresses off-policy inference in contextual bandits with finite samples by developing a Bayesian empirical likelihood (EL) framework. It constructs a joint EL for a vector of policy values $\mathbf{v}$ and a separate EL for the policy-value difference $d = v_{\text{new}} - v_{\text{baseline}}$, using estimating equations together with the moment constraint $\mathbb{E}[\mathbf{w}_a]=\mathbf{1}$. The key contributions are (i) a joint EL for multiple policies, (ii) an EL for policy-value differences that projects the problem to one dimension, (iii) an adaptive sub-support and a grid-based posterior computation that enable accurate finite-sample inference, and (iv) validation through Monte Carlo simulations and an adolescent body mass index (BMI) data set. The approach yields calibrated uncertainty quantification for policy evaluation and flexible policy comparisons, which support decision-making in clinical and recommender-system settings, especially when samples are small.
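To make the EL and grid-posterior ideas concrete, here is a minimal Python sketch on simulated logged data. It is not the authors' implementation: it assumes known uniform logging propensities, inverse-propensity estimating functions $w_i r_i - v$ augmented with the weight constraints $w_i - 1$, a flat prior, and a fixed grid (no adaptive sub-support and no one-dimensional projection). The profile log EL ratio is computed through its concave dual in the Lagrange multiplier $\lambda$, using Owen's pseudo-logarithm so that Newton's method stays well defined. All function names (`log_el_ratio`, `grid_posterior`, `g_value`, `g_diff`, and so on) and the simulation design are hypothetical.

```python
import numpy as np


def log_star(z, eps):
    """Owen-style pseudo-logarithm: log(z) for z >= eps, quadratic extension below.
    Returns the value and its first and second derivatives."""
    z = np.asarray(z, dtype=float)
    safe = np.maximum(z, eps)  # keeps np.log away from non-positive inputs
    val = np.where(z >= eps, np.log(safe),
                   np.log(eps) - 1.5 + 2.0 * z / eps - z**2 / (2.0 * eps**2))
    d1 = np.where(z >= eps, 1.0 / safe, 2.0 / eps - z / eps**2)
    d2 = np.where(z >= eps, -1.0 / safe**2, -1.0 / eps**2)
    return val, d1, d2


def log_el_ratio(g, max_iter=100, tol=1e-9):
    """Profile log EL ratio: log max prod(n p_i) s.t. sum p_i g_i = 0, for g of shape (n, p).
    Solved through the concave dual: maximise sum log*(1 + lambda' g_i) over lambda."""
    n, p = g.shape
    eps = 1.0 / n

    def objective(lam):
        return log_star(1.0 + g @ lam, eps)[0].sum()

    lam = np.zeros(p)
    for _ in range(max_iter):
        _, d1, d2 = log_star(1.0 + g @ lam, eps)
        grad = g.T @ d1
        if np.linalg.norm(grad) < tol:
            break
        hess = (g * d2[:, None]).T @ g          # negative definite if g has full column rank
        step = np.linalg.solve(hess, -grad)     # Newton ascent direction
        f0, t = objective(lam), 1.0
        while objective(lam + t * step) < f0 and t > 1e-10:
            t /= 2.0                             # backtrack so the dual objective never decreases
        lam = lam + t * step
    return -objective(lam)                      # <= 0; equals 0 when mean(g) = 0


def grid_posterior(estfun, grid, log_prior=None):
    """Grid posterior proportional to prior(theta) * EL(theta), normalised to sum to one."""
    log_post = np.array([log_el_ratio(estfun(theta)) for theta in grid])
    if log_prior is not None:
        log_post = log_post + log_prior(grid)
    post = np.exp(log_post - log_post.max())    # subtract the max for numerical stability
    return post / post.sum()


def credible_interval(grid, post, level=0.95):
    cdf = np.cumsum(post)
    lo = grid[np.searchsorted(cdf, (1.0 - level) / 2.0)]
    hi = grid[np.searchsorted(cdf, 1.0 - (1.0 - level) / 2.0)]
    return lo, hi


# Hypothetical simulated logs: 2 actions, uniform logging policy pi_b(a | x) = 0.5.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
a = rng.integers(0, 2, size=n)
r = rng.normal(1.0 + 0.5 * np.where(a == 1, x, -x), 1.0)
w_new = (a == (x > 0).astype(int)) / 0.5       # new policy: action 1 iff x > 0
w_base = (a == 0) / 0.5                         # baseline policy: always action 0


def g_value(v):
    # Assumed estimating functions: IPW value equation plus E[w] = 1.
    return np.column_stack([w_new * r - v, w_new - 1.0])


def g_diff(d):
    # Assumed estimating functions for d = v_new - v_baseline plus both weight constraints.
    return np.column_stack([(w_new - w_base) * r - d, w_new - 1.0, w_base - 1.0])


# Grids kept well inside the region where 0 lies in the convex hull of g.
v_grid = np.linspace(0.8, 2.0, 241)
v_post = grid_posterior(g_value, v_grid)        # flat prior over the grid
d_grid = np.linspace(-0.5, 1.2, 341)
d_post = grid_posterior(g_diff, d_grid)

print("posterior mean of v_new:", v_grid @ v_post)
print("95% credible interval for v_new:", credible_interval(v_grid, v_post))
print("P(d > 0 | data):", d_post[d_grid > 0].sum())
```

Replacing the flat prior with an informative `log_prior` only adds a term inside `grid_posterior`. The same estimating-function interface covers the difference $d$, where the posterior probability $P(d > 0 \mid \text{data})$ summarizes the comparison directly.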
Abstract
Policy inference plays an essential role in the contextual bandit problem. In this paper, we use empirical likelihood to develop a Bayesian inference method for the joint analysis of multiple contextual bandit policies in finite-sample regimes. The proposed method is robust to small sample sizes and provides accurate uncertainty measurements for policy value evaluation. In addition, it allows flexible inference on policy comparisons with full uncertainty quantification. We demonstrate the effectiveness of the proposed method using Monte Carlo simulations and an application to an adolescent body mass index data set.
