Bayesian Inference of Contextual Bandit Policies via Empirical Likelihood

Jiangrong Ouyang; Mingming Gong; Howard Bondell

TL;DR

The paper addresses off-policy inference in contextual bandits under finite samples by developing a Bayesian empirical likelihood framework. It constructs a joint empirical likelihood for a vector of policy values $\mathbf{v}$ and a separate empirical likelihood for the policy-value difference $d=v_{new}-v_{baseline}$, using estimating equations and the constraint $\mathbb{E}[\mathbf{w}_a]=\mathbf{1}$. Key contributions include (i) a joint EL for multiple policies, (ii) EL for policy-value differences with a projection to one dimension, (iii) an adaptive sub-support and grid-based posterior to enable accurate finite-sample inference, and (iv) validation on Monte Carlo simulations and an adolescent BMI dataset. The approach yields calibrated uncertainty quantification for policy evaluation and flexible comparisons that support decision-making in clinical and recommender-system contexts, especially in small-sample regimes.
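To make the summary above concrete, the following is a minimal sketch of a grid-based Bayesian empirical likelihood posterior for a single policy value. It is not the authors' implementation. It assumes that $\mathbf{w}$ denotes importance weights (target-policy probability divided by logging-policy probability), that the estimating functions are $(w_i r_i - v,\; w_i - 1)$, that the prior on $v$ is flat over a fixed grid, and that Owen's pseudo-logarithm is used to keep the Newton iterations well defined. The function names (`log_star`, `el_log_ratio`, `grid_posterior`) and the simulated data are hypothetical, and the paper's adaptive sub-support is replaced here by a fixed grid.

```python
import numpy as np


def log_star(z, eps):
    """Pseudo-logarithm: log(z) for z >= eps, smooth quadratic extension below eps."""
    z = np.asarray(z, dtype=float)
    safe = np.maximum(z, eps)
    below = z < eps
    quad = np.log(eps) - 1.5 + 2.0 * z / eps - 0.5 * (z / eps) ** 2
    val = np.where(below, quad, np.log(safe))
    d1 = np.where(below, 2.0 / eps - z / eps**2, 1.0 / safe)
    d2 = np.where(below, -1.0 / eps**2, -1.0 / safe**2)
    return val, d1, d2


def el_log_ratio(G, n_iter=50, tol=1e-10):
    """Log empirical likelihood ratio for estimating functions G (n x q).

    Maximizes sum_i log_star(1 + lam^T g_i) over lam with damped Newton steps
    and returns the negated maximum, which is <= 0.
    """
    n, q = G.shape
    eps = 1.0 / n
    lam = np.zeros(q)

    def obj(l):
        return log_star(1.0 + G @ l, eps)[0].sum()

    for _ in range(n_iter):
        _, d1, d2 = log_star(1.0 + G @ lam, eps)
        grad = G.T @ d1
        if np.linalg.norm(grad) < tol:
            break
        hess = (G * d2[:, None]).T @ G        # negative definite
        step = np.linalg.solve(hess, -grad)   # Newton ascent direction
        t, f0 = 1.0, obj(lam)
        while obj(lam + t * step) < f0 and t > 1e-8:
            t *= 0.5                          # backtrack if the objective drops
        lam = lam + t * step
    return -obj(lam)


def grid_posterior(w, r, grid, log_prior=None):
    """Posterior density of the policy value on a uniform grid (flat prior by default)."""
    logp = np.array(
        [el_log_ratio(np.column_stack([w * r - v, w - 1.0])) for v in grid]
    )
    if log_prior is not None:
        logp = logp + log_prior(grid)
    dens = np.exp(logp - logp.max())
    return dens / (dens.sum() * (grid[1] - grid[0]))


# Simulated logged data (assumption): two actions, uniform logging policy,
# target policy choosing action 1 with probability 0.8.
rng = np.random.default_rng(0)
n = 50
a = rng.integers(0, 2, size=n)
r = rng.binomial(1, 0.3 + 0.4 * a).astype(float)
w = np.where(a == 1, 0.8, 0.2) / 0.5

grid = np.linspace(0.05, 0.95, 91)
post = grid_posterior(w, r, grid)
post_mean = (grid * post).sum() * (grid[1] - grid[0])
print("posterior mean of v:", post_mean)
```

An interval estimate can be read off `post` by accumulating density from the highest grid values downward until the desired mass is reached. A posterior for the difference $d$ would follow the same pattern with estimating functions built from both policies' weights.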

Abstract

Policy inference plays an essential role in the contextual bandit problem. In this paper, we use empirical likelihood to develop a Bayesian inference method for the joint analysis of multiple contextual bandit policies in finite sample regimes. The proposed inference method is robust to small sample sizes and is able to provide accurate uncertainty measurements for policy value evaluation. In addition, it allows for flexible inferences on policy comparison with full uncertainty quantification. We demonstrate the effectiveness of the proposed inference method using Monte Carlo simulations and its application to an adolescent body mass index data set.

Paper Structure (15 sections, 89 equations, 3 figures, 2 tables)

Introduction
Background and Notation
Methodology
Joint Empirical Likelihood for Multiple Policies
Empirical Likelihood for Policy Value Difference
Computational Method for Bayesian Inference
Experiments
Single Policy Inference
Policy Comparison
Application to Body Mass Index Data
Discussion
Joint Empirical Likelihood for Multiple Policies
Maximum Empirical Likelihood Estimator
Empirical Likelihood for Policy Value Difference
Adaptive Sub-Support

Figures (3)

Figure 1: Coverage probabilities for the Wilks' and the HPD intervals.
Figure 2: Interval widths for the case where the Wilks' intervals suffer from undercoverage.
Figure 4: Policy comparison using the joint posterior distribution for $\mathbf{v} = (v_{\mathrm{baseline}}, v_{\mathrm{new}})^\top$ (top) and the univariate posterior distribution for $d = v_{\mathrm{new}} - v_{\mathrm{baseline}}$ (bottom).
