Optimistic Information Directed Sampling

Gergely Neu; Matteo Papini; Ludovic Schwartz

TL;DR

This work introduces Optimistic Information-Directed Sampling (OIDS), a framework that unifies Bayesian information-directed sampling and worst-case decision-estimation bounds for contextual bandits with parametric losses. By using an optimistic posterior to define surrogate regret and information gain, OIDS yields frequentist, problem-dependent guarantees comparable to Bayesian IDS without requiring priors. The authors present VOIDS and ROIDS variants, establish minimax and first-order regret bounds, extend to infinite parameter spaces and subgaussian losses, and connect the approach to DEC-based analyses while mitigating conservatism. The results broaden the applicability of information-directed strategies beyond Bayesian settings and offer a flexible tool for exploiting problem structure to achieve faster learning in sequential decision problems.

Abstract

We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory of information-directed sampling due to Russo and Van Roy (2018) and the worst-case theory of Foster, Kakade, Qian, and Rakhlin (2021) based on the decision-estimation coefficient. Drawing from both lines of work, we propose an algorithmic template called Optimistic Information-Directed Sampling and show that it can achieve instance-dependent regret guarantees similar to the ones achievable by the classic Bayesian IDS method, but with the major advantage of not requiring any Bayesian assumptions. The key technical innovation of our analysis is introducing an optimistic surrogate model for the regret and using it to define a frequentist version of the Information Ratio of Russo and Van Roy (2018), and a less conservative version of the Decision Estimation Coefficient of Foster et al. (2021).

Keywords: Contextual bandits, information-directed sampling, decision estimation coefficient, first-order regret bounds.

Paper Structure (52 sections, 29 theorems, 113 equations, 1 algorithm)

Introduction
Notation.
Preliminaries
Two competing theories of sequential decision making
The information ratio and Bayesian information-directed sampling
The decision-estimation coefficient and the estimations-to-decisions algorithm
Optimistic information-directed sampling
Main results
Worst-case bounds
First-order bounds
Infinite parameter spaces
Subgaussian losses
Analysis
The proof of Theorem \ref{thm:regret_IR}
The proof of Theorem \ref{thm:regret_IR_FOB}
...and 37 more sections

Key Result

Theorem 1

Assume $|\Theta| = N < \infty$ and let $\lambda > 0$ be arbitrary. Then, for any choice of prior $Q_1\in\Delta_{\Theta}$, the regret of any algorithm satisfies the following bound:

Theorems & Definitions (41)

Theorem 1
Lemma 1
Corollary 1
Theorem 2
Theorem 3
Theorem 4
Lemma 2
Lemma 3
Lemma 4
Lemma 5
...and 31 more
