Efficient Contextual Bandits with Uninformed Feedback Graphs

Mengxiao Zhang; Yuheng Zhang; Haipeng Luo; Paul Mineiro

TL;DR

This work addresses contextual bandits with directed feedback graphs in the uninformed setting, where the feedback graph is revealed only after the learner's decision, or never fully revealed. It introduces SquareCB.UG, a reduction to online regression that learns both the losses and the graphs, using a log-loss regression oracle for the graphs, and analyzes both the partially revealed and the fully revealed graph regimes. The authors prove sublinear regret bounds that depend on the independence number of the graph class, improving from the worst-case $\alpha(\mathcal{G})$ to an adaptive $\alpha_t$ in the fully revealed setting. Experiments on bidding tasks with synthetic and real data show empirical gains. The analysis builds on a minimax framework based on the decision-estimation coefficient (DEC) and shows that learning the graphs with log loss is crucial for favorable guarantees. Overall, the work makes efficient contextual bandit algorithms applicable when the feedback structure is not known before each decision, as in realistic applications such as bidding.
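The independence number sets the scale of the regret bounds above. As a minimal illustration (not the paper's code), the sketch below computes it by brute force for a small directed feedback graph. It assumes the common convention that two distinct actions are adjacent when an edge joins them in either direction and that self-loops are ignored; the helper name `independence_number` is hypothetical. The two extremes recover the interpolation described in the abstract: pure bandit feedback gives $\alpha = K$ and full information gives $\alpha = 1$.

```python
from itertools import combinations


def independence_number(num_nodes, edges):
    """Brute-force independence number of a small directed graph (hypothetical helper).

    Assumption: distinct nodes u, v are adjacent if u->v or v->u is an edge;
    self-loops are ignored. Runs in exponential time, so only for tiny graphs.
    """
    adjacent = set()
    for u, v in edges:
        if u != v:
            adjacent.add((u, v))
            adjacent.add((v, u))
    # Search subset sizes from largest to smallest; the first independent set found is maximum.
    for size in range(num_nodes, 0, -1):
        for subset in combinations(range(num_nodes), size):
            if all((u, v) not in adjacent for u, v in combinations(subset, 2)):
                return size
    return 0


K = 4
bandit = [(i, i) for i in range(K)]                  # each action reveals only its own loss
full = [(i, j) for i in range(K) for j in range(K)]  # each action reveals every loss
print(independence_number(K, bandit), independence_number(K, full))
```

Brute force keeps the sketch short; computing the independence number is NP-hard in general, so larger graphs would need a dedicated solver.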

Abstract

Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by Zhang et al. (2023) studies the contextual version of this problem and proposes an efficient and optimal algorithm via a reduction to online regression. However, their algorithm crucially relies on seeing the feedback graph before making each decision, while in many applications, the feedback graph is uninformed, meaning that it is either only revealed after the learner makes her decision or even never fully revealed at all. This work develops the first contextual algorithm for such uninformed settings, via an efficient reduction to online regression over both the losses and the graphs. Importantly, we show that it is critical to learn the graphs using log loss instead of squared loss to obtain favorable regret guarantees. We also demonstrate the empirical effectiveness of our algorithm on a bidding application using both synthetic and real-world data.
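One standard piece of intuition for the log-loss versus squared-loss comparison (offered as background, not as the paper's argument): if an edge is observed with probability $p$ and the oracle predicts $q$, the expected excess squared loss is $(p-q)^2$, while the expected excess log loss is $\mathrm{KL}(\mathrm{Ber}(p)\,\|\,\mathrm{Ber}(q))$. By Pinsker's inequality the latter is always at least $2(p-q)^2$, and it is much larger when $p$ is close to 0 or 1. The short sketch below compares the two; the function names are illustrative.

```python
import math


def excess_squared(p, q):
    """Expected excess squared loss of predicting q when b ~ Bernoulli(p)."""
    return (p - q) ** 2


def excess_log(p, q):
    """Expected excess log loss, i.e. KL(Bernoulli(p) || Bernoulli(q)), for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


for p, q in [(0.5, 0.55), (0.001, 0.05)]:
    sq, lg = excess_squared(p, q), excess_log(p, q)
    print(f"p={p}, q={q}: squared={sq:.5f}, log={lg:.5f}, ratio={lg / sq:.1f}")
```

For $p = 0.5, q = 0.55$ the log-loss penalty is about twice the squared one; for $p = 0.001, q = 0.05$ it is roughly 19 times larger, so a log-loss learner is pushed much harder to estimate rarely observed edges accurately.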

Paper Structure (29 sections, 14 theorems, 58 equations, 2 figures)

Introduction
Contributions.
Related works.
Preliminary
Realizability and oracle assumptions.
Independence number.
Algorithms and Regret Guarantees
Analysis
Analysis for Partially Revealed Graphs
Analysis for Fully Revealed Graphs
Experiments
Regression oracles.
Implementation of $\mathsf{SquareCB.UG}$.
Empirical Results on Synthetic Data
Data.
...and 14 more sections

Key Result

Lemma 2.4

Suppose that for each $t$ and each $(i,j,b)\in S_t$, we have $\mathbb{E}[b\mid x_t] = g^\star(x_t, i,j)$. Then the oracle $\mathsf{AlgLog}$ guarantees:

Figures (2)

Figure 1: Comparison among $\mathsf{SquareCB.UG}$, $\mathsf{SquareCB}$, greedy, and a trivial baseline on one synthetic dataset with diverse contexts (top figure) and another with poor context diversity (bottom figure).
Figure 2: Comparison among $\mathsf{SquareCB.UG}$, $\mathsf{SquareCB}$, greedy, and a trivial baseline on a real auction dataset.
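Assuming the $\mathsf{SquareCB}$ baseline in these figures is the original algorithm of Foster and Rakhlin (2020), its action distribution comes from inverse gap weighting over the oracle's predicted losses, while the greedy baseline always plays the action with the smallest prediction. A minimal sketch of that rule follows; the function name is hypothetical, the constant $\mu = K$ follows the standard presentation, and this is not the code used in the experiments. It does not model the feedback graph, which is what $\mathsf{SquareCB.UG}$ adds.

```python
def inverse_gap_weighting(predicted_losses, gamma):
    """SquareCB-style inverse gap weighting (hypothetical helper).

    predicted_losses[a] is the oracle's loss prediction for action a; larger
    gamma concentrates more mass on the greedy action.
    """
    k = len(predicted_losses)
    best = min(range(k), key=lambda a: predicted_losses[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            # Each non-greedy action gets at most 1/k, shrinking with its predicted gap.
            gap = predicted_losses[a] - predicted_losses[best]
            probs[a] = 1.0 / (k + gamma * gap)
    # The greedy action receives the remaining probability mass (at least 1/k).
    probs[best] = 1.0 - sum(probs)
    return probs


print(inverse_gap_weighting([0.2, 0.5, 0.9], gamma=10.0))
```

With $\gamma = 0$ every action gets probability $1/K$; as $\gamma$ grows the rule approaches the greedy baseline.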

Theorems & Definitions (24)

Lemma 2.4: Proposition 5 of Foster et al. (2021)
Theorem 3.1
Theorem 3.2
Theorem 4.1
Lemma 4.1
Lemma 4.2
Proof
Proof of the DEC translation lemma
Lemma 4.2
Proof of the theorem for partially revealed graphs
...and 14 more
