Randomized Confidence Bounds for Stochastic Partial Monitoring

Maxime Heuillet; Ola Ahmad; Audrey Durand

TL;DR

This work addresses online learning under partial feedback by introducing randomized confidence bounds for stochastic partial monitoring (PM). Building on the deterministic CBP strategy, it proposes RandCBP and RandCBPsideStar, which preserve sublinear regret in easy PM games and come with the first regret guarantees for hard contextual PM. Both strategies randomize the confidence bounds by sampling from a discretized Gaussian distribution; they perform comparably to deterministic CBP and improve empirical behavior in hard settings. RandCBPsideStar extends the framework to linear contextual PM and admits context-sensitive regret bounds. The approach is validated with experiments on multiple PM games and with a real-world use case: monitoring the error rate of deployed classifiers. The authors also provide reproducibility resources and additional analyses to support the adoption of PM in real-world sequential decision problems.

Abstract

The partial monitoring (PM) framework provides a theoretical formulation of sequential learning problems with incomplete feedback. On each round, a learning agent plays an action while the environment simultaneously chooses an outcome. The agent then observes a feedback signal that is only partially informative about the (unobserved) outcome. The agent leverages the received feedback signals to select actions that minimize the (unobserved) cumulative loss. In contextual PM, the outcomes depend on some side information that is observable by the agent before selecting the action on each round. In this paper, we consider the contextual and non-contextual PM settings with stochastic outcomes. We introduce a new class of PM strategies based on the randomization of deterministic confidence bounds. We also extend regret guarantees to settings where existing stochastic strategies are not applicable. Our experiments show that the proposed RandCBP and RandCBPsidestar strategies have favorable performance against state-of-the-art baselines in multiple PM games. To advocate for the adoption of the PM framework, we design a use case on the real-world problem of monitoring the error rate of any deployed classification system.
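The interaction protocol described in the abstract can be illustrated with a minimal sketch. The game below (a two-action, two-outcome game in the style of "apple tasting"), the names `LOSS`, `FEEDBACK`, `play`, and `uniform_policy`, and the uniform random policy are all assumptions made for illustration; none of them comes from the paper. The point is only that the agent's history contains actions and feedback symbols, never the outcomes or the losses.

```python
import random

# Hypothetical game: rows are actions, columns are outcomes.
LOSS = [[1.0, 0.0],   # loss of action 0 under outcome 0 / outcome 1
        [0.0, 1.0]]   # loss of action 1 under outcome 0 / outcome 1
FEEDBACK = [["none", "none"],  # action 0 reveals nothing about the outcome
            ["o0", "o1"]]      # action 1 reveals the outcome

def play(horizon, outcome_probs, policy, seed=0):
    """Run one stochastic PM game; the policy only sees (action, feedback) pairs."""
    rng = random.Random(seed)
    history = []            # what the agent observes
    cumulative_loss = 0.0   # tracked by the simulator, hidden from the agent
    for _ in range(horizon):
        action = policy(history, rng)
        # The environment draws the outcome i.i.d. from a fixed distribution.
        outcome = rng.choices(range(len(outcome_probs)), weights=outcome_probs)[0]
        history.append((action, FEEDBACK[action][outcome]))
        cumulative_loss += LOSS[action][outcome]
    return history, cumulative_loss

def uniform_policy(history, rng):
    """Placeholder policy: ignores the history and picks an action at random."""
    return rng.randrange(len(LOSS))

history, total_loss = play(100, [0.7, 0.3], uniform_policy)
```

A PM strategy such as CBP would replace `uniform_policy` with a rule that uses the feedback counts in `history` to decide which action to play next.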

Paper Structure (25 sections, 2 theorems, 15 equations, 2 algorithms)


Introduction
Contributions
Preliminaries on Partial Monitoring
Finite stochastic partial monitoring games
Non-contextual setting
Contextual setting
Other relevant settings
Structure of partial monitoring games
Difference between easy and hard games
Towards a Randomized CBP
The CBP Strategy
Successive elimination confidence bounds
Connecting outcome and feedback distributions
Exploration and exploitation in CBP
Instantiating RandCBP
...and 10 more sections

Key Result

Theorem 3.1

Consider the interval $[A,B]$ with $A \leq 0$ and $B = \sqrt{\alpha \log(t)}$. Let the randomization be defined over $K$ bins, with probability $\epsilon$ on the tail and standard deviation $\sigma$. Set $f(t) = \alpha^{1/3} t^{2/3} \log(t)^{1/3}$, $\eta_a = W_a^{2/3}$, and $\alpha > 1$. On easy games, RandCBP with $\mathcal{V} = \bigcup_{i,j \in \mathcal{N}} V_{ij}$ and $g_k$ being game-dependent constants.
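The quantities named in the theorem can be sketched in a few lines. The formulas for $B$ and $f(t)$ are taken directly from the statement above. The sampling scheme is an assumption: it places $K$ equally spaced points on $[A,B]$, puts mass $\epsilon$ on the upper endpoint $B$, and spreads the remaining $1-\epsilon$ over the other points in proportion to a zero-mean Gaussian density with standard deviation $\sigma$. The function names are hypothetical, and the paper's exact construction may differ.

```python
import math
import random

def upper_bound(t, alpha):
    """B = sqrt(alpha * log(t)), as in the theorem statement (t >= 1)."""
    return math.sqrt(alpha * math.log(t))

def exploration_threshold(t, alpha):
    """f(t) = alpha^(1/3) * t^(2/3) * log(t)^(1/3), as in the theorem statement."""
    return alpha ** (1 / 3) * t ** (2 / 3) * math.log(t) ** (1 / 3)

def bin_probabilities(A, B, K, sigma, eps):
    """Assumed discretization: K >= 2 equally spaced points on [A, B].

    The last point (B) receives probability eps; the others share 1 - eps
    in proportion to exp(-x^2 / (2 * sigma^2)).
    """
    points = [A + k * (B - A) / (K - 1) for k in range(K)]
    weights = [math.exp(-x * x / (2 * sigma ** 2)) for x in points[:-1]]
    total = sum(weights)
    probs = [(1 - eps) * w / total for w in weights] + [eps]
    return points, probs

def sample_width(t, alpha, A, K, sigma, eps, rng):
    """Draw one randomized confidence width for round t (t >= 2 so that B > 0)."""
    points, probs = bin_probabilities(A, upper_bound(t, alpha), K, sigma, eps)
    return rng.choices(points, weights=probs)[0]

rng = random.Random(0)
width = sample_width(t=50, alpha=1.01, A=0.0, K=5, sigma=1.0, eps=0.1, rng=rng)
```

Under this sketch, a deterministic strategy corresponds to always using the width $B$, while the randomized variant draws a width in $[A,B]$ on each round, so it never exceeds the deterministic bound.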

Theorems & Definitions (11)

Definition 2.1: Cell decomposition (Bartók et al., ICML 2012)
Definition 2.2: Signal matrix (Bartók et al., ICML 2012)
Definition 3.1: Neighbor pairs (Bartók et al., ICML 2012)
Definition 3.2: Observer set (Bartók et al., ICML 2012)
Definition 3.3: Observer vectors (Bartók et al., ICML 2012)
Definition 3.4: Underplayed actions (Bartók et al., ICML 2012)
Definition 3.5: Neighbor action set (Bartók et al., ICML 2012)
Theorem 3.1
Remark 4.1
Definition 4.1: Underplayed actions (contextual case)
...and 1 more
