PAC-Bayes Meets Online Contextual Optimization

Zhuojun Xie, Adam Abdin, Yiping Fang

TL;DR

This work addresses online contextual optimization under full-information feedback by introducing Bayesian online contextual optimization (BOCO), which unifies general Bayesian updating with PAC-Bayes theory to produce a Gibbs posterior for online decision making. The method uses a pushforward-based aggregated predictor $m(x_t;\pi_t)$ and learns without gradients via a sequential Monte Carlo (SMC) scheme with Liu–West rejuvenation, which makes it applicable to nondifferentiable problems. The theory gives an $\mathcal{O}(\sqrt{T})$ regret bound for bounded and mixable losses, and the practical SMC algorithm provides gradient-free, anytime updates. Empirically, BOCO outperforms gradient-based and deterministic baselines on a nondifferentiable knapsack problem with uncertain weights, achieving higher and more stable rewards and feasibility, particularly early in the data stream. This suggests the approach is robust and practical for online decision-making under uncertainty.
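To make the SMC idea concrete, here is a minimal sketch of a particle approximation to a Gibbs posterior with Liu–West rejuvenation. It is not the paper's algorithm: the toy linear predictor, the clipped squared loss, the weighted-mean aggregation, the ESS threshold of half the particle count, and all names and parameter values (`run_demo`, `lam`, `a`, and so on) are assumptions chosen for illustration, and the downstream optimization step is omitted entirely.

```python
import numpy as np

def normalize_log_weights(log_w):
    """Shift log-weights so that the corresponding weights sum to one."""
    m = np.max(log_w)
    return log_w - (m + np.log(np.sum(np.exp(log_w - m))))

def effective_sample_size(log_w):
    """ESS = 1 / sum_i w_i^2 for normalized weights."""
    w = np.exp(log_w)
    return 1.0 / np.sum(w ** 2)

def liu_west_rejuvenate(particles, log_w, rng, a=0.98):
    """Resample, shrink towards the weighted mean, and add Gaussian jitter.

    The shrinkage factor `a` and jitter variance (1 - a^2) * cov follow the
    usual Liu-West kernel; the value 0.98 is an illustrative assumption.
    """
    n, d = particles.shape
    w = np.exp(log_w)
    w = w / w.sum()
    mean = w @ particles
    centered = particles - mean
    cov = (centered * w[:, None]).T @ centered
    idx = rng.choice(n, size=n, p=w)                  # multinomial resampling
    shrunk = a * particles[idx] + (1.0 - a) * mean
    jitter_cov = (1.0 - a ** 2) * cov + 1e-12 * np.eye(d)
    noise = rng.multivariate_normal(np.zeros(d), jitter_cov, size=n)
    return shrunk + noise, np.full(n, -np.log(n))     # weights reset to uniform

def run_demo(T=200, n_particles=500, lam=5.0, seed=0):
    """Toy stream: y_t = x_t . theta_true + noise, loss clipped to [0, 1]."""
    rng = np.random.default_rng(seed)
    theta_true = np.array([0.5, -0.3])                # hypothetical target
    particles = rng.normal(size=(n_particles, 2))     # draws from a N(0, I) prior
    log_w = np.full(n_particles, -np.log(n_particles))
    total_loss = 0.0
    for _ in range(T):
        x = rng.normal(size=2)
        preds = particles @ x
        y_hat = np.exp(log_w) @ preds                 # weighted-mean aggregation (assumption)
        y = x @ theta_true + 0.05 * rng.normal()      # full-information feedback
        total_loss += min((y_hat - y) ** 2, 1.0)
        losses = np.minimum((preds - y) ** 2, 1.0)    # bounded per-particle loss
        # Gibbs-style reweighting: w_i <- w_i * exp(-lam * loss_i), no gradients used.
        log_w = normalize_log_weights(log_w - lam * losses)
        if effective_sample_size(log_w) < n_particles / 2:
            particles, log_w = liu_west_rejuvenate(particles, log_w, rng)
    theta_hat = np.exp(log_w) @ particles
    return theta_hat, theta_true, log_w, total_loss / T

theta_hat, theta_true, log_w, avg_loss = run_demo()
print("posterior mean:", theta_hat, "average clipped loss:", avg_loss)
```

Because the update only evaluates losses at each particle, the same loop would apply unchanged if the loss came from a nondifferentiable downstream problem, which is the property the summary emphasizes.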

Abstract

The predict-then-optimize paradigm bridges online learning and contextual optimization in dynamic environments. Previous works have investigated the sequential updating of predictors using feedback from downstream decisions to minimize regret in the full-information setting. However, existing approaches are predominantly frequentist, rely heavily on gradient-based strategies, and employ deterministic predictors that can yield high variance in practice despite their asymptotic guarantees. This work introduces, to the best of our knowledge, the first Bayesian online contextual optimization framework. Grounded in PAC-Bayes theory and general Bayesian updating principles, our framework achieves $\mathcal{O}(\sqrt{T})$ regret for bounded and mixable losses via a Gibbs posterior, eliminates the dependence on gradients through sequential Monte Carlo samplers, and thereby accommodates nondifferentiable problems. Theoretical developments and numerical experiments substantiate our claims.
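For orientation, a Gibbs posterior in general Bayesian updating is commonly written as an exponentially reweighted prior. The display below is the standard textbook form, not necessarily the paper's exact definition; the parameter symbol $\theta$ and the per-round loss $\ell_t$ are notation assumed here, while $\lambda > 0$ plays the role of a learning rate:

$$\pi_{t+1}(\theta) \;\propto\; \pi_t(\theta)\,\exp\!\big(-\lambda\,\ell_t(\theta)\big).$$

Only loss values enter this update, never their gradients, which is why sampling-based approximations of $\pi_t$ can handle nondifferentiable losses.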

Paper Structure

This paper contains 12 sections, 3 theorems, 19 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Theorem 3.2

Suppose Assumption A1 holds. For any distribution $\mu$ over $D$, any $\lambda > 0$, any $\delta \in (0,1)$, and any online posterior $\{\tilde{\pi}\}$ and prior $\{\pi\}$ sequences, the following inequality holds with probability at least $1 - \delta$ over the draw $D \sim \mu$:

Figures (1)

  • Figure 1: Time-averaged cumulative reward and feasibility of four frameworks in 100 trials.

Theorems & Definitions (3)

  • Theorem 3.2: OnlinePACB, Corollary 3.1
  • Theorem 3.3: OnlinePACB, Corollary 3.3
  • Corollary 3.5