Online Posterior Sampling with a Diffusion Prior

Branislav Kveton, Boris Oreshkin, Youngsuk Park, Aniket Deshmukh, Rui Song

Abstract

Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of approximate conditional posteriors, one for each stage of the reverse diffusion process, which are obtained by the Laplace approximation. Our approximations are motivated by posterior sampling with a Gaussian prior, and inherit its simplicity and efficiency. They are asymptotically consistent and perform well empirically on a variety of contextual bandit problems.
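
The Gaussian-prior baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes a linear reward model with Gaussian noise, in which case the posterior is available in closed form. This is not the diffusion-prior algorithm proposed in the paper. The function names (`gaussian_posterior`, `thompson_step`) and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_posterior(theta0, Sigma0, X, y, sigma=1.0):
    """Closed-form posterior N(mean, cov) of theta.

    Assumed model (illustration only): y = X @ theta + N(0, sigma^2) noise,
    with the Gaussian prior theta ~ N(theta0, Sigma0).
    """
    prior_precision = np.linalg.inv(Sigma0)
    precision = prior_precision + X.T @ X / sigma**2
    cov = np.linalg.inv(precision)
    mean = cov @ (prior_precision @ theta0 + X.T @ y / sigma**2)
    return mean, cov


def thompson_step(rng, theta0, Sigma0, X, y, arms, sigma=1.0):
    """One round of posterior sampling: draw theta, act greedily on it."""
    mean, cov = gaussian_posterior(theta0, Sigma0, X, y, sigma)
    theta_tilde = rng.multivariate_normal(mean, cov)
    # Pick the arm (row of `arms`) with the highest sampled mean reward.
    return int(np.argmax(arms @ theta_tilde))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 2
    theta0, Sigma0 = np.zeros(d), np.eye(d)
    arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
    # History of pulled arm features and observed rewards (made-up values).
    X = np.array([[1.0, 0.0], [0.0, 1.0]])
    y = np.array([0.9, 0.1])
    print(thompson_step(rng, theta0, Sigma0, X, y, arms))
```

For a non-Gaussian likelihood such as the logistic model, the closed form above no longer applies, which is where the Laplace approximation referred to in the abstract comes in.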

Paper Structure (31 sections, 6 theorems, 54 equations, 7 figures, 4 algorithms)

Key Result

Lemma 1

Let $p$ be a probability measure over the reverse process (Figure 1). Then

Figures (7)

  • Figure 1: Graphical models of the forward and reverse processes in the diffusion model. The variable $H$ represents partial information about $S_0$.
  • Figure 2: Evaluation of $\tt DiffTS$ on three synthetic problems. The first row shows samples from the true (blue) and diffusion model (red) priors. The second row shows the regret of $\tt DiffTS$ and the baselines as a function of round $n$.
  • Figure 3: Evaluation of $\tt DiffTS$ on the MovieLens dataset: (a) shows samples from the true (blue) and diffusion model (red) priors, (b) shows regret in the linear bandit, and (c) shows regret in the logistic bandit.
  • Figure 4: Evaluation of $\tt DiffTS$ on another three synthetic problems. The first row shows samples from the true (blue) and diffusion model (red) priors. The second row shows the regret of $\tt DiffTS$ and the baselines as a function of round $n$.
  • Figure 5: Evaluation of $\tt DiffTS$ on the MNIST dataset: (a) shows samples from the true (blue) and diffusion model (red) priors, (b) shows regret in the linear bandit, and (c) shows regret in the logistic bandit.
  • ...and 2 more figures

Theorems & Definitions (12)

  • Lemma 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Theorem 4
  • proof
  • Lemma 5
  • proof
  • ...and 2 more