Online Posterior Sampling with a Diffusion Prior

Branislav Kveton; Boris Oreshkin; Youngsuk Park; Aniket Deshmukh; Rui Song

Abstract

Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly, or approximately using the Laplace approximation. The Gaussian prior is computationally efficient, but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of approximate conditional posteriors, one for each stage of the reverse diffusion process, which are obtained by the Laplace approximation. Our approximations are motivated by posterior sampling with a Gaussian prior, and inherit its simplicity and efficiency. They are asymptotically consistent and perform well empirically on a variety of contextual bandit problems.
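As a point of reference for the Gaussian-prior baseline mentioned in the abstract, the following is a minimal sketch of exact posterior sampling in a linear model. It uses the standard conjugate Gaussian update and is not the paper's diffusion-prior algorithm. The function names (`gaussian_posterior`, `sample_action`), the argument layout, and the assumption of a known reward noise level `sigma` are illustrative choices, not taken from the paper.

```python
import numpy as np


def gaussian_posterior(theta0, Sigma0, Phi, y, sigma=1.0):
    """Exact posterior N(theta_hat, Sigma_hat) of a linear model parameter.

    Assumed (hypothetical) setup: prior theta ~ N(theta0, Sigma0) and rewards
    y_l = phi_l^T theta + noise, with noise ~ N(0, sigma^2).
    Phi has shape (n, d), one feature vector per row; y has shape (n,).
    """
    prior_precision = np.linalg.inv(Sigma0)
    # Posterior precision = prior precision + scaled outer products of features.
    post_precision = prior_precision + Phi.T @ Phi / sigma**2
    Sigma_hat = np.linalg.inv(post_precision)
    theta_hat = Sigma_hat @ (prior_precision @ theta0 + Phi.T @ y / sigma**2)
    return theta_hat, Sigma_hat


def sample_action(rng, theta0, Sigma0, Phi, y, actions, sigma=1.0):
    """One round of posterior sampling: draw a parameter, act greedily on it.

    actions has shape (K, d), one feature vector per candidate action.
    Returns the index of the action with the highest sampled mean reward.
    """
    theta_hat, Sigma_hat = gaussian_posterior(theta0, Sigma0, Phi, y, sigma)
    theta_tilde = rng.multivariate_normal(theta_hat, Sigma_hat)
    return int(np.argmax(actions @ theta_tilde))
```

With an empty history (`Phi` of shape `(0, d)`), the posterior reduces to the prior, so the first round samples directly from `N(theta0, Sigma0)`. The abstract's proposal replaces this single Gaussian update with a chain of approximate conditional posteriors, one per stage of the reverse diffusion process.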

Paper Structure (31 sections, 6 theorems, 54 equations, 7 figures, 4 algorithms)

Introduction
Setting
Linear Model
Generalized Linear Model
Towards Diffusion Model Priors
Diffusion Models
Posterior Sampling
Chain Model Posterior
Linear Model Posterior
Key Approximation in \ref{thm:linear posterior}
GLM Posterior
Application to Contextual Bandits
Experiments
Experimental Setup
Synthetic Experiment
...and 16 more sections

Key Result

Lemma 1

Let $p$ be a probability measure over the reverse process (Figure 1). Then

Figures (7)

Figure 1: Graphical models of the forward and reverse processes in the diffusion model. The variable $H$ represents partial information about $S_0$.
Figure 2: Evaluation of $\tt DiffTS$ on three synthetic problems. The first row shows samples from the true (blue) and diffusion model (red) priors. The second row shows the regret of $\tt DiffTS$ and the baselines as a function of round $n$.
Figure 3: Evaluation of $\tt DiffTS$ on the MovieLens dataset: (a) shows samples from the true (blue) and diffusion model (red) priors, (b) shows regret in the linear bandit, and (c) shows regret in the logistic bandit.
Figure 4: Evaluation of $\tt DiffTS$ on another three synthetic problems. The first row shows samples from the true (blue) and diffusion model (red) priors. The second row shows the regret of $\tt DiffTS$ and the baselines as a function of round $n$.
Figure 5: Evaluation of $\tt DiffTS$ on the MNIST dataset: (a) shows samples from the true (blue) and diffusion model (red) priors, (b) shows regret in the linear bandit, and (c) shows regret in the logistic bandit.
...and 2 more figures

Theorems & Definitions (12)

Lemma 1
proof
Theorem 2
proof
Theorem 3
proof
Theorem 4
proof
Lemma 5
proof
...and 2 more
