Prior-dependent analysis of posterior sampling reinforcement learning with function approximation

Yingru Li; Zhi-Quan Luo

TL;DR

The first prior-dependent Bayesian regret bound for RL with function approximation is established, and the Bayesian regret analysis of posterior sampling reinforcement learning (PSRL) is refined to give an upper bound of $\mathcal{O}(d\sqrt{H^3 T \log T})$, an improvement by a factor of $\mathcal{O}(\sqrt{\log T})$ over the previous benchmark.

Abstract

This work advances randomized exploration in reinforcement learning (RL) with function approximation modeled by linear mixture MDPs. We establish the first prior-dependent Bayesian regret bound for RL with function approximation, and we refine the Bayesian regret analysis for posterior sampling reinforcement learning (PSRL), presenting an upper bound of ${\mathcal{O}}(d\sqrt{H^3 T \log T})$, where $d$ is the dimensionality of the transition kernel, $H$ the planning horizon, and $T$ the total number of interactions. This improves on the previous benchmark (Osband and Van Roy, 2014), specialized to linear mixture MDPs, by a factor of $\mathcal{O}(\sqrt{\log T})$. Our approach takes a value-targeted model learning perspective and introduces a decoupling argument and a variance reduction technique, moving beyond traditional analyses that rely on confidence sets and concentration inequalities to formalize Bayesian regret bounds more effectively.
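
To make the size of the improvement concrete, the stated bound can be multiplied by the $\sqrt{\log T}$ factor that the abstract says is saved. This is arithmetic on the abstract's own statement, not a quotation of the earlier paper's theorem:

$$ d\sqrt{H^3 T \log T} \cdot \sqrt{\log T} = d\sqrt{H^3 T}\,\log T . $$

So the refined analysis replaces a $\log T$ dependence with $\sqrt{\log T}$, while the dependence on $d$, $H$, and $T$ is otherwise unchanged.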

Paper Structure (51 sections, 22 theorems, 124 equations, 3 algorithms)

Introduction
Motivation.
Important Question.
Key Contributions
Preview of Technical Novelty
Related Works
Linear function approximation.
Randomized exploration.
Bayesian regret analysis.
Preliminaries
Finite horizon MDP.
Observations and environmental randomness.
Algorithm and algorithmic randomness.
Bayesian RL and regret.
Linear mixture MDPs.
...and 36 more sections

Key Result

Theorem 1

For any prior over models $\Theta^* = (\theta^*_0, \ldots, \theta^*_{H-1})$ satisfying the mutual-independence assumption, PSRL has a Bayesian regret bound on $\mathfrak{B}\Re(\operatorname{prior}, \operatorname{PSRL}, L)$ over $L$ episodes of interaction with the time-inhomogeneous linear mixture MDP, expressed in terms of ${\boldsymbol{\Gamma}}_{1, h}$, the covariance of $\theta_h^*$ under $\operatorname{prior}$.
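
The role of the prior covariance ${\boldsymbol{\Gamma}}_{1, h}$ can be illustrated with a minimal sketch, assuming a Gaussian prior and Gaussian observation noise so that the posterior stays in closed form. These assumptions, the feature distribution, and the names `posterior_update`, `noise_var`, and `trace` are illustrative and not taken from the paper. The sketch is not PSRL itself; it only shows the covariance shrinkage that a prior-dependent analysis tracks.

```python
import math
import random


def posterior_update(mean, cov, x, y, noise_var=1.0):
    """One conjugate Gaussian update for the model y = <x, theta> + noise.

    Assumption (not from the paper): Gaussian prior and Gaussian noise.
    """
    d = len(mean)
    cov_x = [sum(cov[i][j] * x[j] for j in range(d)) for i in range(d)]
    denom = noise_var + sum(x[i] * cov_x[i] for i in range(d))
    resid = y - sum(x[i] * mean[i] for i in range(d))
    new_mean = [mean[i] + cov_x[i] * resid / denom for i in range(d)]
    # Rank-one downdate: the covariance can only shrink along cov @ x.
    new_cov = [[cov[i][j] - cov_x[i] * cov_x[j] / denom for j in range(d)]
               for i in range(d)]
    return new_mean, new_cov


def trace(cov):
    return sum(cov[i][i] for i in range(len(cov)))


random.seed(0)
prior_var = [1.0, 0.5, 0.1]  # diagonal stand-in for the prior covariance
d = len(prior_var)
mean = [0.0] * d
cov = [[prior_var[i] if i == j else 0.0 for j in range(d)] for i in range(d)]

# Draw the unknown parameter from the prior (the Bayesian setting).
theta_star = [random.gauss(0.0, math.sqrt(v)) for v in prior_var]

traces = [trace(cov)]
for _ in range(50):
    x = [random.gauss(0.0, 1.0) for _ in range(d)]  # hypothetical feature
    y = sum(x[i] * theta_star[i] for i in range(d)) + random.gauss(0.0, 1.0)
    mean, cov = posterior_update(mean, cov, x, y)
    traces.append(trace(cov))
```

Under these assumptions the recorded `traces` never increase, and a tighter prior (smaller entries in `prior_var`) starts the sequence at a smaller value, which is the intuition behind a bound that scales with the prior covariance rather than a worst-case constant.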

Theorems & Definitions (48)

Definition 1: Value-correlated feature
Definition 2: Covariance matrix of unknown model parameters under posterior distribution
Theorem 1: Prior-dependent analysis
Remark 1: Prior-free bound
Lemma 1
Lemma 2: Estimation decomposition conditioned on history
Lemma 3
Definition 3
Definition 4
Theorem 2: Posterior variance reduction
...and 38 more
