Stochastic Optimal Control with Side Information and Bayesian Learning
Johannes Milz, Alexander Shapiro, Enlu Zhou
TL;DR
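For infinite-horizon stochastic optimal control with Markov side information and an unknown context-conditional distribution, this work proposes a Bayesian reformulation based on a parametric density model and posterior predictive dynamics, which yields a Bayesian Bellman equation. It proves posterior consistency under Markov samples and, under correct specification and identifiability, uniform convergence of the Bayesian value function.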
Abstract
We study infinite-horizon stochastic optimal control problems with observable side information: a Markov chain that modulates an unknown context-conditional randomness distribution. Since this distribution is unknown, we propose a Bayesian reformulation based on a parametric density model and posterior predictive dynamics, which yields a Bayesian Bellman equation. We prove posterior consistency under Markov samples and, under correct specification and identifiability, uniform convergence of the Bayesian value function. Finally, we establish Bernstein–von Mises-type asymptotic normality for the data-driven contextual optimal value.
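
To make the ingredients above concrete, here is a minimal Python sketch on a toy problem. It is not the authors' construction. Everything in it is an assumption made for illustration: a two-state context chain with a known transition matrix, a binary disturbance xi, a finite grid of candidate parameters in place of a general parametric density, a small inventory-style cost, and all names and numbers. It also holds the posterior fixed while solving the Bellman equation, which may simplify the paper's Bayesian Bellman equation.

```python
import random

# All names and numbers below are hypothetical choices for this sketch.
GAMMA = 0.9                                    # discount factor
P_CONTEXT = [[0.9, 0.1], [0.2, 0.8]]           # context (side information) transitions
THETAS = [(0.2, 0.7), (0.5, 0.5), (0.8, 0.3)]  # candidates for P(xi = 1 | context s)
MAX_STOCK = 3                                  # inventory levels 0..MAX_STOCK
ACTIONS = [0, 1]                               # order quantities


def likelihood(xi, s, theta):
    """Parametric density f(xi | s, theta) for a binary disturbance xi."""
    return theta[s] if xi == 1 else 1.0 - theta[s]


def posterior_update(post, s, xi):
    """One Bayes step over the finite parameter grid."""
    weights = [p * likelihood(xi, s, th) for p, th in zip(post, THETAS)]
    total = sum(weights)
    return [w / total for w in weights]


def predictive(post, s):
    """Posterior predictive distribution of xi given context s."""
    p1 = sum(p * th[s] for p, th in zip(post, THETAS))
    return [1.0 - p1, p1]


def simulate_posterior(true_theta, n, seed=0):
    """Update a uniform prior with n samples observed along a Markov context path."""
    rng = random.Random(seed)
    post = [1.0 / len(THETAS)] * len(THETAS)
    s = 0
    for _ in range(n):
        xi = 1 if rng.random() < true_theta[s] else 0
        post = posterior_update(post, s, xi)
        s = 0 if rng.random() < P_CONTEXT[s][0] else 1
    return post


def stage_cost(x, u, xi):
    """Ordering cost plus a holding cost, or a shortage penalty if demand is unmet."""
    stock = x + u
    return 1.0 * u + (0.5 * (stock - xi) if stock >= xi else 4.0)


def next_stock(x, u, xi):
    """Inventory transition, clipped to the range 0..MAX_STOCK."""
    return min(max(x + u - xi, 0), MAX_STOCK)


def bellman(V, post):
    """Apply the Bellman operator with xi drawn from the posterior predictive."""
    new_V = []
    for x in range(MAX_STOCK + 1):
        row = []
        for s in (0, 1):
            pred = predictive(post, s)
            q_values = []
            for u in ACTIONS:
                q = 0.0
                for xi in (0, 1):
                    x_next = next_stock(x, u, xi)
                    cont = sum(P_CONTEXT[s][s2] * V[x_next][s2] for s2 in (0, 1))
                    q += pred[xi] * (stage_cost(x, u, xi) + GAMMA * cont)
                q_values.append(q)
            row.append(min(q_values))
        new_V.append(row)
    return new_V


def value_iteration(post, iters=300):
    """Iterate the contraction from zero; V is indexed as V[stock][context]."""
    V = [[0.0, 0.0] for _ in range(MAX_STOCK + 1)]
    for _ in range(iters):
        V = bellman(V, post)
    return V
```

Calling `simulate_posterior(THETAS[0], 2000)` and passing the result to `value_iteration` gives a value function that can be compared with the one computed from a point mass on the true parameter. In this toy setting, that comparison is an informal analogue of the consistency and value-function convergence statements in the abstract.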
