Duality and Policy Evaluation in Distributionally Robust Bayesian Diffusion Control
Jose Blanchet, Jiayi Cheng, Yuewei Ling, Hao Liu, Yang Liu
TL;DR
This work addresses diffusion control under parameter misspecification by introducing distributionally robust Bayesian control (DRBC), which confines robustness to perturbations of the Bayesian prior within a KL-divergence neighborhood. A strong duality result recasts the inner worst-case prior evaluation as a low-dimensional optimization. This enables a simulation-based policy evaluation and learning framework with structured policy parameterizations. The authors establish an $O_p(n^{-1/2})$ convergence rate for the randomized multi-level Monte Carlo estimator. They demonstrate DRBC's effectiveness on synthetic linear-quadratic and Bayesian Merton examples, complemented by real-data S&P 500 experiments that show improved out-of-sample performance and reduced pessimism. The work provides scalable offline policy evaluation tools for robust Bayesian diffusion control and points to future extensions toward more general ambiguity sets and high-dimensional settings.
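For intuition about the "low-dimensional optimization," the standard convex duality for a KL ball around a baseline prior is stated below. It assumes that, for a fixed policy $\pi$, the evaluated objective is linear in the prior, namely $\mathbb{E}_{\theta\sim\mu}[J(\pi,\theta)]$, where $J(\pi,\theta)$ is the expected cost when the true parameter is $\theta$. It also assumes mild integrability, for example finite exponential moments of $J$ under $\mu_0$. The symbols $\mu_0$, $\delta$, $\lambda$ and $J$ are introduced here for illustration. The paper's own theorem may involve additional dual variables or conditions, so this is a sketch of the generic reduction rather than a restatement of its result.

$$
\sup_{\mu:\ \mathrm{KL}(\mu\,\|\,\mu_0)\le\delta}\ \mathbb{E}_{\theta\sim\mu}\big[J(\pi,\theta)\big]
\;=\;
\inf_{\lambda>0}\Big\{\lambda\delta+\lambda\log\mathbb{E}_{\theta\sim\mu_0}\big[e^{J(\pi,\theta)/\lambda}\big]\Big\},
\qquad
\frac{d\mu^\star}{d\mu_0}(\theta)\ \propto\ e^{J(\pi,\theta)/\lambda^\star}.
$$

The supremum over an infinite-dimensional set of priors becomes a one-dimensional convex minimization over $\lambda$. The remaining expectation is taken under the baseline prior $\mu_0$ only, so it can be estimated by sampling $\theta\sim\mu_0$. When the infimum is attained at some $\lambda^\star>0$, the worst-case prior $\mu^\star$ is an exponential tilt of $\mu_0$.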
Abstract
We study diffusion control problems under parameter uncertainty. Controllers based on plug-in estimation can be brittle under potential distribution shifts. Bayesian control, which places a prior on the parameters, offers a formulation that encodes beliefs about such shifts. However, as with any Bayesian model, the prior may be misspecified. To mitigate misspecification while reducing the over-pessimism of classical robust control approaches (e.g., \citet{hansen2008robustness}), we propose a distributionally robust Bayesian control (DRBC) formulation in which an adversary perturbs the prior within a divergence neighborhood of a baseline prior. We develop a strong duality result that reduces the distributionally robust prior evaluation to a low-dimensional optimization. It yields a practical simulation-based policy evaluation and learning procedure with structured policy parameterizations. We validate the efficiency of the algorithm on a synthetic linear-quadratic control example and on a real-data portfolio selection problem.
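The simulation-based evaluation step can be illustrated with a small sketch. It assumes a fixed policy $\pi$ whose expected cost $J(\pi,\theta)$ under parameter $\theta$ can be computed for each draw $\theta$ from the baseline prior $\mu_0$. It then minimizes the standard KL-ball dual $\inf_{\lambda>0}\{\lambda\delta+\lambda\log\mathbb{E}_{\mu_0}[e^{J(\pi,\theta)/\lambda}]\}$, with $\mathbb{E}_{\mu_0}$ replaced by an average over prior draws. All function names (`robust_value`, `dual_objective`, `worst_case_weights`), the Gaussian prior and the linear cost $J(\pi,\theta)=\theta$ are hypothetical choices for illustration. They were picked so the result can be compared with the population closed form $m+s\sqrt{2\delta}$ for an $N(m,s^2)$ prior. This is not the authors' estimator. In a diffusion control problem, $J$ would itself be estimated by simulating controlled paths, which turns the log-moment into a nested expectation. That is presumably where the randomized multi-level Monte Carlo estimator mentioned in the TL;DR comes in.

```python
import math
import random


def log_mean_exp(costs, lam):
    """Stable estimate of log E_{mu0}[exp(J / lam)] from prior draws."""
    scaled = [c / lam for c in costs]
    m = max(scaled)
    return m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))


def dual_objective(costs, lam, delta):
    """KL-ball dual objective: lam * delta + lam * log E_{mu0}[exp(J / lam)]."""
    return lam * delta + lam * log_mean_exp(costs, lam)


def robust_value(costs, delta, log_lam_lo=-10.0, log_lam_hi=10.0, iters=80):
    """Worst-case expected cost over priors with KL(mu || empirical mu0) <= delta.

    Minimizes the dual over lam > 0 by golden-section search on s = log(lam).
    The dual is convex in lam, hence unimodal in log(lam).
    Returns (robust value, minimizing lam).
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = log_lam_lo, log_lam_hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc = dual_objective(costs, math.exp(c), delta)
    fd = dual_objective(costs, math.exp(d), delta)
    for _ in range(iters):
        if fc < fd:  # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = dual_objective(costs, math.exp(c), delta)
        else:  # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = dual_objective(costs, math.exp(d), delta)
    lam = math.exp(0.5 * (a + b))
    return dual_objective(costs, lam, delta), lam


def worst_case_weights(costs, lam):
    """Exponentially tilted weights on the prior draws: w_i proportional to exp(J_i / lam)."""
    scaled = [c / lam for c in costs]
    m = max(scaled)
    unnorm = [math.exp(s - m) for s in scaled]
    total = sum(unnorm)
    return [u / total for u in unnorm]


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical baseline prior on a drift parameter: theta ~ N(0.05, 0.1^2).
    thetas = [random.gauss(0.05, 0.1) for _ in range(20000)]
    # Hypothetical cost of a fixed policy: J(pi, theta) = theta (linear, so the
    # population robust value is 0.05 + 0.1 * sqrt(2 * delta)).
    costs = thetas
    delta = 0.1
    value, lam = robust_value(costs, delta)
    print("baseline mean cost:", sum(costs) / len(costs))
    print("robust cost:", value, "closed form:", 0.05 + 0.1 * math.sqrt(2 * delta))
```

Golden-section search over $\log\lambda$ is just one simple choice. Because the dual is convex in $\lambda$, any one-dimensional convex solver would serve, and the tilted weights returned by `worst_case_weights` describe the worst-case prior on the sampled draws.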
