Distributionally Robust Deep Q-Learning
Chung I Lu, Julian Sester, Aijia Zhang
TL;DR
The paper addresses learning robust policies for continuous-state MDPs whose transition model is uncertain. It introduces a distributionally robust Q-learning framework whose ambiguity sets are Sinkhorn-distance balls around a reference measure, and it solves the resulting robust Bellman equation via dualisation. Parameterising the robust Q-function with neural networks yields Robust DQN (RDQN), a practical algorithm that modifies the DQN target computation and training objective to optimise against worst-case state transitions. Theoretical results establish the dynamic programming principle and the existence of solutions for compact state spaces. Empirical studies on a toy gambling task and an S&P 500 portfolio optimisation task illustrate improved tail performance and risk-adjusted returns under distributional shift. The work advances robust reinforcement learning with a tractable, scalable approach that explicitly accounts for model misspecification in continuous-state settings, with direct relevance to finance and other risk-sensitive domains.
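The change to the target computation can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: a squared-Euclidean transport cost, a Lebesgue reference measure for the entropy term, the single observed next state $x'$ serving as the reference measure $\hat P$, and a fixed grid over the dual multiplier $\lambda$. Under these assumptions the worst-case continuation value is $\sup_{\lambda>0}\{-\lambda\bar\rho - \lambda\delta \log \mathbb{E}_{z\sim N(x',(\delta/2)I)}[e^{-v(z)/(\lambda\delta)}]\}$ with $v(z)=\max_b Q(z,b)$, estimated here by Monte Carlo. The names `robust_targets`, `log_mean_exp`, `q_next_fn` and `lambdas` are hypothetical.

```python
import numpy as np


def log_mean_exp(a, axis):
    """Numerically stable log(mean(exp(a))) along `axis`."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))


def robust_targets(rewards, next_states, q_next_fn, gamma, rho, delta,
                   lambdas, n_samples=256, rng=None):
    """Hypothetical RDQN-style targets r + gamma * worst-case E[max_b Q(z, b)].

    Assumptions (not from the paper): squared-Euclidean cost, Lebesgue
    reference measure for the entropy term, the observed next state as a
    Dirac reference measure, and a fixed positive grid `lambdas` for the
    dual multiplier.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, d = next_states.shape
    # Effective radius: rho_bar = rho + delta * log int exp(-|x - z|^2 / delta) dz,
    # where the Gaussian integral equals (pi * delta)^(d / 2).
    rho_bar = rho + 0.5 * d * delta * np.log(np.pi * delta)
    if rho_bar < 0:
        raise ValueError("rho too small for this delta: the Sinkhorn ball is empty")
    # Sample z ~ N(x', (delta / 2) I), the kernel induced by the cost and delta.
    noise = rng.normal(scale=np.sqrt(delta / 2.0), size=(batch, n_samples, d))
    z = next_states[:, None, :] + noise
    # Continuation value v(z) = max_b Q_target(z, b), shape (batch, n_samples).
    v = q_next_fn(z.reshape(-1, d)).max(axis=1).reshape(batch, n_samples)
    # Dual objective for every lambda on the grid, shape (len(lambdas), batch):
    #   -lam * rho_bar - lam * delta * log E_z[exp(-v(z) / (lam * delta))]
    duals = np.stack([
        -lam * rho_bar - lam * delta * log_mean_exp(-v / (lam * delta), axis=1)
        for lam in lambdas
    ])
    # The worst-case expectation is the supremum of the dual over lambda.
    return rewards + gamma * duals.max(axis=0)
```

Because the samples depend only on the seed, raising `rho` with everything else fixed lowers every target. Whenever $\bar\rho \ge 0$, each target also stays at or below the plain Monte Carlo estimate $r + \gamma\,\overline{v}$, so the robust target is pessimistic by construction. In a DQN training loop, such targets would take the place of the usual $r + \gamma \max_b Q_{\text{target}}(x', b)$ in the regression loss.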
Abstract
We propose a novel distributionally robust $Q$-learning algorithm for the non-tabular case, accounting for continuous state spaces in which the state transition of the underlying Markov decision process is subject to model uncertainty. The uncertainty is taken into account by considering the worst-case transition from a ball around a reference probability measure. To determine the optimal policy under the worst-case state transition, we solve the associated non-linear Bellman equation by dualising and regularising the Bellman operator with the Sinkhorn distance, and we parameterise the resulting robust $Q$-function with deep neural networks. This approach allows us to modify the Deep Q-Network algorithm to optimise for the worst-case state transition. We illustrate the tractability and effectiveness of our approach through several applications, including a portfolio optimisation task based on S&P 500 data.
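For orientation, the following is one standard strong-duality identity for worst-case expectations over Sinkhorn balls. It is stated here as an assumption that holds under suitable integrability conditions, and the paper's exact formulation and conditions may differ. The identity turns the inner worst case of the robust Bellman equation into a one-dimensional problem over a multiplier $\lambda$. Here $\hat P$ is the reference measure, $\rho$ the radius, $\delta$ the entropic regularisation parameter, $c$ a transport cost, $\nu$ the reference measure of the entropy term, and $f$ the continuation value, e.g. $f(x') = \max_b Q(x', b)$.

```latex
\inf_{P:\,\mathcal{W}_\delta(\hat P, P) \le \rho} \mathbb{E}_{X' \sim P}\big[f(X')\big]
  = \sup_{\lambda > 0} \Big\{ -\lambda \bar\rho
    - \lambda \delta \, \mathbb{E}_{x \sim \hat P}\Big[ \log \mathbb{E}_{z \sim Q_{x,\delta}}\big[ e^{-f(z)/(\lambda \delta)} \big] \Big] \Big\},

Q_{x,\delta}(\mathrm{d}z) \propto e^{-c(x,z)/\delta} \, \nu(\mathrm{d}z),
\qquad
\bar\rho = \rho + \delta \, \mathbb{E}_{x \sim \hat P}\Big[ \log \int e^{-c(x,z)/\delta} \, \nu(\mathrm{d}z) \Big] \ge 0 .
```

Taking $\hat P = \hat P(\cdot \mid x, a)$ yields the worst-case continuation value that replaces the ordinary expectation in the Bellman update for the pair $(x, a)$. As $\lambda \to 0$, the bracketed term approaches the essential infimum of $f$ under $Q_{x,\delta}$, which shows how the supremum interpolates between a fully pessimistic and a nominal (smoothed) value.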
