Convex Is Back: Solving Belief MDPs With Convexity-Informed Deep Reinforcement Learning
Daniel Koutas, Daniel Hettegger, Kostas G. Papakonstantinou, Daniel Straub
TL;DR
This paper tackles belief-space DRL for POMDPs by exploiting the convexity of the optimal value function over beliefs, proposing hard- and soft-enforced convexity within a Dueling Q-network framework. The authors demonstrate that incorporating convexity constraints can improve learning speed, robustness to hyperparameters, and extrapolation to out-of-distribution (OOD) observations, with gradient-based soft enforcement often performing best. Experiments on the Tiger and FieldVisionRockSample (FVRS) problems show notable gains in OOD settings and robust performance across problem variants, suggesting that well-behaved extrapolation of the value function is beneficial in partially observable domains. The work provides a practical way to enhance DRL in belief-based settings and points to future directions in high-dimensional belief spaces and actor-critic architectures.
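The convexity the TL;DR relies on is easiest to see in the classical finite-horizon case, where the optimal value function is piecewise linear and convex (PWLC) in the belief: V(b) = max over alpha-vectors α of α·b, a maximum of linear functions. The minimal sketch below assumes a two-state problem whose belief is summarized by the probability b0 of the first state. It evaluates such a function and checks midpoint convexity on a grid. The alpha-vectors and the helper names `value` and `is_midpoint_convex` are hypothetical illustrations, not taken from the paper or its code.

```python
from typing import Callable, Sequence

# Hypothetical alpha-vectors for a two-state POMDP (e.g. a Tiger-like problem).
# Each alpha-vector holds the value of one conditional plan in each hidden state.
ALPHAS: list[tuple[float, float]] = [
    (10.0, -100.0),
    (-100.0, 10.0),
    (-1.0, -1.0),
]


def value(b0: float, alphas: Sequence[tuple[float, float]] = ALPHAS) -> float:
    """PWLC value V(b) = max_alpha alpha . b for the belief b = (b0, 1 - b0)."""
    b = (b0, 1.0 - b0)
    return max(a[0] * b[0] + a[1] * b[1] for a in alphas)


def is_midpoint_convex(f: Callable[[float], float], grid_size: int = 101,
                       tol: float = 1e-9) -> bool:
    """Check f((x + y) / 2) <= (f(x) + f(y)) / 2 for all grid pairs in [0, 1]."""
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return all(
        f((x + y) / 2) <= (f(x) + f(y)) / 2 + tol
        for x in xs
        for y in xs
    )


print(value(0.5))  # -1.0: the third alpha-vector dominates at b0 = 0.5
print(is_midpoint_convex(value))  # True
```

A generic neural network fitted to V(b) carries no such guarantee between or beyond its training beliefs. This is the gap that the hard- and soft-enforced convexity constraints are meant to close.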
Abstract
We present a novel method for Deep Reinforcement Learning (DRL) that incorporates the convexity of the value function over the belief space in Partially Observable Markov Decision Processes (POMDPs). We introduce hard- and soft-enforced convexity as two different approaches and compare their performance against standard DRL on two well-known POMDP environments, namely the Tiger and FieldVisionRockSample problems. Our findings show that including the convexity feature can substantially increase the performance of the agents, as well as their robustness over the hyperparameter space, especially when testing on out-of-distribution domains. The source code for this work can be found at https://github.com/Dakout/Convex_DRL.
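The TL;DR notes that gradient-based soft enforcement often performs best, but the exact formulation is not given here. One plausible form is a penalty on violations of the first-order convexity condition f(y) ≥ f(x) + f'(x)(y − x), evaluated on sampled beliefs and added to the usual temporal-difference loss. This is sketched below as an assumption, not as the authors' method. The functions `derivative` and `convexity_penalty` are hypothetical. The belief is again one-dimensional, and the central finite difference stands in for the automatic differentiation a real DRL framework would use.

```python
from typing import Callable, Sequence


def derivative(f: Callable[[float], float], x: float, h: float = 1e-5) -> float:
    """Central finite-difference estimate of f'(x); stands in for autodiff."""
    return (f(x + h) - f(x - h)) / (2 * h)


def convexity_penalty(f: Callable[[float], float], beliefs: Sequence[float]) -> float:
    """Mean squared violation of f(y) >= f(x) + f'(x) * (y - x) over ordered pairs.

    The result is zero (up to finite-difference error) when f is convex on the
    sampled beliefs and grows with the size of the violations otherwise.
    """
    total, count = 0.0, 0
    for x in beliefs:
        fx, gx = f(x), derivative(f, x)
        for y in beliefs:
            if x == y:
                continue
            # Positive when the tangent at x lies above f at y (non-convexity).
            violation = fx + gx * (y - x) - f(y)
            total += max(0.0, violation) ** 2
            count += 1
    return total / count if count else 0.0


beliefs = [i / 10 for i in range(1, 10)]
print(convexity_penalty(lambda b: (b - 0.3) ** 2, beliefs))  # 0.0 (convex)
print(convexity_penalty(lambda b: -(b - 0.3) ** 2, beliefs) > 0)  # True (concave)
```

During training, such a term would enter as `td_loss + weight * convexity_penalty(v, beliefs)`, where `weight` is a further hypothetical hyperparameter. Hard enforcement, by contrast, guarantees the property by construction instead of penalizing its violation.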
