Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning
David Hudák, Maris F. L. Galesloot, Martin Tappler, Martin Kurečka, Nils Jansen, Milan Češka
TL;DR
The paper addresses scalability and verifiability in POMDP planning by combining deep reinforcement learning (DRL) with formal policy extraction. It introduces Lexpop, which trains a recurrent neural policy via DRL and then extracts a finite-state controller (FSC) that can be formally evaluated. The approach is extended to hidden-model POMDPs (HM-POMDPs), which describe finite sets of POMDPs, to obtain policies that are robust in the worst case. Two FSC extraction methods are proposed: automata learning (Alergia) and Self-Interpretable Networks (SIG). Extracted controllers are evaluated with a verification toolchain (PAYNT/Storm) to provide performance guarantees. Experiments on problems with large state spaces show that Lexpop outperforms state-of-the-art solvers in both the standard POMDP setting and the robust HM-POMDP setting.
Abstract
Solving partially observable Markov decision processes (POMDPs) requires computing policies under imperfect state information. Despite recent advances, the scalability of existing POMDP solvers remains limited. Moreover, many settings require a policy that is robust across multiple POMDPs, further aggravating the scalability issue. We propose the Lexpop framework for POMDP solving. Lexpop (1) employs deep reinforcement learning to train a neural policy, represented by a recurrent neural network, and (2) constructs a finite-state controller mimicking the neural policy through efficient extraction methods. Crucially, unlike neural policies, such controllers can be formally evaluated, providing performance guarantees. We extend Lexpop to compute robust policies for hidden-model POMDPs (HM-POMDPs), which describe finite sets of POMDPs. We associate every extracted controller with its worst-case POMDP. Using a set of such POMDPs, we iteratively train a robust neural policy and consequently extract a robust controller. Our experiments show that on problems with large state spaces, Lexpop outperforms state-of-the-art solvers for POMDPs as well as HM-POMDPs.
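To make "formally evaluated" and "worst-case POMDP" concrete, the following is a minimal sketch, not the Lexpop implementation. It assumes a discounted expected-reward objective, deterministic state-based observations, and a deterministic FSC; the paper's actual objectives and tooling may differ. All names here (`POMDP`, `FSC`, `evaluate`, `worst_case`, and the toy models) are hypothetical. The idea is that fixing an FSC turns a POMDP into a Markov chain over (state, memory node) pairs, whose value can be computed exactly up to numerical tolerance, and that the worst-case member of a finite set of POMDPs is simply the one with the lowest such value.

```python
class POMDP:
    """Toy POMDP (hypothetical structure, for illustration only)."""
    def __init__(self, trans, reward, obs):
        self.trans = trans    # trans[s][a] = list of (next_state, probability)
        self.reward = reward  # reward[s][a] = immediate reward
        self.obs = obs        # obs[s] = observation emitted in state s


class FSC:
    """Deterministic finite-state controller over memory nodes."""
    def __init__(self, action, update):
        self.action = action  # action[(node, obs)] = action to play
        self.update = update  # update[(node, obs)] = next memory node


def evaluate(pomdp, fsc, n_nodes, gamma=0.9, iters=2000):
    """Discounted value of the FSC from (state 0, node 0).

    Iterates the Bellman equation of the Markov chain induced on
    (state, node) pairs; gamma < 1 makes the iteration converge.
    """
    n_states = len(pomdp.obs)
    value = {(s, n): 0.0 for s in range(n_states) for n in range(n_nodes)}
    for _ in range(iters):
        new_value = {}
        for (s, n) in value:
            z = pomdp.obs[s]
            a = fsc.action[(n, z)]
            n2 = fsc.update[(n, z)]
            expected_next = sum(p * value[(s2, n2)] for s2, p in pomdp.trans[s][a])
            new_value[(s, n)] = pomdp.reward[s][a] + gamma * expected_next
        value = new_value
    return value[(0, 0)]


def worst_case(pomdps, fsc, n_nodes, gamma=0.9):
    """Index and value of the POMDP on which the FSC performs worst."""
    values = [evaluate(m, fsc, n_nodes, gamma) for m in pomdps]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx, values[idx]


# Two states that emit the same observation, so memory is needed to alternate.
reward = [[1.0, 0.0], [0.0, 1.0]]
obs = [0, 0]
nominal = POMDP(
    trans=[[[(1, 1.0)], [(0, 1.0)]], [[(1, 1.0)], [(0, 1.0)]]],
    reward=reward, obs=obs)
# Variant in which action 0 in state 0 fails to move with probability 0.2.
slippery = POMDP(
    trans=[[[(1, 0.8), (0, 0.2)], [(0, 1.0)]], [[(1, 1.0)], [(0, 1.0)]]],
    reward=reward, obs=obs)

# Two-node controller: node 0 plays action 0, node 1 plays action 1, then swap.
fsc = FSC(action={(0, 0): 0, (1, 0): 1}, update={(0, 0): 1, (1, 0): 0})

print(evaluate(nominal, fsc, n_nodes=2))
print(worst_case([nominal, slippery], fsc, n_nodes=2))
```

On the nominal model the controller stays synchronized with the hidden state and collects reward 1 at every step, giving a value of 1 / (1 - 0.9) = 10. On the slippery variant it can fall out of sync, so its value is lower and that model is reported as the worst case. A dedicated model checker performs this kind of evaluation with guarantees and at scale; the fixed-point iteration here only illustrates the principle.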
