Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes
Krishna C. Kalagarla, Dhruva Kartik, Dongming Shen, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo
TL;DR
The paper addresses policy design for partially observable systems that must satisfy temporal logic constraints while maximizing cumulative reward. It introduces a constrained product POMDP, obtained by augmenting the environment with a deterministic finite automaton (DFA) that encodes the $\textsc{LTL}_f$ (linear temporal logic over finite traces) specification, and solves the resulting problem with a no-regret primal-dual approach based on Exponentiated Gradient, with theoretical bounds on near-optimality and feasibility. The framework is extended to multi-agent settings with information asymmetry through the common information approach, which keeps the problem tractable and preserves the performance guarantees. Experiments on single- and multi-agent gridworld scenarios illustrate the trade-offs between exploration, reward, and specification satisfaction, supporting the method's relevance to safety-critical autonomous systems.
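To make the primal-dual idea above concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes a toy setting in which each candidate policy is already summarized by a hypothetical pair (expected reward, satisfaction probability), so the policy step is a simple enumeration rather than a product-POMDP solve. The Lagrange multiplier is kept in `[0, B]` by writing it as `B` times the first coordinate of a two-point simplex and updating that coordinate multiplicatively, which is one standard way to apply Exponentiated Gradient to a bounded scalar. The names `eg_primal_dual`, `threshold`, `B`, `eta`, and `iters` are illustrative choices.

```python
import math


def eg_primal_dual(policies, threshold, B=5.0, eta=0.1, iters=2000):
    """Toy primal-dual loop with an Exponentiated Gradient dual update.

    policies:  list of (expected_reward, satisfaction_probability) pairs
               (hypothetical summaries; a real solver would compute these).
    threshold: required satisfaction probability.
    Returns the empirical mixture over policies and its average reward
    and average satisfaction probability.
    """
    w = [0.5, 0.5]                 # simplex weights; multiplier is B * w[0]
    counts = [0] * len(policies)

    for _ in range(iters):
        lam = B * w[0]

        # Primal step: best response to the current multiplier, i.e. the
        # policy maximizing reward + lam * satisfaction probability.
        best = max(
            range(len(policies)),
            key=lambda i: policies[i][0] + lam * policies[i][1],
        )
        counts[best] += 1
        sat = policies[best][1]

        # Dual step (assumed form): the Lagrangian's derivative with respect
        # to w[0] is B * (sat - threshold); the dual player minimizes, so the
        # weight shrinks when the constraint holds with slack and grows when
        # it is violated.
        w[0] *= math.exp(-eta * B * (sat - threshold))
        total = w[0] + w[1]
        w = [w[0] / total, w[1] / total]

    mix = [c / iters for c in counts]
    avg_reward = sum(m * p[0] for m, p in zip(mix, policies))
    avg_sat = sum(m * p[1] for m, p in zip(mix, policies))
    return mix, avg_reward, avg_sat


if __name__ == "__main__":
    # Two hypothetical policies: high reward but unsafe, and safer but lower reward.
    mix, avg_reward, avg_sat = eg_primal_dual([(1.0, 0.2), (0.5, 0.9)], threshold=0.7)
    print(mix, avg_reward, avg_sat)
```

In this toy run neither policy alone is the best feasible choice, so the loop alternates between them and the averaged mixture should settle near the 0.7 threshold. Returning the average over iterations, rather than the last iterate, is what a no-regret argument typically relies on.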
Abstract
Autonomous systems often have logical constraints arising, for example, from safety, operational, or regulatory requirements. Such constraints can be expressed using temporal logic specifications. The system state is often partially observable. Moreover, it could encompass a team of multiple agents with a common objective but disparate information structures and constraints. In this paper, we first introduce an optimal control theory for partially observable Markov decision processes (POMDPs) with finite linear temporal logic constraints. We provide a structured methodology for synthesizing policies that maximize a cumulative reward while ensuring that the probability of satisfying a temporal logic constraint is sufficiently high. Our approach comes with guarantees on approximate reward optimality and constraint satisfaction. We then build on this approach to design an optimal control framework for logically constrained multi-agent settings with information asymmetry. We illustrate the effectiveness of our approach by implementing it on several case studies.
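As a reading aid, the single-agent problem described in the abstract can be written as a chance-constrained program. The notation is an assumption made here, not taken from the paper: $\mu$ is a policy, $r$ the per-step reward, $T$ the finite horizon, $\varphi$ the temporal logic specification, and $1-\delta$ the required satisfaction probability.

$$
\max_{\mu} \; \mathbb{E}^{\mu}\!\left[\sum_{t=0}^{T} r(S_t, A_t)\right]
\quad \text{subject to} \quad
\mathbb{P}^{\mu}\!\left(\text{the trajectory satisfies } \varphi\right) \ge 1 - \delta
$$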
