Integrated Control and Active Perception in POMDPs for Temporal Logic Tasks and Information Acquisition
Chongyang Shi, Michael R. Dorothy, Jie Fu
TL;DR
This work tackles joint control and active perception for partially observable Markov decision processes (POMDPs) subject to linear temporal logic over finite traces (LTLf). It introduces an information-theoretic objective that minimizes uncertainty about a secret temporal event while maximizing the probability of satisfying the task within a finite horizon $T$. It formulates a product POMDP by combining the POMDP with a deterministic finite automaton (DFA) representing the LTLf task, and develops a policy-gradient approach based on observable-operator models to efficiently compute the gradients of the conditional entropy $H(Z_T \mid Y)$ of the secret and of the success probability $P(W_T = 1)$. The key contributions are explicit gradient expressions, a matrix-based computation of the probability of observation-action sequences, and a gradient-descent algorithm that balances information gathering against task completion, validated on graph-based UAV surveillance scenarios. The results show that policies focused on information gain can substantially reduce uncertainty about temporal events while still achieving meaningful task performance, which points to applications in security and surveillance under partial observability.
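The matrix-based probability computation mentioned above can be illustrated with a minimal pure-Python sketch. It assumes a hypothetical two-state toy model in which each action both changes the dynamics (control) and selects a sensor (perception), a softmax policy that depends only on the most recent observation, no initial observation, and a secret defined as the hidden state at the horizon lying in a set SECRET (a stand-in for the DFA component of a product state). The names TRANS, EMIT, SECRET, apply_operator, joint_secret_obs and conditional_entropy are illustrative only, and the operator convention O_{a,o}[s', s] = P(o | s', a) P(s' | s, a) is one common choice rather than necessarily the one used in the paper. Because it enumerates every action-observation sequence, it is only practical for tiny horizons.

```python
import itertools
import math

# Hypothetical toy model (not from the paper): 2 hidden states, 2 actions, 2 observations.
N_S, N_A, N_O = 2, 2, 2
# TRANS[a][s][s2] = P(s2 | s, a): control, the action changes the dynamics.
TRANS = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0: tends to stay
    [[0.3, 0.7], [0.6, 0.4]],  # action 1: tends to switch
]
# EMIT[a][s2][o] = P(o | s2, a): perception, the action also selects the sensor.
EMIT = [
    [[0.6, 0.4], [0.4, 0.6]],      # action 0: weak sensor
    [[0.95, 0.05], [0.05, 0.95]],  # action 1: strong sensor
]
MU0 = [0.5, 0.5]  # initial state distribution
SECRET = {1}      # hypothetical secret: Z_T = 1 iff the state at time T is in SECRET


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def apply_operator(b, a, o):
    """Observable operator O_{a,o}: b'[s2] = P(o | s2, a) * sum_s P(s2 | s, a) * b[s]."""
    return [
        EMIT[a][s2][o] * sum(TRANS[a][s][s2] * b[s] for s in range(N_S))
        for s2 in range(N_S)
    ]


def joint_secret_obs(theta, horizon):
    """Yield (y, P(Z_T=1, y), P(y)) for every sequence y = (a_0, o_1, ..., a_{T-1}, o_T).

    theta[c][a] are softmax logits; context c = 0 before any observation,
    c = 1 + o after observing o (the policy only sees the last observation).
    """
    for actions in itertools.product(range(N_A), repeat=horizon):
        for obs in itertools.product(range(N_O), repeat=horizon):
            b = list(MU0)      # unnormalized joint of hidden state and y so far
            policy_prob = 1.0  # product of pi_theta(a_t | context)
            context = 0
            for a, o in zip(actions, obs):
                policy_prob *= softmax(theta[context])[a]
                b = apply_operator(b, a, o)
                context = 1 + o
            p_y = policy_prob * sum(b)
            p_secret = policy_prob * sum(b[s] for s in SECRET)
            yield (actions, obs), p_secret, p_y


def conditional_entropy(theta, horizon):
    """H(Z_T | Y) in nats; the policy factor cancels in P(z | y) = P(z, y) / P(y)."""
    h = 0.0
    for _, p_secret, p_y in joint_secret_obs(theta, horizon):
        for p_zy in (p_secret, p_y - p_secret):
            if p_zy > 0.0:
                h -= p_zy * math.log(p_zy / p_y)
    return h
```

For instance, with logits [[20.0, -20.0]] * 3 (almost always the weak sensor) the conditional entropy at horizon 3 exceeds the value for [[-20.0, 20.0]] * 3 (almost always the strong sensor), since the strong sensor's last observation reveals the secret with 95% accuracy. This is the kind of difference an entropy-minimizing planner could exploit.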
Abstract
This paper studies the synthesis of a joint control and active perception policy for a stochastic system modeled as a partially observable Markov decision process (POMDP), subject to temporal logic specifications. The POMDP actions influence both the system dynamics (control) and the emission function (perception). Beyond task completion, the planner seeks to maximize information gain about certain temporal events (the secret) through coordinated perception and control. To enable active information acquisition, we introduce the minimization of the Shannon conditional entropy of the secret as a planning objective, alongside the maximization of the probability of satisfying the temporal logic formula within a finite horizon. Using a variant of observable operators in hidden Markov models (HMMs) and POMDPs, we establish key properties of the gradient of the conditional entropy with respect to the policy parameters. These properties enable efficient computation of the policy gradient. We validate our approach on graph-based examples inspired by common security applications of UAV surveillance.
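A gradient property of this kind can be sketched under explicit assumptions: write $y = (o_0, a_0, o_1, \ldots, a_{T-1}, o_T)$ for an observation-action sequence, let $\theta$ parameterize only a policy $\pi_\theta$ that depends on the observable history, and let $g(z, y)$ be a hypothetical symbol for the policy-independent factor (transition and emission probabilities summed over hidden paths consistent with $Z_T = z$). Under these assumptions the policy terms cancel in $P_\theta(z \mid y)$, so the entropy gradient reduces to a score-function expectation. This is a standard derivation offered for illustration, not necessarily the exact form established in the paper.

```latex
\begin{aligned}
H(Z_T \mid Y;\theta) &= -\sum_{y}\sum_{z} P_\theta(z, y)\,\log P_\theta(z \mid y),\\
P_\theta(z, y) &= \Big(\prod_{t=0}^{T-1}\pi_\theta(a_t \mid o_{0:t}, a_{0:t-1})\Big)\, g(z, y),\\
P_\theta(z \mid y) &= \frac{g(z, y)}{\sum_{z'} g(z', y)} \quad \text{(independent of } \theta\text{)},\\
\nabla_\theta H(Z_T \mid Y;\theta) &= \sum_{y} \nabla_\theta P_\theta(y)\, H(Z_T \mid Y = y)
  = \mathbb{E}_{y \sim P_\theta}\big[\nabla_\theta \log P_\theta(y)\, H(Z_T \mid Y = y)\big],\\
\nabla_\theta \log P_\theta(y) &= \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid o_{0:t}, a_{0:t-1}).
\end{aligned}
```

Under these assumptions, the expectation could be estimated from sampled trajectories, with each $H(Z_T \mid Y = y)$ evaluated from products of observable operators.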
