Integrated Control and Active Perception in POMDPs for Temporal Logic Tasks and Information Acquisition

Chongyang Shi, Michael R. Dorothy, Jie Fu

TL;DR

This work tackles joint control and active perception for partially observable MDPs subject to specifications in linear temporal logic over finite traces (LTL$_f$). It introduces an information-theoretic objective that minimizes the uncertainty about a secret temporal event while maximizing the probability of task satisfaction within a finite horizon $T$. It formulates a product POMDP by combining the labeled POMDP with a DFA representing the LTL$_f$ task, and develops a policy-gradient approach based on observable-operator models to efficiently compute gradients of the conditional entropy $H(Z_T|Y)$ and the success probability $P(W_T=1)$. The key contributions include explicit gradient expressions, a matrix-based probability computation for observation-action sequences, and a gradient-descent algorithm that balances information gathering with task completion, validated on graph-based UAV surveillance scenarios. The results demonstrate that information-gain-focused policies can substantially reduce uncertainty about temporal events while achieving meaningful task performance, highlighting potential for security and surveillance applications under partial observability.

Abstract

This paper studies the synthesis of a joint control and active perception policy for a stochastic system modeled as a partially observable Markov decision process (POMDP), subject to temporal logic specifications. The POMDP actions influence both system dynamics (control) and the emission function (perception). Beyond task completion, the planner seeks to maximize information gain about certain temporal events (the secret) through coordinated perception and control. To enable active information acquisition, we introduce minimizing the Shannon conditional entropy of the secret as a planning objective, alongside maximizing the probability of satisfying the temporal logic formula within a finite horizon. Using a variant of observable operators in hidden Markov models (HMMs) and POMDPs, we establish key properties of the conditional entropy gradient with respect to policy parameters. These properties facilitate efficient policy gradient computation. We validate our approach through graph-based examples, inspired by common security applications with UAV surveillance.
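To make the observable-operator idea concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes the policy is fixed, so the POMDP reduces to a hidden Markov model with transition matrix `T` and emission matrix `E`. It also simplifies the secret to "the final state lies in `secret_states`", whereas the paper's secret is a temporal event. All function and variable names are hypothetical. The sketch computes the joint probability of an observation sequence and the final state as a product of operator matrices, then evaluates the conditional entropy of the secret by enumerating every observation sequence.

```python
import itertools
import numpy as np

def observable_operators(T, E):
    """One operator per observation o: A_o = T^T diag(E[:, o]).

    T[i, j] = P(next state j | state i), E[i, o] = P(observation o | state i).
    """
    return [T.T @ np.diag(E[:, o]) for o in range(E.shape[1])]

def joint_final_state(mu, ops, obs_seq):
    """Return v with v[j] = P(final state = j, observations = obs_seq)."""
    v = np.asarray(mu, dtype=float).copy()
    for o in obs_seq:
        v = ops[o] @ v  # weight by the emission, then propagate one step
    return v

def conditional_entropy(mu, T, E, secret_states, horizon):
    """H(Z | Y) in bits, where Z = 1 iff the final state is in secret_states.

    Assumption: brute-force enumeration of all observation sequences,
    which is only feasible for very small horizons.
    """
    ops = observable_operators(T, E)
    mask = np.zeros(len(mu), dtype=bool)
    mask[list(secret_states)] = True
    h = 0.0
    for seq in itertools.product(range(E.shape[1]), repeat=horizon):
        v = joint_final_state(mu, ops, seq)
        p_y = v.sum()  # probability of this observation sequence
        if p_y <= 0.0:
            continue
        for p_zy in (v[mask].sum(), v[~mask].sum()):
            if p_zy > 0.0:
                h -= p_zy * np.log2(p_zy / p_y)
    return h

# Two-state toy model with static dynamics and a uniform prior.
mu = np.array([0.5, 0.5])
T = np.eye(2)
uninformative = np.full((2, 2), 0.5)  # observations carry no information
informative = np.eye(2)               # observations reveal the state

h_blind = conditional_entropy(mu, T, uninformative, {1}, horizon=2)
h_seen = conditional_entropy(mu, T, informative, {1}, horizon=2)
```

In this toy model the uninformative sensor leaves the full one bit of uncertainty about the secret, while the perfectly informative sensor removes it. Because the enumeration grows exponentially with the horizon, the sketch is meant only to illustrate the quantity being minimized.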

Paper Structure

This paper contains 6 sections, 6 theorems, 38 equations, 4 figures.

Key Result

Lemma 1

The gradient of the conditional entropy w.r.t. the policy parameter $\theta$ is

Figures (4)

  • Figure 1: The graph represents the transitions of the POMDP. The arrows labeled with $a$, $b$, or $c$ represent deterministic actions, while the arrows labeled with both actions and probabilities indicate stochastic actions that lead to a particular state with a given probability.
  • Figure 2: The DFA for the task. Self-loops with label $\varnothing$ are omitted. Double-circle nodes represent accepting states.
  • Figure 3: The convergence results of the policy gradient method when goal states of different types are overlapping.
  • Figure 4: The convergence results of the policy gradient method when goal states of different types are non-overlapping.

Theorems & Definitions (16)

  • Definition 1: Labeled POMDP
  • Definition 2: LTL$_f$ Syntax [de2013linear]
  • Definition 3: DFA
  • Definition 4: Product POMDP
  • Definition 5
  • Remark 1
  • Lemma 1
  • proof
  • Definition 6
  • Proposition 1
  • ...and 6 more