Entropy Rate Maximization of Markov Decision Processes under Linear Temporal Logic Tasks

Yu Chen, Shaoyuan Li, Xiang Yin

TL;DR

This work addresses entropy-rate maximization for Markov decision processes (MDPs) under Linear Temporal Logic (LTL) tasks, seeking policies that satisfy the task with probability one while maximizing long-run unpredictability. It introduces a two-stage framework. The first stage solves entropy-rate maximization on communicating MDPs via a convex nonlinear program, yielding a stationary policy and an irreducible induced Markov chain. The second stage extends the result to general MDPs through a state-level classification built on maximal end components (MECs), which enables a polynomial-time synthesis procedure that works backward through the levels. The authors prove soundness and completeness and demonstrate the approach on two robot task-planning case studies, showing that the resulting policies increase unpredictability without sacrificing task satisfaction. The methodology advances entropy-rate optimization under high-level temporal logic constraints and targets security-sensitive settings where adversaries might exploit predictable behavior.

Abstract

We investigate the problem of synthesizing optimal control policies for Markov decision processes (MDPs) with both qualitative and quantitative objectives. Specifically, our goal is to achieve a given linear temporal logic (LTL) task with probability one, while maximizing the entropy rate of the system. The notion of entropy rate characterizes the long-run average (un)predictability of a stochastic process. Such an optimal policy is of particular interest from the security point of view, as it not only ensures the completion of tasks, but also maximizes the unpredictability of the system. However, existing works focus only on maximizing the total entropy, which may diverge to infinity over an infinite horizon. In this paper, we provide a complete solution to the entropy rate maximization problem under LTL constraints. Specifically, we first present an algorithm for synthesizing entropy rate maximizing policies for communicating MDPs. Then, based on a new state classification method, we show that the entropy rate maximization problem under an LTL task can be solved in polynomial time. We illustrate the proposed algorithm with two case studies of robot task planning scenarios.
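
To make the notion of entropy rate concrete, the following is a minimal sketch (not the authors' synthesis algorithm) that evaluates the entropy rate of a fixed, irreducible Markov chain using the standard formula $H = -\sum_i \pi_i \sum_j P_{ij} \log P_{ij}$, where $\pi$ is the stationary distribution. The function names `stationary_distribution` and `entropy_rate`, the use of base-2 logarithms, and the two example chains are assumptions made for illustration only.

```python
import math


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of an irreducible chain.

    Uses the "lazy" update pi <- 0.5 * pi + 0.5 * pi P, which has the same
    fixed point as pi <- pi P but also converges for periodic chains.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [
            0.5 * pi[j] + 0.5 * sum(pi[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def entropy_rate(P):
    """Entropy rate in bits per step of an irreducible Markov chain P."""
    pi = stationary_distribution(P)
    return -sum(
        pi[i] * p * math.log2(p)
        for i, row in enumerate(P)
        for p in row
        if p > 0  # terms with p == 0 contribute nothing
    )


# A deterministic 3-cycle is fully predictable.
cycle = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
# A chain that jumps uniformly at random is maximally unpredictable.
uniform = [[1 / 3, 1 / 3, 1 / 3] for _ in range(3)]

print(entropy_rate(cycle))
print(entropy_rate(uniform), math.log2(3))
```

The deterministic cycle has entropy rate zero, while the uniform chain attains $\log_2 3$ bits per step, the largest value possible on three states. The policy synthesis problem described in the abstract searches over policies of an MDP for the induced chain with the largest such value, subject to the LTL task being satisfied with probability one.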

Paper Structure

This paper contains 23 sections, 10 theorems, 80 equations, 6 figures, 1 table, 3 algorithms.

Key Result

Proposition 1

Let $\mathcal{M}=(S,s_0,A,P,\mathcal{AP},\ell,Acc)$ be a product MDP. Then if $\Pi^{\varphi}_{\mathcal{M}} \neq \emptyset$, there exists $\mu^\star \in \Pi^{\varphi}_{\mathcal{M}}\cap \Pi^{S}_{\mathcal{M}}$ such that

Figures (6)

  • Figure 1: Illustrative example of state-level classification. State $0$ is the initial state. All states except state $2$ have only one available action, so the action labels are omitted. Transition probabilities are shown on the edges. Some transition probabilities among states $4$, $6$ and $7$, all equal to $\frac{1}{3}$, are also omitted. State $2$ has two available actions: one reaches state $1$ with probability $1$; the other reaches states $5$ and $3$ with probability $0.5$ each.
  • Figure 2: Illustrative example of state level classification.
  • Figure 3: Workspace of the robot.
  • Figure 4: Limit distribution (multiplied by $100$) of the optimal policy $\mathcal{M}^{\mu}$.
  • Figure 5: Largest difference of the probabilities of picking two different actions at each state.
  • ...and 1 more figure

Theorems & Definitions (41)

  • Definition 1: Maximal End Components
  • Definition 2: Entropy Rate (Cover and Thomas, 2006)
  • Definition 3: Deterministic Rabin Automata
  • Definition 4: Product MDPs
  • Definition 5: Accepting Maximal End Components
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Remark 1: Complexity of the Nonlinear Program
  • ...and 31 more