Table of Contents
Fetching ...

Good-for-MDP State Reduction for Stochastic LTL Planning

Christoph Weinhuber, Giuseppe De Giacomo, Yong Li, Sven Schewe, Qiyi Tang

TL;DR

The paper addresses scalable stochastic planning for MDPs with LTL goals by leveraging good-for-MDP (GFM) automata and a novel state-space reduction pipeline. It shows how to convert GFM automata into 0/1-PA via a sequence of polynomial-time transformations, enabling efficient policy synthesis while preserving optimality. A key contribution is a direct, specialized construction for the $G F \varphi$ fragment (with $\varphi$ co-safety) that yields only $2^{O(|\varphi|)}$ states, a singly exponential blow-up compared to the doubly exponential growth of general approaches. Empirical results demonstrate substantial automata-size reductions and runtime improvements across diverse benchmarks, highlighting the practical impact for planning, verification, and reinforcement learning tasks involving LTL objectives.

Abstract

We study stochastic planning problems in Markov Decision Processes (MDPs) with goals specified in Linear Temporal Logic (LTL). The state-of-the-art approach transforms LTL formulas into good-for-MDP (GFM) automata, which feature a restricted form of nondeterminism. These automata are then composed with the MDP, allowing the agent to resolve the nondeterminism during policy synthesis. A major factor affecting the scalability of this approach is the size of the generated automata. In this paper, we propose a novel GFM state-space reduction technique that significantly reduces the number of automata states. Our method employs a sophisticated chain of transformations, leveraging recent advances in good-for-games minimisation developed for adversarial settings. In addition to our theoretical contributions, we present empirical results demonstrating the practical effectiveness of our state-reduction technique. Furthermore, we introduce a direct construction method for formulas of the form $\mathsf{G}\mathsf{F}\varphi$, where $\varphi$ is a co-safety formula. This construction is provably single-exponential in the worst case, in contrast to the general doubly-exponential complexity. Our experiments confirm the scalability advantages of this specialised construction.

Good-for-MDP State Reduction for Stochastic LTL Planning

TL;DR

The paper addresses scalable stochastic planning for MDPs with LTL goals by leveraging good-for-MDP (GFM) automata and a novel state-space reduction pipeline. It shows how to convert GFM automata into 0/1-PA via a sequence of polynomial-time transformations, enabling efficient policy synthesis while preserving optimality. A key contribution is a direct, specialized construction for the fragment (with co-safety) that yields only states, a singly exponential blow-up compared to the doubly exponential growth of general approaches. Empirical results demonstrate substantial automata-size reductions and runtime improvements across diverse benchmarks, highlighting the practical impact for planning, verification, and reinforcement learning tasks involving LTL objectives.

Abstract

We study stochastic planning problems in Markov Decision Processes (MDPs) with goals specified in Linear Temporal Logic (LTL). The state-of-the-art approach transforms LTL formulas into good-for-MDP (GFM) automata, which feature a restricted form of nondeterminism. These automata are then composed with the MDP, allowing the agent to resolve the nondeterminism during policy synthesis. A major factor affecting the scalability of this approach is the size of the generated automata. In this paper, we propose a novel GFM state-space reduction technique that significantly reduces the number of automata states. Our method employs a sophisticated chain of transformations, leveraging recent advances in good-for-games minimisation developed for adversarial settings. In addition to our theoretical contributions, we present empirical results demonstrating the practical effectiveness of our state-reduction technique. Furthermore, we introduce a direct construction method for formulas of the form , where is a co-safety formula. This construction is provably single-exponential in the worst case, in contrast to the general doubly-exponential complexity. Our experiments confirm the scalability advantages of this specialised construction.

Paper Structure

This paper contains 45 sections, 8 theorems, 6 equations, 3 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Once an EC $\mathcal{E}$ of an MDP is entered, there is a strategy that visits every state-action combination in $\mathcal{E}$ with probability $1$ and stays in $\mathcal{E}$ forever. Moreover, for every strategy the union of the end-components is visited with probability $1$. An infinite path of an

Figures (3)

  • Figure 1: Overview of our state reduction pipeline
  • Figure 2: NBA example not good-for-MDP
  • Figure 3: NBA example good-for-MDP

Theorems & Definitions (12)

  • Theorem 1: DBLP:phd/us/Alfaro97DBLP:books/daglib/0020348
  • Theorem 2
  • Lemma 1
  • Theorem 3
  • Lemma 2
  • Theorem 4
  • Lemma 3
  • Theorem 5
  • proof
  • proof
  • ...and 2 more