Compositional Planning for Logically Constrained Multi-Agent Markov Decision Processes

Krishna C. Kalagarla, Matthew Low, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo

TL;DR

This work uses the framework of Constrained Markov Decision Processes to provide an assume-guarantee based decomposition for synthesizing decentralized control policies, subject to logical constraints in a multi-agent setting.

Abstract

Designing control policies for large, distributed systems is challenging, especially in the context of critical, temporal logic based specifications (e.g., safety) that must be met with high probability. Compositional methods for such problems are needed for scalability, yet relying on worst-case assumptions for decomposition tends to be overly conservative. In this work, we use the framework of Constrained Markov Decision Processes (CMDPs) to provide an assume-guarantee based decomposition for synthesizing decentralized control policies, subject to logical constraints in a multi-agent setting. The returned policies are guaranteed to satisfy the constraints with high probability and provide a lower bound on the achieved objective reward. We empirically find the returned policies to achieve near-optimal rewards while enjoying an order of magnitude reduction in problem size and execution time.
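As a rough illustration of the kind of problem the abstract describes, a logically constrained planning problem for a single agent can be written in the generic form below. This is a sketch, not the paper's formulation: the horizon $H$, reward $r$, specification $\varphi$, and tolerance $\delta$ are notation assumed here for illustration.

$$
\max_{\pi} \; \mathbb{E}^{\pi}\Big[\sum_{t=0}^{H-1} r(s_t, a_t)\Big]
\quad \text{subject to} \quad
\Pr^{\pi}\big[\, s_0 s_1 \cdots s_H \models \varphi \,\big] \;\ge\; 1 - \delta
$$

One common way to bring such a problem into CMDP form is to track progress toward $\varphi$ with an automaton and plan on the product of the MDP and that automaton. The satisfaction probability then becomes the probability of ending in an accepting product state, which is an expected value and so fits the expected-cost constraint of a standard CMDP.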

Paper Structure

This paper contains 19 sections, 3 theorems, 29 equations, 1 figure, and 1 table.

Key Result

Theorem 1

For any policy $\pi$, we have [...]. Therefore, a policy $\pi^*$ is an optimal solution to Problem \ref{sprobform} if and only if it is an optimal solution to Problem \ref{prodprobform}.

Figures (1)

  • Figure 1: Gridworlds used in the $4\times4$ experiments. Goal (reach) and obstacle (avoid) states corresponding to the $\textsc{LTL}_f$ specification are denoted by stars and triangles, respectively. Locations with fewer stars should be visited first. The starting positions of the agents are marked by the stick figures. Positions corresponding to ${\mathscr M}^1$ are marked in green, while those for ${\mathscr M}^2$ are shown in light blue.

Theorems & Definitions (4)

  • Theorem 1: Equivalence of Problems \ref{sprobform} and \ref{prodprobform}
  • Theorem 2: Soundness of AG Policy Composition
  • proof
  • Theorem 3: Lower Bound on the Achieved Objective Reward