Compositional Planning for Logically Constrained Multi-Agent Markov Decision Processes

Krishna C. Kalagarla; Matthew Low; Rahul Jain; Ashutosh Nayyar; Pierluigi Nuzzo

TL;DR

This work uses the framework of Constrained Markov Decision Processes to provide an assume-guarantee based decomposition for synthesizing decentralized control policies, subject to logical constraints in a multi-agent setting.

Abstract

Designing control policies for large, distributed systems is challenging, especially in the context of critical, temporal logic based specifications (e.g., safety) that must be met with high probability. Compositional methods for such problems are needed for scalability, yet relying on worst-case assumptions for decomposition tends to be overly conservative. In this work, we use the framework of Constrained Markov Decision Processes (CMDPs) to provide an assume-guarantee based decomposition for synthesizing decentralized control policies, subject to logical constraints in a multi-agent setting. The returned policies are guaranteed to satisfy the constraints with high probability and provide a lower bound on the achieved objective reward. We empirically find the returned policies to achieve near-optimal rewards while enjoying an order of magnitude reduction in problem size and execution time.
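For orientation, the kind of linear program that a finite-horizon CMDP formulation typically leads to can be written over occupancy measures. The block below is a generic textbook sketch, not the paper's exact program: the symbols $x_t(s,a)$ (occupancy measure), $H$ (horizon), $\mu$ (initial distribution), $P_t$ (transition kernel), $r_t$ (objective reward), $c_t$ (constraint reward, e.g. an indicator of reaching an accepting automaton state) and $\delta$ (allowed violation probability) are all assumed notation.

$$
\begin{aligned}
\max_{x \ge 0} \quad & \sum_{t=1}^{H} \sum_{s,a} x_t(s,a)\, r_t(s,a) \\
\text{s.t.} \quad & \sum_{a} x_1(s,a) = \mu(s) && \forall s, \\
& \sum_{a} x_{t+1}(s',a) = \sum_{s,a} P_t(s' \mid s,a)\, x_t(s,a) && \forall s',\ t = 1,\dots,H-1, \\
& \sum_{t=1}^{H} \sum_{s,a} x_t(s,a)\, c_t(s,a) \ge 1 - \delta .
\end{aligned}
$$

Under these assumptions, a (possibly randomized) policy can be read off a feasible solution as $\pi_t(a \mid s) = x_t(s,a) / \sum_{a'} x_t(s,a')$ wherever the denominator is positive.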

Paper Structure (19 sections, 3 theorems, 29 equations, 1 figure, 1 table)

Introduction
Preliminaries
Labeled Finite-Horizon MDPs
Occupancy Measures
Finite Linear Temporal Logic Specification
Deterministic Finite Automaton (DFA)
Problem Formulation
Solution Approach
Framing 2-Player MDP as Joint MDP
Solution Procedure for a Single Player MDP
Linear Programming Formulation
Assume-Guarantee Transformation
Formalization of Assume-Guarantee Decomposition
Policy Synthesis as a Linear Program
Linear Program Size Comparison
...and 4 more sections

Key Result

Theorem 1

For any policy $\pi$, we have […]. Therefore, a policy $\pi^*$ is an optimal solution to Problem sprobform if and only if it is an optimal solution to Problem prodprobform.

Figures (1)

Figure 1: Gridworlds used in the $4\times4$ experiments. Goal (reach) and obstacle (avoid) states corresponding to the $\textsc{LTL}_f$ specification are denoted by stars and triangles, respectively. Locations with fewer stars should be visited first. The starting positions of the agents are marked by the stick figures. Positions corresponding to ${\mathscr M}^1$ are marked in green, while those for ${\mathscr M}^2$ are shown in light blue.
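As a rough illustration of how an ordered reach-avoid specification like the one in this caption can be tracked by a DFA, here is a minimal sketch. The labels `g1`, `g2`, `obs`, the state numbering, and the choice to treat an out-of-order goal visit as a violation are all hypothetical assumptions for this example, not the paper's construction.

```python
from typing import Iterable

# Hypothetical cell labels (assumptions, not from the paper):
#   "g1"  = goal with one star (must be visited first)
#   "g2"  = goal with two stars (must be visited second)
#   "obs" = obstacle (must always be avoided)
#   ""    = unlabeled cell
# DFA states: 0 = no goal visited yet, 1 = g1 visited,
#             2 = accepting (g1 then g2), 3 = rejecting sink.
START, SEEN_G1, ACCEPT, REJECT = 0, 1, 2, 3


def dfa_step(q: int, label: str) -> int:
    """Advance the DFA by one label of the trace."""
    if q == REJECT or label == "obs":
        return REJECT  # obstacles violate the spec from any state
    if q == START:
        if label == "g1":
            return SEEN_G1
        if label == "g2":
            return REJECT  # assumption: out-of-order visit is a violation
        return START
    if q == SEEN_G1:
        return ACCEPT if label == "g2" else SEEN_G1
    return ACCEPT  # q == ACCEPT and no obstacle: stay accepting


def satisfies(trace: Iterable[str]) -> bool:
    """Return True if the finite label trace ends in the accepting state."""
    q = START
    for label in trace:
        q = dfa_step(q, label)
    return q == ACCEPT
```

For example, `satisfies(["", "g1", "", "g2"])` holds, while a trace that touches `obs` at any point, or reaches `g2` before `g1`, is rejected under the assumptions above.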

Theorems & Definitions (4)

Theorem 1: Equivalence of Problems \ref{sprobform} and \ref{prodprobform}
Theorem 2: Soundness of AG Policy Composition
proof
Theorem 3: Lower Bound on the Achieved Objective Reward
