Comp-LTL: Temporal Logic Planning via Zero-Shot Policy Composition

Taylor Bergeron; Zachary Serlin; Kevin Leahy

Comp-LTL: Temporal Logic Planning via Zero-Shot Policy Composition

Taylor Bergeron, Zachary Serlin, Kevin Leahy

TL;DR

Comp-LTL tackles the challenge of satisfying Linear Temporal Logic specifications in unknown environments by learning a set of primitive RL policies and composing them zero-shot at run time. It constructs a policy-aware transition system from the environment and base policies, prunes nondeterministic and infeasible transitions to ensure determinism, and then forms a Cartesian product with a Büchi automaton derived from the LTL specification to plan a satisfying policy. Theoretical analysis confirms determinism and soundness of the pruning and product construction, while simulations demonstrate safe, zero-shot satisfaction and competitive performance relative to baselines. The approach offers improved adaptability and safety in dynamic tasks, albeit at the cost of potentially non-shortest paths due to automaton-focused planning.

Abstract

This work develops a zero-shot mechanism, Comp-LTL, for an agent to satisfy a Linear Temporal Logic (LTL) specification given existing task primitives trained via reinforcement learning (RL). Autonomous robots often need to satisfy spatial and temporal goals that are unknown until run time. Prior work focuses on learning policies for executing a task specified using LTL, but they incorporate the specification into the learning process. Any change to the specification requires retraining the policy, either via fine-tuning or from scratch. We present a more flexible approach -- to learn a set of composable task primitive policies that can be used to satisfy arbitrary LTL specifications without retraining or fine-tuning. Task primitives can be learned offline using RL and combined using Boolean composition at deployment. This work focuses on creating and pruning a transition system (TS) representation of the environment in order to solve for deterministic, non-ambiguous, and feasible solutions to LTL specifications given an environment and a set of task primitive policies. We show that our pruned TS is deterministic, contains no unrealizable transitions, and is sound. We verify our approach via simulation and compare it to other state of the art approaches, showing that Comp-LTL is safer and more adaptable.

Comp-LTL: Temporal Logic Planning via Zero-Shot Policy Composition

TL;DR

Abstract

Paper Structure (16 sections, 3 theorems, 1 equation, 10 figures, 3 tables, 3 algorithms)

This paper contains 16 sections, 3 theorems, 1 equation, 10 figures, 3 tables, 3 algorithms.

Introduction
Background and Problem Formulation
Technical Approach
Generating The Transition System
Transition System Pruning
Case 1: Equivalent States
Case 2: Ambiguous Transitions
Case 3: Ineffectual Transitions and Feasibility
Empty Cleanup
Product between Transition System and Büchi Automaton
Theoretical Analysis
Results
Simulation
Case Study
Comparison
...and 1 more sections

Key Result

Theorem 1

The resulting pruned TS from Sec. sec:prune_ts is deterministic.

Figures (10)

Figure 1: Comp-LTL training and path execution pipeline. Policies are trained for a set of base tasks. The environment is abstracted as a policy-aware transition system, which can then be composed with a Büchi automaton to plan over the Boolean composition of task policies.
Figure 2: Block diagram representation of the overall Zero-Shot Algorithm for Comp-LTL. It includes the TS generation algorithm described in Section \ref{['sec:generate_ts']}, from parsing the map, generating the initial TS, and pruning the TS to remove equivalency, solve ambiguity, and ensure feasibility. All sub algorithms are denoted in bold.
Figure 3: Example of an environment with regions to be parsed by Algorithm \ref{['alg:generate_ts']} (\ref{['fig:env']}). A corresponding TS (\ref{['fig:init_ts']}) cannot be used directly, since it is not known a priori which region with a given label will be reached when executing a policy trained with RL.
Figure 4: Unpruned TS created by Alg. \ref{['alg:generate_ts']} on the environment in Fig. \ref{['fig:env']}. State labels appear above each state. Transition labels correspond to task policies that enable a transition.
Figure 5: State labels appear above each state. Transition labels correspond to task policies that enable a transition. Red states and transitions are removed via case1 to create Figure \ref{['case2']}. Symbols a, b, c, {} are deleted to remove equivalency in the system.
...and 5 more figures

Theorems & Definitions (7)

Definition 1
Definition 2
Remark 1
Theorem 1
Theorem 2
Theorem 3
Remark 2

Comp-LTL: Temporal Logic Planning via Zero-Shot Policy Composition

TL;DR

Abstract

Comp-LTL: Temporal Logic Planning via Zero-Shot Policy Composition

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (10)

Theorems & Definitions (7)