Comp-LTL: Temporal Logic Planning via Zero-Shot Policy Composition
Taylor Bergeron, Zachary Serlin, Kevin Leahy
TL;DR
Comp-LTL tackles the challenge of satisfying Linear Temporal Logic specifications in unknown environments by learning a set of primitive RL policies and composing them zero-shot at run time. It constructs a policy-aware transition system from the environment and base policies, prunes nondeterministic and infeasible transitions to ensure determinism, and then forms a Cartesian product with a Büchi automaton derived from the LTL specification to plan a satisfying policy. Theoretical analysis confirms determinism and soundness of the pruning and product construction, while simulations demonstrate safe, zero-shot satisfaction and competitive performance relative to baselines. The approach offers improved adaptability and safety in dynamic tasks, albeit at the cost of potentially non-shortest paths due to automaton-focused planning.
Abstract
This work develops a zero-shot mechanism, Comp-LTL, for an agent to satisfy a Linear Temporal Logic (LTL) specification given existing task primitives trained via reinforcement learning (RL). Autonomous robots often need to satisfy spatial and temporal goals that are unknown until run time. Prior work focuses on learning policies for executing a task specified using LTL, but they incorporate the specification into the learning process. Any change to the specification requires retraining the policy, either via fine-tuning or from scratch. We present a more flexible approach -- to learn a set of composable task primitive policies that can be used to satisfy arbitrary LTL specifications without retraining or fine-tuning. Task primitives can be learned offline using RL and combined using Boolean composition at deployment. This work focuses on creating and pruning a transition system (TS) representation of the environment in order to solve for deterministic, non-ambiguous, and feasible solutions to LTL specifications given an environment and a set of task primitive policies. We show that our pruned TS is deterministic, contains no unrealizable transitions, and is sound. We verify our approach via simulation and compare it to other state of the art approaches, showing that Comp-LTL is safer and more adaptable.
