DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL
Mathias Jackermeier, Alessandro Abate
TL;DR
DeepLTL targets zero-shot satisfaction of arbitrary LTL specifications in multi-task RL. It exploits the structure of limit-deterministic Büchi automata to reason about the paths through the automaton that satisfy a specification, and represents each such path as a reach-avoid sequence. A sequence-conditioned policy, built on a DeepSets+RNN architecture, is trained on these sequences. At test time, the agent selects the optimal reach-avoid sequence from the current automaton state and conditions its actions on it, which lets it handle infinite-horizon specifications and safety constraints. Experiments in discrete and continuous domains show higher satisfaction probability and efficiency than state-of-the-art baselines, supporting the separation of high-level temporal reasoning from low-level control in LTL-conditioned RL.
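To make the test-time step more concrete, the following is a minimal sketch of enumerating reach-avoid sequences from an automaton state and picking the one with the highest estimated value. It is an illustration under simplifying assumptions, not the authors' implementation: the automaton is a small hand-written deterministic one, only simple paths to an accepting state are enumerated (the accepting cycles needed for infinite-horizon tasks are omitted), and `toy_value` is a hypothetical stand-in for a learned value function. All names (`TRANSITIONS`, `reach_avoid_sequences`, `select_best`, `toy_value`) are invented for this example.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy automaton: state -> {symbol: next_state}.
# Symbols not listed for a state are treated as self-loops.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "q0": {"a": "q1", "c": "q2", "e": "sink"},
    "q1": {"b": "acc", "e": "sink"},
    "q2": {"d": "acc", "e": "sink"},
    "acc": {},
    "sink": {},
}
ACCEPTING: Set[str] = {"acc"}

# One step of a sequence: (symbols to reach, symbols to avoid).
ReachAvoid = Tuple[FrozenSet[str], FrozenSet[str]]
Sequence = List[ReachAvoid]


def reach_avoid_sequences(start: str) -> List[Sequence]:
    """Enumerate simple paths from `start` to an accepting state.

    For each step q -> q', the reach set holds the symbols that move the
    automaton to q', and the avoid set holds the symbols that would move
    it to any other state (self-loops are neither reached nor avoided).
    """
    sequences: List[Sequence] = []

    def dfs(state: str, visited: Set[str], prefix: Sequence) -> None:
        if state in ACCEPTING:
            sequences.append(list(prefix))
            return
        successors = sorted(set(TRANSITIONS[state].values()) - visited)
        for nxt in successors:
            edges = TRANSITIONS[state].items()
            reach = frozenset(sym for sym, tgt in edges if tgt == nxt)
            avoid = frozenset(sym for sym, tgt in edges if tgt != nxt)
            prefix.append((reach, avoid))
            dfs(nxt, visited | {nxt}, prefix)
            prefix.pop()

    dfs(start, {start}, [])
    return sequences


def select_best(sequences: List[Sequence],
                value_fn: Callable[[Sequence], float]) -> Sequence:
    """Pick the sequence with the highest estimated value."""
    return max(sequences, key=value_fn)


# Hypothetical per-symbol costs standing in for a learned value estimate.
TOY_COST: Dict[str, float] = {"a": 1.0, "b": 4.0, "c": 2.0, "d": 1.0}


def toy_value(sequence: Sequence) -> float:
    """Negative total cost of the cheapest symbol in each reach set."""
    return -sum(min(TOY_COST[sym] for sym in reach) for reach, _ in sequence)


if __name__ == "__main__":
    candidates = reach_avoid_sequences("q0")
    print(select_best(candidates, toy_value))
```

In this toy setting there are two candidate sequences from `q0` (via `a` then `b`, or via `c` then `d`), and the symbol `e` appears in every avoid set because it leads to the rejecting sink. A policy conditioned on the selected sequence would then be responsible for the low-level control.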
Abstract
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in multi-task reinforcement learning (RL). However, learning policies that efficiently satisfy arbitrary specifications not observed during training remains a challenging problem. Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments of LTL, are restricted to suboptimal solutions, and do not adequately handle safety constraints. In this work, we propose a novel learning approach to address these concerns. Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae. Experiments in a variety of discrete and continuous domains demonstrate that our approach is able to zero-shot satisfy a wide range of finite- and infinite-horizon specifications, and outperforms existing methods in terms of both satisfaction probability and efficiency. Code available at: https://deep-ltl.github.io/
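As a small illustration of how a Büchi automaton captures infinite-horizon semantics, the sketch below checks whether an ultimately periodic trace (a finite prefix followed by a cycle repeated forever) is accepted, that is, whether the run visits an accepting state infinitely often. It assumes a deterministic automaton for simplicity, whereas the method summarized above uses limit-deterministic Büchi automata; the function name `accepts_lasso` and the example automaton for "infinitely often a" are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List, Set


def accepts_lasso(delta: Callable[[str, str], str],
                  initial: str,
                  accepting: Set[str],
                  prefix: List[str],
                  cycle: List[str]) -> bool:
    """Buchi acceptance of the infinite word prefix . cycle^omega.

    Assumes a deterministic transition function `delta(state, symbol)`.
    """
    if not cycle:
        raise ValueError("cycle must contain at least one symbol")

    # Consume the finite prefix.
    state = initial
    for symbol in prefix:
        state = delta(state, symbol)

    # Repeat the cycle until the state at a cycle boundary recurs.
    seen_at_boundary: Dict[str, int] = {}
    visited_per_iteration: List[Set[str]] = []
    while state not in seen_at_boundary:
        seen_at_boundary[state] = len(visited_per_iteration)
        visited: Set[str] = set()
        for symbol in cycle:
            state = delta(state, symbol)
            visited.add(state)
        visited_per_iteration.append(visited)

    # States visited from the first recurrence onward are visited forever.
    first = seen_at_boundary[state]
    recurring: Set[str] = set().union(*visited_per_iteration[first:])
    return bool(recurring & accepting)


def gf_a(state: str, symbol: str) -> str:
    """Toy automaton for 'infinitely often a': q1 is entered on every a."""
    return "q1" if symbol == "a" else "q0"


if __name__ == "__main__":
    print(accepts_lasso(gf_a, "q0", {"q1"}, ["b"], ["a", "b"]))
    print(accepts_lasso(gf_a, "q0", {"q1"}, ["a"], ["b"]))
```

The first call describes a trace in which `a` recurs in every repetition of the cycle, so the accepting state is visited infinitely often; in the second, `a` occurs only once in the prefix, so the run eventually stays in the non-accepting state.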
