Table of Contents
Fetching ...

One Subgoal at a Time: Zero-Shot Generalization to Arbitrary Linear Temporal Logic Requirements in Multi-Task Reinforcement Learning

Zijian Guo, İlker Işık, H. M. Sabbir Ahmad, Wenchao Li

TL;DR

GenZ-LTL tackles zero-shot generalization of reinforcement learning to arbitrary Linear Temporal Logic specifications by decomposing LTL formulas into reach-avoid subgoals using Büchi automata. It trains subgoal-conditioned policies under a safe RL objective that enforces state-wise Hamilton-Jacobi reachability constraints, and introduces a subgoal-induced observation reduction to curb exponential observation-subgoal combinations. At test time, the method selects the optimal subgoal by trading off reachability progress and safety, and gracefully handles unsatisfiable subgoals with a timeout-based switch. Empirical results on LetterWorld and ZoneEnv show GenZ-LTL substantially outperforms state-of-the-art baselines in zero-shot generalization to unseen specifications, including infinite-horizon tasks and scenarios with increased complexity and safety requirements. The work also demonstrates the practical value of combining LTL-based task decomposition with safe RL and observation reduction for scalable, expressive goal specifications.

Abstract

Generalizing to complex and temporally extended task objectives and safety constraints remains a critical challenge in reinforcement learning (RL). Linear temporal logic (LTL) offers a unified formalism to specify such requirements, yet existing methods are limited in their abilities to handle nested long-horizon tasks and safety constraints, and cannot identify situations when a subgoal is not satisfiable and an alternative should be sought. In this paper, we introduce GenZ-LTL, a method that enables zero-shot generalization to arbitrary LTL specifications. GenZ-LTL leverages the structure of Büchi automata to decompose an LTL task specification into sequences of reach-avoid subgoals. Contrary to the current state-of-the-art method that conditions on subgoal sequences, we show that it is more effective to achieve zero-shot generalization by solving these reach-avoid problems \textit{one subgoal at a time} through proper safe RL formulations. In addition, we introduce a novel subgoal-induced observation reduction technique that can mitigate the exponential complexity of subgoal-state combinations under realistic assumptions. Empirical results show that GenZ-LTL substantially outperforms existing methods in zero-shot generalization to unseen LTL specifications.

One Subgoal at a Time: Zero-Shot Generalization to Arbitrary Linear Temporal Logic Requirements in Multi-Task Reinforcement Learning

TL;DR

GenZ-LTL tackles zero-shot generalization of reinforcement learning to arbitrary Linear Temporal Logic specifications by decomposing LTL formulas into reach-avoid subgoals using Büchi automata. It trains subgoal-conditioned policies under a safe RL objective that enforces state-wise Hamilton-Jacobi reachability constraints, and introduces a subgoal-induced observation reduction to curb exponential observation-subgoal combinations. At test time, the method selects the optimal subgoal by trading off reachability progress and safety, and gracefully handles unsatisfiable subgoals with a timeout-based switch. Empirical results on LetterWorld and ZoneEnv show GenZ-LTL substantially outperforms state-of-the-art baselines in zero-shot generalization to unseen specifications, including infinite-horizon tasks and scenarios with increased complexity and safety requirements. The work also demonstrates the practical value of combining LTL-based task decomposition with safe RL and observation reduction for scalable, expressive goal specifications.

Abstract

Generalizing to complex and temporally extended task objectives and safety constraints remains a critical challenge in reinforcement learning (RL). Linear temporal logic (LTL) offers a unified formalism to specify such requirements, yet existing methods are limited in their abilities to handle nested long-horizon tasks and safety constraints, and cannot identify situations when a subgoal is not satisfiable and an alternative should be sought. In this paper, we introduce GenZ-LTL, a method that enables zero-shot generalization to arbitrary LTL specifications. GenZ-LTL leverages the structure of Büchi automata to decompose an LTL task specification into sequences of reach-avoid subgoals. Contrary to the current state-of-the-art method that conditions on subgoal sequences, we show that it is more effective to achieve zero-shot generalization by solving these reach-avoid problems \textit{one subgoal at a time} through proper safe RL formulations. In addition, we introduce a novel subgoal-induced observation reduction technique that can mitigate the exponential complexity of subgoal-state combinations under realistic assumptions. Empirical results show that GenZ-LTL substantially outperforms existing methods in zero-shot generalization to unseen LTL specifications.

Paper Structure

This paper contains 20 sections, 6 equations, 16 figures, 7 tables, 2 algorithms.

Figures (16)

  • Figure 1: Overview of GenZ-LTL. During training, we sample reach-avoid subgoals $\sigma = (\alpha^+, A^-)$ and apply a subgoal-induced observation reduction to learn a subgoal-conditioned policy and value functions. At test time, given a target LTL formula $\varphi$, we construct the corresponding Büchi automaton and identify candidate subgoals based on the current automaton state. The optimal subgoal $\sigma^*$ is selected using the learned value functions, and the policy generates actions conditioned on $\sigma^*$.
  • Figure 2: Illustration of the subgoal-induced observation reduction for LiDAR observations. The environment provides the raw state consisting of a proposition-independent component $s_{\neg AP}$ and a proposition-dependent component $s_{AP}$. Given a subgoal $\sigma$, $s_{AP}$ is reduced to two vectors: a reach observation and a fused avoid observation. Together with the $s_{\neg AP}$, these form the agent’s input.
  • Figure 3: Environment illustrations of LetterWorld and ZoneEnv with example trajectories for $\neg\mathsf{l} \;\mathsf{U}\;\mathsf {j}$ and $\neg \mathsf{green} \;\mathsf{U}\; \mathsf{blue}$, respectively.
  • Figure 4: Performance in ZoneEnv under increasing subgoal sequence lengths, showing success rate $\eta_s$, violation rate $\eta_v$, and others rate $\eta_o$. We use reach-only specifications with sequence lengths $n \in \{4, 6, 8, 10, 12\}$ and reach-avoid specifications with $n \in \{2, 4, 6, 8, 10\}$. The specifications are shown in Table \ref{['tab:complexity-temporal']}. Each value is averaged over 5 seeds, with 100 trajectories per seed.
  • Figure 5: Performance in ZoneEnv under varying environment complexity, including different zone sizes and different numbers of zones per atomic proposition. Visualizations of the environment layout are provided in Figure \ref{['fig:vis-zone-size-count']}. We report success rate $\eta_s$, violation rate $\eta_v$, and other rate $\eta_o$. The default zone size is 0.4, and the default number of zones per atomic proposition is 2. The evaluated specifications are the reach-avoid specifications with $n=2$ shown in Table \ref{['tab:complexity-temporal']}.
  • ...and 11 more figures