Grounding LTL Tasks in Sub-Symbolic RL Environments for Zero-Shot Generalization

Matteo Pannacci, Andrea Fanti, Elena Umili, Roberto Capobianco

TL;DR

This work tackles zero-shot generalization to temporally-extended tasks specified in Linear Temporal Logic (LTL) in sub-symbolic environments by jointly learning a multi-task policy and a symbol grounder using Neural Reward Machines (NRMs). The system combines symbol grounding from raw observations, LTL progression, automata-based task representations, and end-to-end reinforcement learning (PPO), allowing transfer to unseen formulas without explicit grounding labels. Across Minecraft-like and continuous FlatWorld domains, the method nearly matches the performance of models with known grounding and significantly outperforms prior sub-symbolic baselines, especially on partially-ordered task structures, while global avoidance tasks remain challenging. The results point to a practical way to deploy LTL-guided, multi-task RL in environments where symbol grounding is not directly observable, with implications for scalable instruction-following in robotics and intelligent agents.

Abstract

In this work we address the problem of training a Reinforcement Learning agent to follow multiple temporally-extended instructions expressed in Linear Temporal Logic in sub-symbolic environments. Previous multi-task work has mostly relied on knowledge of the mapping between raw observations and symbols appearing in the formulae. We drop this unrealistic assumption by jointly training a multi-task policy and a symbol grounder with the same experience. The symbol grounder is trained only from raw observations and sparse rewards via Neural Reward Machines in a semi-supervised fashion. Experiments on vision-based environments show that our method achieves performance comparable to using the true symbol grounding and significantly outperforms state-of-the-art methods for sub-symbolic environments.

Paper Structure

This paper contains 38 sections, 10 equations, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: Environment visualization and Moore machine for the task $\neg lava \mathbin{\mathcal{U}} (egg \wedge (\neg lava \mathbin{\mathcal{U}} (pick \wedge (\neg lava \mathbin{\mathcal{U}} door))))$, the (co-safe) task of following the sequence $egg \rightarrow pick \rightarrow door$ without traversing $lava$ in the meantime. A successful execution trace is shown in blue and a failing one in red.
  • Figure 2: (a) Unfolded computational graph of the grounder training through the LTL task's Neural Reward Machine, employing backpropagation through time. $q^{i}$ denotes the initial state of the NRM and $\tilde{q}^{(t)}$ denotes the predicted state at time $t$. (b) Overview of the RL framework. (c) The LTL goal represented as a formula and as the corresponding graph derived from its AST.
  • Figure 3: Comparison between our method (in blue), the baseline (in orange) and LTL2Action (with known symbol grounding) (in green). We report the evolution of the discounted return and, for our method, the grounder accuracy (averaged over 5 seeds, with std error bands).
  • Figure 4: Example of the image observations of the Minecraft-like environment following the egocentric view. The agent position is represented by the red square icon.
  • Figure 5: Example of the image observations of the FlatWorld environment. The agent position is represented by the black dot.
  • ...and 2 more figures
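
As an illustration of the task in the Figure 1 caption, the following is a minimal sketch of standard LTL progression applied to that formula. It is not the paper's implementation: the tuple encoding and the names `progress`, `run`, and `TASK` are hypothetical, and the true symbols at each step are supplied directly, which assumes a known grounding rather than a learned one.

```python
# Formulas are nested tuples; Python booleans stand for true/false.
# Hypothetical encoding: ("atom", p), ("not", f), ("and", a, b),
# ("or", a, b), ("until", a, b).

def _and(a, b):
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return ("and", a, b)

def _or(a, b):
    if a is True or b is True:
        return True
    if a is False:
        return b
    if b is False:
        return a
    return ("or", a, b)

def _not(a):
    return (not a) if isinstance(a, bool) else ("not", a)

def progress(f, symbols):
    """Rewrite formula f after observing the set of true symbols."""
    if isinstance(f, bool):
        return f
    op = f[0]
    if op == "atom":
        return f[1] in symbols
    if op == "not":
        return _not(progress(f[1], symbols))
    if op == "and":
        return _and(progress(f[1], symbols), progress(f[2], symbols))
    if op == "or":
        return _or(progress(f[1], symbols), progress(f[2], symbols))
    if op == "until":
        # a U b holds if b holds now, or a holds now and a U b holds next.
        return _or(progress(f[2], symbols),
                   _and(progress(f[1], symbols), f))
    raise ValueError(f"unknown operator: {op}")

def run(f, trace):
    """Return True/False once the task is decided, None if still open."""
    for symbols in trace:
        f = progress(f, symbols)
        if isinstance(f, bool):
            return f
    return None

def avoid_lava_until(goal):
    return ("until", ("not", ("atom", "lava")), goal)

# not lava U (egg and (not lava U (pick and (not lava U door))))
TASK = avoid_lava_until(
    ("and", ("atom", "egg"), avoid_lava_until(
        ("and", ("atom", "pick"), avoid_lava_until(("atom", "door"))))))

success = run(TASK, [set(), {"egg"}, set(), {"pick"}, {"door"}])
failure = run(TASK, [{"egg"}, {"lava"}])
undecided = run(TASK, [set(), {"egg"}])
```

Under these assumptions, the first trace satisfies the task, the second violates it by touching lava after the egg, and the third leaves it undecided.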

Theorems & Definitions (1)

  • Definition 1