Grounding LTL Tasks in Sub-Symbolic RL Environments for Zero-Shot Generalization
Matteo Pannacci, Andrea Fanti, Elena Umili, Roberto Capobianco
TL;DR
This work tackles zero-shot generalization to temporally-extended tasks specified in Linear Temporal Logic (LTL) in sub-symbolic environments by jointly learning a multi-task policy and a symbol grounder using Neural Reward Machines (NRMs). The system combines symbol grounding from raw observations, LTL progression, automata-based task representations, and end-to-end reinforcement learning with PPO, which allows transfer to unseen formulas without explicit grounding labels. Across Minecraft-like and continuous FlatWorld domains, the method nearly matches the performance of models with known grounding and significantly outperforms prior sub-symbolic baselines, especially on partially-ordered task structures, while highlighting challenges in global avoidance tasks. The results demonstrate a practical pathway to deploying LTL-guided, multi-task RL in environments where symbol grounding is not directly observable, with implications for scalable instruction-following in robotics and intelligent agents.
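To make the LTL progression step mentioned above concrete, the following is a minimal Python sketch of textbook formula progression. It is not the authors' implementation: the nested-tuple encoding of formulas, the operator names, and the helpers `prog`, `_and`, and `_or` are assumptions made for illustration, and negation is restricted to atomic propositions. Here the set of true propositions is passed in by hand; in the sub-symbolic setting described above it would instead have to come from the learned symbol grounder.

```python
# Illustrative encoding (an assumption, not from the paper): a formula is
# True, False, an atom given as a string, or a tuple (operator, operands...).

def _and(a, b):
    # Conjunction with constant simplification.
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return ("and", a, b)


def _or(a, b):
    # Disjunction with constant simplification.
    if a is True or b is True:
        return True
    if a is False:
        return b
    if b is False:
        return a
    return ("or", a, b)


def prog(formula, true_props):
    """Return what remains to be satisfied after observing true_props."""
    if isinstance(formula, bool):
        return formula
    if isinstance(formula, str):
        # Atomic proposition: resolved by the current observation.
        return formula in true_props
    op = formula[0]
    if op == "not":
        # Negation is assumed to wrap an atom only, so the result is a bool.
        return not prog(formula[1], true_props)
    if op == "and":
        return _and(prog(formula[1], true_props), prog(formula[2], true_props))
    if op == "or":
        return _or(prog(formula[1], true_props), prog(formula[2], true_props))
    if op == "next":
        return formula[1]
    if op == "eventually":
        # F p  ==  p or X(F p)
        return _or(prog(formula[1], true_props), formula)
    if op == "always":
        # G p  ==  p and X(G p)
        return _and(prog(formula[1], true_props), formula)
    if op == "until":
        # p U q  ==  q or (p and X(p U q))
        return _or(prog(formula[2], true_props),
                   _and(prog(formula[1], true_props), formula))
    raise ValueError(f"unknown operator: {op!r}")


# "Eventually reach a, and after that eventually reach b."
task = ("eventually", ("and", "a", ("eventually", "b")))
after_a = prog(task, {"a"})     # a residual formula that still requires b
done = prog(after_a, {"b"})     # progresses to True: task satisfied
avoid_c = ("always", ("not", "c"))
violated = prog(avoid_c, {"c"}) # progresses to False: task failed
```

A formula that progresses to `True` marks a satisfied task and one that progresses to `False` marks a violated task, which is how progression can turn a sparse task outcome into a signal for the agent. The residual formula is simplified only with respect to constants, so syntactically different but equivalent formulas are not merged in this sketch.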
Abstract
In this work we address the problem of training a Reinforcement Learning agent to follow multiple temporally-extended instructions expressed in Linear Temporal Logic in sub-symbolic environments. Previous multi-task work has mostly relied on knowledge of the mapping between raw observations and symbols appearing in the formulae. We drop this unrealistic assumption by jointly training a multi-task policy and a symbol grounder with the same experience. The symbol grounder is trained only from raw observations and sparse rewards via Neural Reward Machines in a semi-supervised fashion. Experiments on vision-based environments show that our method achieves performance comparable to using the true symbol grounding and significantly outperforms state-of-the-art methods for sub-symbolic environments.
