Reinforcement Learning of Flexible Policies for Symbolic Instructions with Adjustable Mapping Specifications
Wataru Hatanaka, Ryota Yamashina, Takamitsu Matsubara
TL;DR
This work tackles the problem of learning policies that follow symbolic instructions when the mapping from environment states to symbols is adjustable rather than fixed. It introduces SIAMS, which integrates specification-aware state modulation via FiLM and a symbol-number-based curriculum to learn diverse LTL-driven tasks without re-training for new mappings. By encoding LTL tasks with a graph neural task encoder and conditioning state embeddings on mapping specifications, SIAMS outperforms context-aware multitask RL baselines in both discrete navigation and continuous robotic inspection tasks. The results indicate that conditioning on mapping specifications and progressively increasing task difficulty enable robust, scalable policy learning for flexible symbolic instructions in robotics.
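To make the FiLM-based modulation concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes FiLM in its usual form: a scale `gamma` and a shift `beta` are computed from the mapping specification and applied feature-wise to the state embedding. The names `film`, `linear`, and `init_params`, the dimensions, and the initialization are all hypothetical choices made for illustration.

```python
import random


def linear(weights, bias, x):
    """Affine map; `weights` holds one row per output feature."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_params(spec_dim, feat_dim, seed=0):
    """Hypothetical initialization: gamma starts near 1, beta near 0."""
    rng = random.Random(seed)

    def small_matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(spec_dim)]
                for _ in range(feat_dim)]

    return {
        "W_gamma": small_matrix(), "b_gamma": [1.0] * feat_dim,
        "W_beta": small_matrix(), "b_beta": [0.0] * feat_dim,
    }


def film(state_features, spec, params):
    """Feature-wise linear modulation: gamma(spec) * h + beta(spec)."""
    gamma = linear(params["W_gamma"], params["b_gamma"], spec)
    beta = linear(params["W_beta"], params["b_beta"], spec)
    return [g * h + b for g, h, b in zip(gamma, state_features, beta)]


# Assumed toy setup: a 3-dimensional state embedding and a
# 2-dimensional encoding of the mapping specification.
params = init_params(spec_dim=2, feat_dim=3)
h = [0.5, -1.0, 2.0]
out_a = film(h, [1.0, 0.0], params)  # same state, specification A
out_b = film(h, [0.0, 1.0], params)  # same state, specification B
```

In this sketch the same state embedding `h` yields different modulated features under different specifications, which is the property that lets a single policy serve several symbol mappings without re-training.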
Abstract
Symbolic task representation is a powerful tool for encoding human instructions and domain knowledge. Such instructions guide robots to accomplish diverse objectives and meet constraints through reinforcement learning (RL). Most existing methods assume a fixed mapping from environmental states to symbols. However, in inspection tasks, where equipment conditions must be evaluated from multiple perspectives to avoid oversights, robots must fulfill the same symbol from different states. To help robots respond to flexible symbol mappings, we propose representing symbols and their mapping specifications separately within an RL policy. This separation requires the RL policy to learn combinations of symbolic instructions and mapping specifications, which calls for an efficient learning framework. To address this issue, we introduce an approach for learning flexible policies called Symbolic Instructions with Adjustable Mapping Specifications (SIAMS). We represent symbolic instructions using linear temporal logic (LTL), a formal language that can be easily integrated into RL. Our method addresses the diversified completion patterns of instructions through (1) specification-aware state modulation, which embeds differences in mapping specifications in state features, and (2) a symbol-number-based task curriculum, which gradually introduces tasks as learning progresses. Evaluations in 3D simulations with discrete and continuous action spaces demonstrate that our method outperforms context-aware multitask RL baselines.
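The symbol-number-based curriculum can be pictured with a small sketch. The abstract states only that tasks are introduced gradually as learning progresses, so the advancement rule here (a success-rate threshold over a sliding window) is an assumption, as are the class name `SymbolNumberCurriculum` and its parameters `threshold` and `window`.

```python
import random
from collections import deque


class SymbolNumberCurriculum:
    """Hypothetical curriculum that caps the number of symbols per task."""

    def __init__(self, max_symbols, threshold=0.8, window=20, seed=0):
        self.max_symbols = max_symbols
        self.level = 1                       # current cap on symbol count
        self.threshold = threshold           # assumed success rate to advance
        self.results = deque(maxlen=window)  # recent episode outcomes
        self.rng = random.Random(seed)

    def sample_num_symbols(self):
        """Draw a symbol count between 1 and the current level."""
        return self.rng.randint(1, self.level)

    def report(self, success):
        """Record an episode outcome and advance the level if warranted."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        if window_full and self.level < self.max_symbols:
            rate = sum(self.results) / len(self.results)
            if rate >= self.threshold:
                self.level += 1
                self.results.clear()  # restart statistics at the new level


curriculum = SymbolNumberCurriculum(max_symbols=3, window=5)
for _ in range(5):
    curriculum.report(True)  # five successes fill the window
level_after_successes = curriculum.level
```

Under these assumptions, training starts with single-symbol tasks and longer instructions become available only after the policy is reliable on shorter ones.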
