
Reinforcement Learning of Flexible Policies for Symbolic Instructions with Adjustable Mapping Specifications

Wataru Hatanaka, Ryota Yamashina, Takamitsu Matsubara

TL;DR

This work tackles the problem of learning policies that follow symbolic instructions when the mapping from environment states to symbols is adjustable rather than fixed. It introduces SIAMS, which integrates specification-aware state modulation via FiLM and a symbol-number-based curriculum to learn diverse LTL-driven tasks without re-training for new mappings. By encoding LTL tasks with a graph neural task encoder and conditioning state embeddings on mapping specifications, SIAMS achieves superior performance over context-aware multitask RL baselines in both discrete navigation and continuous robotic inspection tasks. The results demonstrate that adjusting symbol mappings and progressively increasing task difficulty enable robust, scalable policy learning for flexible symbolic instructions with practical robotics implications.
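The TL;DR names FiLM as the conditioning mechanism. Below is a minimal sketch of that modulation step. It assumes the standard FiLM form, in which a mapping specification vector $c$ is mapped by two linear layers to a per-feature scale $\alpha_c$ and shift $\beta_c$, and the state embedding becomes $\alpha_c \odot \mathcal{E}_{state} + \beta_c$. The class name `FiLM`, the layer sizes, the random initialization, the "1 + Δα" parameterization that starts near the identity, and the numeric encoding of the specification are all hypothetical. In SIAMS the corresponding parameters are learned through RL together with the encoders and the policy.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]
Layer = Tuple[Matrix, Vector]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Dense layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(out_dim: int, in_dim: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases (hypothetical init; learned by RL in practice)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return w, b


class FiLM:
    """Feature-wise linear modulation of a state embedding by a mapping specification."""

    def __init__(self, spec_dim: int, feat_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.alpha_layer = init_layer(feat_dim, spec_dim, rng)
        self.beta_layer = init_layer(feat_dim, spec_dim, rng)

    def film_params(self, spec: Vector) -> Tuple[Vector, Vector]:
        # alpha_c: per-feature scale, beta_c: per-feature shift, both conditioned on c.
        # Assumption: alpha is predicted as 1 + delta so an untrained layer is near identity.
        alpha = [1.0 + a for a in linear(*self.alpha_layer, spec)]
        beta = linear(*self.beta_layer, spec)
        return alpha, beta

    def modulate(self, state_emb: Vector, spec: Vector) -> Vector:
        """Return alpha_c * E_state + beta_c, elementwise."""
        alpha, beta = self.film_params(spec)
        return [a * s + b for a, s, b in zip(alpha, state_emb, beta)]
```

For example, `FiLM(spec_dim=3, feat_dim=4).modulate(state, spec)` returns different features for the same `state` under two different `spec` vectors, which is what lets a single policy distinguish, say, a strict mapping from a loose one. An all-zero specification leaves the embedding unchanged here only because the biases are initialized to zero.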

Abstract

Symbolic task representation is a powerful tool for encoding human instructions and domain knowledge. Through reinforcement learning (RL), such instructions guide robots to accomplish diverse objectives and satisfy constraints. Most existing methods rely on fixed mappings from environmental states to symbols. However, in inspection tasks, where equipment conditions must be evaluated from multiple perspectives to avoid oversights, robots must be able to satisfy the same symbol from different states. To let robots handle flexible symbol mappings, we propose representing symbols and their mapping specifications separately within an RL policy. This approach requires the RL policy to learn combinations of symbolic instructions and mapping specifications, which calls for an efficient learning framework. To address this, we introduce an approach for learning flexible policies called Symbolic Instructions with Adjustable Mapping Specifications (SIAMS). We represent symbolic instructions using linear temporal logic (LTL), a formal language that integrates readily with RL. Our method handles the diverse completion patterns of instructions through (1) specification-aware state modulation, which embeds differences in mapping specifications into state features, and (2) a symbol-number-based task curriculum, which introduces tasks gradually according to learning progress. Evaluations in 3D simulations with discrete and continuous action spaces demonstrate that our method outperforms context-aware multitask RL baselines.
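The abstract describes the symbol-number-based curriculum only at a high level. Below is a minimal sketch under several assumptions: each task is an ordered sequence of symbols, the curriculum level equals the number of distinct symbols a task uses, the next level unlocks once the mean evaluation success rate on the current level reaches a threshold, and earlier levels stay in the sampling pool. The threshold of 0.8, the class and method names, and the single-letter symbols are illustrative assumptions, not values from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Task = Tuple[str, ...]  # hypothetical: an LTL sequence task given as its ordered symbols


class SymbolCurriculum:
    """Group tasks by how many distinct symbols they use and unlock the next
    level once average success on the current level is high enough."""

    def __init__(self, tasks: Sequence[Task], threshold: float = 0.8, seed: int = 0):
        by_level: Dict[int, List[Task]] = defaultdict(list)
        for task in tasks:
            by_level[len(set(task))].append(task)  # level key = number of distinct symbols
        self.levels: List[List[Task]] = [by_level[k] for k in sorted(by_level)]
        self.threshold = threshold
        self.current = 0
        self.rng = random.Random(seed)

    def available(self) -> List[Task]:
        # Assumption: tasks from every unlocked level remain in the training pool.
        return [t for level in self.levels[: self.current + 1] for t in level]

    def sample(self) -> Task:
        """Pick the task for the next training episode."""
        return self.rng.choice(self.available())

    def update(self, success_rates: Dict[Task, float]) -> bool:
        """Advance one level if mean success on the current level passes the threshold."""
        level_tasks = self.levels[self.current]
        mean = sum(success_rates.get(t, 0.0) for t in level_tasks) / len(level_tasks)
        if mean >= self.threshold and self.current < len(self.levels) - 1:
            self.current += 1
            return True
        return False
```

A training loop would call `sample()` to choose each episode's task and `update()` after each periodic evaluation, passing the per-task success rates measured on the tasks available at the current level.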

Paper Structure

This paper contains 30 sections, 4 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Overview of our method: By introducing a symbol mapping specification, a policy learns to follow symbolic instructions and the satisfaction criteria for each symbol. Our method allows robots to perform sequential tasks such as inspections that require checking from different perspectives depending on the requested level of accuracy.
  • Figure 2: Comparison of specification-aware symbol mapping with fixed mapping: Our symbol mapping allows different states to satisfy identical symbols based on instructed mapping specifications.
  • Figure 3: Overview of the SIAMS framework: For simplicity, we describe LTL and mapping specifications as LTL tasks with mapping specifications based on LTL’s formulation. When the next symbol of a given LTL task is $\mathtt{read\_meter}$, the state embedding $\mathcal{E}_{state}$ is modulated by affine parameters $\alpha_{c_{read}}$ and $\beta_{c_{read}}$, conditioned on the mapping specification $c_{read}$, and is passed to the policy along with the task embedding $\mathcal{E}_{task}$. All encoders and the policy $\pi$ are optimized through RL with the reward defined in the paper's reward equation.
  • Figure 4: Environment of the navigation task: The map contains red and blue boxes, with the letters "LE" on the left wall and "RI" on the right wall. Symbol mapping specifications for the letter symbols are defined by the distance $d$ and angle $\theta$ from the position of the letters and the radius $r_d$ of the detectable area.
  • Figure 5: Evaluation results: Results are averaged over three random seeds for both the navigation and robotic inspection tasks. Each plot corresponds to one curriculum level, and the x-axis shows the number of steps. The brightness of each square indicates the average success rate at that step. We periodically evaluate the training policy using a seed different from the training seed, running 20 episodes for each task available at the current level, and report the average success rate.
  • ...and 3 more figures
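Figures 2 and 4 describe a symbol whose truth depends on the instructed mapping specification rather than on a fixed region. Below is a minimal sketch of such a labeling function for the navigation letters. It assumes a 2D floor plane, an angle $\theta$ measured from the wall's outward normal, and a circular detectable area of radius $r_d$ centred $d$ units from the letter. The names `MappingSpec`, `area_center`, and `symbol_holds` and this exact geometry are hypothetical readings of the Figure 4 caption.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class MappingSpec:
    """Hypothetical encoding of one letter symbol's mapping specification."""
    d: float      # distance from the letter to the centre of the detectable area
    theta: float  # angle in radians, measured from the wall's outward normal (assumption)
    r_d: float    # radius of the detectable area


def area_center(letter_pos: Point, wall_normal: float, spec: MappingSpec) -> Point:
    """Centre of the detectable area: d units from the letter along normal + theta."""
    heading = wall_normal + spec.theta
    return (letter_pos[0] + spec.d * math.cos(heading),
            letter_pos[1] + spec.d * math.sin(heading))


def symbol_holds(agent_pos: Point, letter_pos: Point, wall_normal: float,
                 spec: MappingSpec) -> bool:
    """Specification-aware labeling: the letter symbol is true iff the agent
    stands inside the detectable area defined by this specification."""
    cx, cy = area_center(letter_pos, wall_normal, spec)
    return math.hypot(agent_pos[0] - cx, agent_pos[1] - cy) <= spec.r_d
```

With the letter at the origin and the wall normal along +x, a head-on specification such as `MappingSpec(1.0, 0.0, 0.5)` is satisfied at (1, 0), while an oblique one such as `MappingSpec(3.0, math.pi / 4, 0.5)` is not satisfied there but is satisfied near (2.1, 2.1). A fixed mapping would use a single region for every instruction.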

Theorems & Definitions (2)

  • Definition 1: Taskable MDP
  • Definition 2