Natural Language-conditioned Reinforcement Learning with Inside-out Task Language Development and Translation

Jing-Cheng Pang; Xin-Yu Yang; Si-Hang Yang; Yang Yu


TL;DR

The paper introduces Inside-Out Learning (IOL) for natural language-conditioned reinforcement learning: instead of exposing the policy to unbounded natural-language (NL) instructions, it conditions the policy on a task language (TL) expressed as object-predicate relations. It presents TALAR, a three-component system comprising a TL generator, an NL-to-TL translator (via a Variational Auto-Encoder), and an instruction-following policy trained with PPO. TALAR improves the success rate by 13.4% and is robust to unseen NL expressions. Beyond accelerating policy learning, the TL also serves as a natural abstraction for hierarchical RL. The work further demonstrates the TL's interpretability and better generalization, and it provides a foundation for future dynamic dataset expansion and predicate-focused language representations.
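The TL;DR states that the instruction-following policy is trained with PPO. The sketch below shows the standard PPO clipped surrogate loss, assuming the policy is conditioned on the TL so that its log-probabilities are log π(a_t | s_t, TL). The function name `ppo_clip_loss` and the clipping value `eps=0.2` are illustrative choices, not details taken from the paper.

```python
import math
from typing import Sequence


def ppo_clip_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                  advantages: Sequence[float], eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate, returned as a loss to minimise.

    logp_new / logp_old hold log pi(a_t | s_t, TL) under the current and the
    data-collecting policy; conditioning on the TL is an assumption here.
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                      # probability ratio r_t
        clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(r_t, 1-eps, 1+eps)
        terms.append(min(ratio * adv, clipped_ratio * adv))    # pessimistic surrogate
    return -sum(terms) / len(terms)                            # negate: maximise -> minimise


# Identical policies: the loss is just the negative mean advantage.
print(ppo_clip_loss([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 2.0, 3.0]))  # -2.0
```

Because the TL enters only through the log-probabilities in this sketch, the PPO update itself is unchanged by the inside-out scheme; what changes is the input the policy has to interpret.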

Abstract

Natural language-conditioned reinforcement learning (RL) enables agents to follow human instructions. Previous approaches generally implemented language-conditioned RL by providing human instructions in natural language (NL) and training an instruction-following policy. In this outside-in approach, the policy must comprehend the NL and solve the task simultaneously. However, the unbounded space of NL expressions often adds considerable complexity to solving concrete RL tasks, which can distract policy learning from completing the task. To ease the learning burden on the policy, we investigate an inside-out scheme for natural language-conditioned RL by developing a task language (TL) that is task-related and unique. The TL is used in RL to achieve highly efficient and effective policy training. In addition, a translator is trained to translate NL into TL. We implement this scheme as TALAR (TAsk Language with predicAte Representation), which learns multiple predicates to model object relationships as the TL. Experiments indicate that TALAR not only comprehends NL instructions better but also yields a better instruction-following policy, improving the success rate by 13.4% and adapting to unseen expressions of NL instructions. The TL can also serve as an effective task abstraction that is naturally compatible with hierarchical RL.
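To make the predicate representation more concrete, the following is a minimal forward-pass sketch loosely following the structure in Figure 2: each predicate module applies several predicate networks to its arguments, each network outputs a binary value, and the TL is the concatenation of all outputs. Everything here is a hypothetical simplification: objects are (x, y) positions, arguments are passed in directly instead of being selected by a network, and the predicate "networks" are hand-set linear scorers rather than learned ones. Names such as `predicate_network`, `generate_tl`, `LEFT_OF`, and `IN_FRONT_OF` are illustrative only.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
PredicateNet = Tuple[Vector, float]  # (weights, bias) of a hypothetical linear predicate


def predicate_network(net: PredicateNet, args: Sequence[Vector]) -> int:
    """Toy predicate network: linear score over concatenated argument features, thresholded to {0, 1}."""
    weights, bias = net
    features = [x for arg in args for x in arg]
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def predicate_module(nets: Sequence[PredicateNet], args: Sequence[Vector]) -> List[int]:
    """Apply every predicate network of one module to the same argument tuple."""
    return [predicate_network(net, args) for net in nets]


def generate_tl(modules: Sequence[Sequence[PredicateNet]],
                module_args: Sequence[Sequence[Vector]]) -> List[int]:
    """Toy TL: concatenation of the binary outputs of all predicate modules."""
    tl: List[int] = []
    for nets, args in zip(modules, module_args):
        tl.extend(predicate_module(nets, args))
    return tl


# Hypothetical example: each object is an (x, y) position.
LEFT_OF = ([-1.0, 0.0, 1.0, 0.0], 0.0)      # fires when x_b - x_a > 0
IN_FRONT_OF = ([0.0, -1.0, 0.0, 1.0], 0.0)  # fires when y_b - y_a > 0 (assumed axis convention)
cyan, blue = (0.2, 0.1), (0.5, 0.6)
module = [LEFT_OF, IN_FRONT_OF]
print(generate_tl([module, module], [(cyan, blue), (blue, cyan)]))  # [1, 1, 0, 0]
```

In TALAR the predicates are learned, so their meaning is not fixed in advance; Figure 5 appears to probe what a learned predicate network has come to represent by counting which destination ball it fires for.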

Paper Structure (28 sections, 3 equations, 15 figures, 5 tables, 3 algorithms)

Introduction
Related Work
Background
Method
TL Generation in Predicate Representation
NL Translation by Recovering TL
Policy Training With Reinforcement Learning
Experiments
Task Language Development and Translation
Performance of Instruction-Following Policy
TL as an Abstraction for Hierarchical RL
Ablation Study
Conclusion
Discussion
What Is the Predicate Representation?
...and 13 more sections

Figures (15)

Figure 1: An illustration of the OIL and IOL schemes in NLC-RL. Left: OIL directly exposes the NL instructions to the policy. Right: IOL develops a task language that is task-related and serves as a unique representation of NL instructions. The solid lines represent the instruction-following process, while the dashed lines represent TL development and translation.
Figure 2: Overall training process of task language development and translation. (a) The overall training process. (b) Network architecture of the TL generator. (c) Architecture of one predicate module. (d) Network architecture of the translator. The number of predicate modules, arguments and predicate networks can be adjusted according to the task scale.
Figure 3: A visualization of the CLEVR-Robot environment in our experiments. (a) At the beginning, one NL instruction is randomly sampled, such as "Can you move the cyan ball in front of the blue ball?" The agent then executes actions to complete the instruction. (b) The task terminates when the goal is achieved or the maximum timestep is reached.
Figure 4: The t-SNE representations of different types of NL encoding. Points with the same marker stand for the encodings of nine different NL expressions that describe the same human instruction. Slight noise is added to overlapping points for better presentation. (a) The t-SNE representations of the TL output by the translator. (b) The encoding output by the Bert model. (c) The encoding output by the language encoding layer of the OIL baseline (Bert-continuous in Section "Performance of Instruction-Following Policy").
Figure 5: Frequency of each of the five destination balls when a predicate network outputs a value of $1$. Each bar shows the frequency of the ball of one colour.
...and 10 more figures
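Figure 4 visualizes the key property of the TL: different NL expressions of the same instruction should map to the same TL. The toy, rule-based translator below illustrates only that property, using the example instruction from Figure 3. It is a hypothetical stand-in, not TALAR's learned translator; the colour list, the relation phrases, and the (relation, first ball, second ball) output format are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed vocabulary for a CLEVR-Robot-like scene (illustrative only).
COLOURS = ("red", "blue", "green", "purple", "cyan")
RELATIONS = {
    "in front of": "front",
    "behind": "behind",
    "to the left of": "left",
    "left of": "left",
    "to the right of": "right",
    "right of": "right",
}


def toy_translate(instruction: str) -> Optional[Tuple[str, str, str]]:
    """Map an NL instruction to a toy TL tuple (relation, ball to move, reference ball)."""
    text = instruction.lower()
    # Colours in the order they appear: first is the ball to move, second the reference ball.
    colours = [word for word in re.findall(r"[a-z]+", text) if word in COLOURS]
    relation = next((rel for phrase, rel in RELATIONS.items() if phrase in text), None)
    if relation is None or len(colours) < 2:
        return None  # phrasing not covered by the hand-written rules
    return (relation, colours[0], colours[1])


for nl in ("Can you move the cyan ball in front of the blue ball?",
           "Push the cyan ball in front of the blue ball",
           "I want the cyan ball to be in front of the blue one."):
    print(toy_translate(nl))  # ('front', 'cyan', 'blue') each time
```

The brittleness of such hand-written rules (any unlisted phrasing returns `None`) is exactly what a learned translator is meant to avoid.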

Theorems & Definitions (1)

Definition 1: Task dataset
