Intrinsic Language-Guided Exploration for Complex Long-Horizon Robotic Manipulation Tasks

Eleftherios Triantafyllidis; Filippos Christianos; Zhibin Li

TL;DR

The proposed IGE-LLMs framework performs notably better than related intrinsic methods and the direct use of LLMs in decision-making, can be combined with and complement existing learning methods, which highlights its modularity, and is fairly insensitive to different intrinsic scaling parameters.

Abstract

Current reinforcement learning algorithms struggle in sparse and complex environments, most notably in long-horizon manipulation tasks entailing a plethora of different sequences. In this work, we propose the Intrinsically Guided Exploration from Large Language Models (IGE-LLMs) framework. By leveraging LLMs as an assistive intrinsic reward, IGE-LLMs guides the exploratory process in reinforcement learning to address intricate, long-horizon robotic manipulation tasks with sparse rewards. We evaluate our framework and related intrinsic learning methods in an environment challenged by exploration, and in a complex robotic manipulation task challenged by both exploration and long horizons. Results show that IGE-LLMs (i) performs notably better than related intrinsic methods and the direct use of LLMs in decision-making, (ii) can be combined with and complement existing learning methods, which highlights its modularity, (iii) is fairly insensitive to different intrinsic scaling parameters, and (iv) remains robust under increased levels of uncertainty and longer horizons.
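
As a rough illustration of what "LLMs as an assistive intrinsic reward" with an "intrinsic scaling parameter" could look like, the sketch below adds a scaled, LLM-derived bonus to the environment's sparse extrinsic reward. The additive form, the clamping to [0, 1], and every name here (`shaped_reward`, `llm_intrinsic_reward`, `stub_rater`, `scale`) are assumptions made for illustration; the abstract does not give the exact formulation, and the stub rater merely stands in for a real LLM query.

```python
from typing import Callable

# Hypothetical rater signature: (state description, action description) -> score.
# In practice this would wrap an LLM query; here it is only a placeholder type.
Rater = Callable[[str, str], float]


def llm_intrinsic_reward(state_desc: str, action_desc: str, rate_action: Rater) -> float:
    """Return a bounded intrinsic reward from a rater's score (assumed range [0, 1])."""
    score = rate_action(state_desc, action_desc)
    # Clamp so that an out-of-range rating cannot dominate the extrinsic signal.
    return max(0.0, min(1.0, score))


def shaped_reward(extrinsic: float, intrinsic: float, scale: float) -> float:
    """Combine rewards additively; `scale` plays the role of the intrinsic scaling parameter."""
    return extrinsic + scale * intrinsic


def stub_rater(state_desc: str, action_desc: str) -> float:
    """Toy stand-in for an LLM: favours one hard-coded action."""
    return 1.0 if action_desc == "open drawer" else 0.0


if __name__ == "__main__":
    bonus = llm_intrinsic_reward("drawer closed", "open drawer", stub_rater)
    print(shaped_reward(extrinsic=0.0, intrinsic=bonus, scale=0.1))
```

Under this assumed form, the bonus only guides exploration when the extrinsic reward is zero, which is the sparse-reward setting the abstract targets.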

Paper Structure (28 sections, 6 figures)

INTRODUCTION
RELATED WORK
Reinforcement Learning -- Trials, Errors, and the Pursuit of Elusive Sparse Rewards
Intrinsic Guidance -- The Compass Navigating Through the Maze of Reinforcement Learning
Guiding Reinforcement Learning with LLMs
METHODOLOGY
Technical Preliminaries
Intrinsically Guided Exploration from LLMs (IGE-LLMs)
Apparatus and System Configuration
Extending the RObotic MAnipulation Network (ROMAN)
Main Environment
Internal Hybrid Learning Procedure
Expert NNs - States, Actions and Rewards
EVALUATION
Illustrating the idea on a Toy Environment -- DeepSea
...and 13 more sections

Figures (6)

Figure 1: Schematics illustrating the principles of our method. (A) The overview of IGE-LLMs. (B) IGE-LLMs on ROMAN's hierarchical architecture [roman] for solving complex robotic manipulation tasks entailing sparse rewards and long horizons.
Figure 2: The two environments employed for evaluating all methods in the chapter, including the proposed IGE-LLMs framework. Figure A: The preliminary grid-based environment, DeepSea. From left to right, the grid size increases, notably increasing the dimensionality of the problem and, by extension, the amount of exploration needed to reach the goal. Blue, green and red represent the agent, goal and traps respectively. B.1-B.4 represent grid sizes 8x8, 64x64, 128x128 and 192x192 respectively. DeepSea is challenged by exploration. Note on visualisation for DeepSea: for the grid sizes 128x128 and 192x192, the agent's virtual mesh, depicted in blue, is enlarged four-fold for clearer visualisation. Figure B: The main environment, ROMAN, used to study intricate robotic manipulation tasks that require the correct sequential orchestration of a plethora of macro-actions. Note that all related methods, including IGE-LLMs, were applied to ROMAN's gating network directly. Consequently, the evaluation on the main robotic environment measured how well the gating network infers and orchestrates the different macro-actions in a robotic manipulation task challenged by both exploration and long horizons.
Figure 3: IGE-LLMs on ROMAN's longest-horizon task, case seven, at $\sigma = \pm 0.5$ cm noise, compared against other methods. For more details of the sequential failures exhibited by other methods, please consult the video available at [Triantafyllidis2024Intrinsic].
Figure 4: Normalised evaluation returns. Shading depicts the standard deviation ($\sigma$) around the mean. (A) DeepSea, from $n=5$ seeds for 8x8 and 64x64 and $n=3$ seeds for 128x128 and 192x192. (B) ROMAN, from $n=5$ seeds. (C) Legend.
Figure 5: Inference results for the ROMAN environment across five distinct models. The x-axis represents the task horizon, ascending from left to right, and the y-axis the exteroceptive noise, ascending from top to bottom. Each cell stems from 1K episodes.
...and 1 more figures
