Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning

Chak Lam Shek; Pratap Tokekar

TL;DR

The paper addresses poor exploration and generalization in long-horizon robotic tasks by integrating large language model (LLM) reasoning into a semantic hierarchical RL framework, LDSC. It introduces a three-level policy stack (subgoal, option, action) in which an LLM generates subgoals and constructs reusable option trees, enabling efficient planning and skill transfer across tasks. Empirical results in MuJoCo show gains over the baseline in average reward (≈55.9%), completion time (≈53.1% faster), and success rate (≈72.7% higher) without extra training time, with robust performance in multi-task environments. The work highlights the potential of combining semantic reasoning with hierarchical control to produce scalable, transferable policies for complex robotics tasks.

Abstract

Large Language Models (LLMs) have shown remarkable promise in reasoning and decision-making, yet their integration with Reinforcement Learning (RL) for complex robotic tasks remains underexplored. In this paper, we propose an LLM-guided hierarchical RL framework, termed LDSC, that leverages LLM-driven subgoal selection and option reuse to enhance sample efficiency, generalization, and multi-task adaptability. Traditional RL methods often suffer from inefficient exploration and high computational cost. Hierarchical RL helps with these challenges, but existing methods often fail to reuse options effectively when faced with new tasks. To address these limitations, we introduce a three-stage framework that combines LLM-based subgoal generation from a natural-language description of the task, a reusable option learning and selection method, and an action-level policy, enabling more effective decision-making across diverse tasks. By incorporating LLMs for subgoal prediction and policy guidance, our approach improves exploration efficiency and enhances learning performance. On average, LDSC outperforms the baseline by 55.9% in average reward, demonstrating its effectiveness in complex RL settings. More details and experiment videos can be found at https://raaslab.org/projects/LDSC/.
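To make the three-level structure concrete, the following is a minimal sketch on a toy 1-D line world, not the authors' implementation. Everything in it is an assumption made for illustration: `propose_subgoals` is a hypothetical stand-in for the LLM call (it simply extracts integers from the task text), `Option` is a hand-coded option rather than a learned one, and the option library is a plain dictionary rather than the option trees the paper describes. It only shows how subgoal generation, option reuse, and an action-level policy can be layered.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Option:
    """Hypothetical option: drive the agent to one subgoal position."""
    subgoal: int

    def is_done(self, state: int) -> bool:
        # Termination condition: the subgoal has been reached.
        return state == self.subgoal

    def act(self, state: int) -> int:
        # Action-level policy (hand-coded here, learned in a real system):
        # take one unit step toward the subgoal.
        return 1 if state < self.subgoal else -1


def propose_subgoals(task: str) -> List[int]:
    # Stand-in for the LLM subgoal generator (assumption): read the
    # target positions directly out of the task description.
    return [int(tok) for tok in re.findall(r"-?\d+", task)]


def run_task(task: str, start: int, library: Dict[int, Option],
             max_steps: int = 100) -> Tuple[int, int]:
    """Execute subgoals in order, reusing options already in the library."""
    state, steps = start, 0
    for subgoal in propose_subgoals(task):
        # Option level: reuse an existing option if one exists for this
        # subgoal, otherwise add a new one to the shared library.
        option = library.setdefault(subgoal, Option(subgoal))
        while not option.is_done(state) and steps < max_steps:
            state += option.act(state)
            steps += 1
    return state, steps


library: Dict[int, Option] = {}
final_a, steps_a = run_task("reach 3 then 7", start=0, library=library)
final_b, steps_b = run_task("reach 7 then 3", start=0, library=library)
```

In this sketch the second task creates no new options: both subgoals are already in `library` after the first task, which is the kind of cross-task reuse the abstract refers to.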
