Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

Jaebak Hwang, Sanghyeon Lee, Jeongmo Kim, Seungyul Han

TL;DR

This paper proposes Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that integrates Frontier Experience Replay (FER) to separate unreachable from admissible subgoals and streamline high-level decision making.

Abstract

Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. Hierarchical and graph-based methods offer partial solutions, but their reliance on conventional hindsight relabeling often fails to correct subgoal infeasibility, which leads to inefficient high-level planning. To address this, we propose Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that integrates Frontier Experience Replay (FER) to separate unreachable from admissible subgoals and streamline high-level decision making. FER delineates the reachability frontier using failure and partial-success transitions; this identifies unreliable subgoals, increases subgoal reliability, and reduces unnecessary high-level decisions. Additionally, SSE employs a decoupled exploration policy to cover underexplored regions of the goal space, and a path-refinement step that adjusts edge costs based on observed low-level failures. Experimental results across diverse long-horizon benchmarks show that SSE consistently outperforms existing goal-conditioned and hierarchical RL methods in both efficiency and success rate. Our code is available at https://github.com/Jaebak1996/SSE
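The abstract says path refinement adjusts edge costs using observed low-level failures, and the Figure 3 caption adds that waypoint paths are computed with Dijkstra's algorithm over a goal space split into grid cells scored by a failure ratio. The Python sketch that follows illustrates that idea under stated assumptions and is not the authors' implementation: the 2-D goal space, the unit cell size, the multiplicative cost `distance * (1 + PENALTY * ratio_fail)`, and the names `FailureStats`, `refined_cost`, and `dijkstra` are all hypothetical.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical settings: 2-D goal space, square cells of side CELL,
# and a weight PENALTY on the per-cell failure ratio.
CELL = 1.0
PENALTY = 10.0


def cell_of(point):
    """Map a goal-space point (x, y) to its integer grid-cell index."""
    return (math.floor(point[0] / CELL), math.floor(point[1] / CELL))


class FailureStats:
    """Per-cell counts of low-level subgoal attempts and failures (hypothetical)."""

    def __init__(self):
        self.attempts = defaultdict(int)
        self.failures = defaultdict(int)

    def record(self, point, failed):
        c = cell_of(point)
        self.attempts[c] += 1
        if failed:
            self.failures[c] += 1

    def ratio_fail(self, point):
        """Fraction of attempts in this point's cell that failed (0.0 if none)."""
        c = cell_of(point)
        n = self.attempts[c]
        return self.failures[c] / n if n else 0.0


def refined_cost(u, v, stats):
    """Euclidean edge length inflated by the failure ratio of the target cell."""
    return math.dist(u, v) * (1.0 + PENALTY * stats.ratio_fail(v))


def dijkstra(nodes, edges, start, goal, stats):
    """Shortest waypoint path from start to goal under refined edge costs.

    nodes: list of (x, y) waypoints; edges: dict index -> neighbour indices.
    Returns the node indices on the path, or [] if goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v in edges.get(u, []):
            nd = d + refined_cost(nodes[u], nodes[v], stats)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

In a small two-route graph, the shorter route is chosen until a failure is recorded in one of its cells, after which the planner detours. Inflating costs rather than deleting edges keeps a failure-prone waypoint usable when no alternative route exists.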

Paper Structure

This paper contains 33 sections, 13 equations, 24 figures, 4 tables, 2 algorithms.

Figures (24)

  • Figure 1: Agent trajectories in goal space $\mathcal{G}$. (a) Conventional HRL with HER relabels intermediate states as subgoals without enforcing exact subgoal completion, which lengthens high-level trajectories. (b) SSE with FER enforces exact subgoal completion, increasing subgoal reliability and reducing unnecessary high-level decisions, thereby improving learning efficiency. (c) After training, SSE reaches $g$ in few high-level steps; in single-goal settings it does so in a single step, even from distant starts. Agent locations are $\phi(s_t)\in\mathcal{G}$, and $t_i$ denotes the $i$-th high-level step.
  • Figure 2: Initial subgoals at $t = 0$ selected by $\pi^h$ and $\pi^{\mathrm{exp}}$, with corresponding Ant agent trajectories at (a) early, (b) intermediate, and (c) final training stages in the U-maze task. The goal space (agent positions in the map) is partitioned into grid cells $C_\mathcal{G}^m$. $\pi^h$ selects between $\tilde{g}_{\max}$ and $\tilde{g}_{\mathrm{rand}}$ to encourage broad coverage, while $\pi^{\mathrm{exp}}$ samples from $\tilde{g}_{\mathrm{novel}}$, $\tilde{g}_{\max}$, and $g$ to visit underexplored regions and the goal. Over time, unreachable areas are excluded from subgoal candidates via SSE.
  • Figure 3: Comparison of agent trajectories (blue lines) in a map with a bottleneck: (a) without path refinement and (b) with the proposed path refinement. Green lines represent the shortest waypoint paths computed via Dijkstra’s algorithm, while red areas denote grid cells with high failure ratios, i.e., $\text{ratio}_{\text{fail}}(C^m_{\mathcal{G}}) > 0.05$.
  • Figure 4: The proposed SSE framework.
  • Figure 5: The long-horizon environments considered: 5 AntMaze, 2 KeyChest, and 2 Reacher tasks.
  • ...and 19 more figures
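The Figure 2 caption describes partitioning the goal space into grid cells $C_\mathcal{G}^m$ and an exploration policy $\pi^{\mathrm{exp}}$ that samples its subgoal from $\tilde{g}_{\mathrm{novel}}$, $\tilde{g}_{\max}$, and $g$, with unreachable areas eventually excluded from the candidates. The Python sketch below shows one possible form of that bookkeeping and is hypothetical throughout: the cell size, taking $\tilde{g}_{\mathrm{novel}}$ as the centre of the least-visited cell, pruning cells whose failure ratio exceeds the 0.05 threshold quoted in the Figure 3 caption, and the uniform choice among the three candidates are assumptions. $\tilde{g}_{\max}$ is supplied by the caller because its definition is not given here.

```python
import math
import random
from collections import defaultdict


class GoalGrid:
    """Hypothetical partition of a 2-D goal space into square cells C^m.

    Tracks visit counts and subgoal success/failure per cell. Points are
    assumed to lie inside [low, high); out-of-range points are not checked.
    """

    def __init__(self, low, high, cell_size, fail_threshold=0.05):
        self.low, self.high, self.size = low, high, cell_size
        self.fail_threshold = fail_threshold  # assumed pruning threshold
        self.visits = defaultdict(int)
        self.attempts = defaultdict(int)
        self.failures = defaultdict(int)

    def cell(self, point):
        return tuple(math.floor((p - l) / self.size) for p, l in zip(point, self.low))

    def center(self, cell):
        return tuple(l + (i + 0.5) * self.size for i, l in zip(cell, self.low))

    def all_cells(self):
        nx = math.ceil((self.high[0] - self.low[0]) / self.size)
        ny = math.ceil((self.high[1] - self.low[1]) / self.size)
        return [(i, j) for i in range(nx) for j in range(ny)]

    def record_visit(self, point):
        self.visits[self.cell(point)] += 1

    def record_subgoal(self, subgoal, reached):
        c = self.cell(subgoal)
        self.attempts[c] += 1
        if not reached:
            self.failures[c] += 1

    def ratio_fail(self, cell):
        n = self.attempts[cell]
        return self.failures[cell] / n if n else 0.0

    def admissible_cells(self):
        """Cells whose failure ratio does not exceed the threshold."""
        return [c for c in self.all_cells() if self.ratio_fail(c) <= self.fail_threshold]

    def novel_subgoal(self):
        """Centre of the least-visited admissible cell (first in order on ties)."""
        cells = self.admissible_cells() or self.all_cells()  # fall back if all pruned
        best = min(cells, key=lambda c: self.visits[c])
        return self.center(best)


def exploration_subgoal(grid, g_max, goal, rng):
    """Hypothetical pi^exp: pick uniformly among g_novel, g_max and the goal g."""
    return rng.choice([grid.novel_subgoal(), g_max, goal])
```

For example, after several visits to the cell at the origin, `novel_subgoal()` returns the centre of the next unvisited cell; once a subgoal in that cell fails, the cell is pruned and the next unvisited admissible cell is proposed instead.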

Theorems & Definitions (1)

  • Definition 4.1: Frontier Experience Replay