Incorporating Spatial Information into Goal-Conditioned Hierarchical Reinforcement Learning via Graph Representations

Shuyuan Zhang, Zihan Wang, Xiao-Wen Chang, Doina Precup

TL;DR

The paper tackles the sample inefficiency of goal-conditioned hierarchical RL (GCHRL) by introducing G4RL, which embeds spatial information through a graph encoder-decoder trained on a state graph built online. The state graph is constructed and updated during exploration, and the learned subgoal representations respect its connectivity. From these representations the approach derives intrinsic rewards at both the high and low levels, guiding exploration and execution respectively. The method is designed to be compatible with existing GCHRL algorithms and shows substantial improvements in convergence speed and success rate across dense- and sparse-reward Ant environments, including experiments with image-based states. The method targets settings with symmetric and reversible transitions. The work also investigates adaptive training and speed-accuracy trade-offs, and it highlights automatic hyperparameter tuning and transfer of graph knowledge to new tasks as potential future work.

Abstract

The integration of graphs with Goal-conditioned Hierarchical Reinforcement Learning (GCHRL) has recently gained attention, as intermediate goals (subgoals) can be effectively sampled from graphs that naturally represent the overall task structure in most RL tasks. However, existing approaches typically rely on domain-specific knowledge to construct these graphs, limiting their applicability to new tasks. Other graph-based approaches create graphs dynamically during exploration but struggle to fully utilize them, because they have difficulty passing the information in the graphs to newly visited states. Additionally, current GCHRL methods face challenges such as sample inefficiency and poor subgoal representation. This paper addresses these issues by developing a graph encoder-decoder to evaluate unseen states. Our proposed method, Graph-Guided sub-Goal representation Generation RL (G4RL), can be incorporated into any existing GCHRL method to enhance performance in environments with primarily symmetric and reversible transitions. We show that the graph encoder-decoder can be effectively implemented using a network trained on the state graph generated during exploration. Empirical results indicate that leveraging high- and low-level intrinsic rewards from the graph encoder-decoder significantly enhances the performance of state-of-the-art GCHRL approaches in both dense- and sparse-reward environments, at only a small additional computational cost.
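
To make the idea of a state graph built during exploration more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Euclidean distance threshold (`eps`, hypothetical) for deciding when a visited state becomes a new node, and it adds undirected edges between consecutively visited nodes, mirroring the symmetric and reversible transition assumption. The graph encoder-decoder is omitted entirely: the hop-count reward below is only a stand-in for the learned, representation-based intrinsic rewards the abstract describes, and all class and function names are illustrative.

```python
import math
from collections import deque


class StateGraph:
    """Toy online state graph. Names and the eps threshold are hypothetical."""

    def __init__(self, eps):
        self.eps = eps      # assumed distance threshold for creating a new node
        self.nodes = []     # stored state vectors, one per node
        self.edges = {}     # node index -> set of neighbouring node indices
        self.prev = None    # node matched at the previous step of this episode

    def reset_episode(self):
        """Forget the previous node so no edge spans an environment reset."""
        self.prev = None

    def nearest(self, state):
        """Return (index, distance) of the closest node, or (None, inf)."""
        best, best_d = None, math.inf
        for i, node in enumerate(self.nodes):
            d = math.dist(state, node)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def observe(self, state):
        """Match the state to a node (adding one if needed) and link it."""
        idx, d = self.nearest(state)
        if idx is None or d > self.eps:
            self.nodes.append(tuple(state))
            idx = len(self.nodes) - 1
            self.edges[idx] = set()
        if self.prev is not None and self.prev != idx:
            # Undirected edge: transitions are assumed to be reversible.
            self.edges[self.prev].add(idx)
            self.edges[idx].add(self.prev)
        self.prev = idx
        return idx

    def hops(self, a, b):
        """Breadth-first shortest path length in edges; inf if unreachable."""
        seen, queue = {a}, deque([(a, 0)])
        while queue:
            node, dist = queue.popleft()
            if node == b:
                return dist
            for nxt in self.edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return math.inf


def hop_reward(graph, state, subgoal):
    """Stand-in intrinsic reward: negative graph distance to the subgoal."""
    s, _ = graph.nearest(state)
    g, _ = graph.nearest(subgoal)
    return -graph.hops(s, g)
```

A graph distance of this kind reflects connectivity (for example, walls in a maze) in a way that raw Euclidean distance between states does not, which is the intuition behind using graph structure to shape subgoal representations. Handling states far from every stored node is exactly where a learned encoder-decoder would be needed, and this sketch does not cover that.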

Paper Structure

This paper contains 30 sections, 10 equations, 19 figures, 2 tables, and 1 algorithm.

Figures (19)

  • Figure 1: Success Rate on (a) AntMaze (b) AntMaze-Sparse and Reward on (c) AntGather, using HIRO, HIRO-G4RL, HRAC, HRAC-G4RL, and TD3. Incorporating G4RL in HIRO and HRAC significantly enhances their performance.
  • Figure 2: Success Rate on (a) AntMaze (b) AntMaze-Sparse and (c) AntPush, using HESS, HESS-G4RL, HLPS, HLPS-G4RL. Incorporating G4RL in HESS and HLPS significantly enhances their performance.
  • Figure 3: Success Rate on (a) AntMaze (b) AntPush and (c) AntFall with image state features, using HESS, HESS-G4RL, HLPS, HLPS-G4RL. Incorporating G4RL helps convergence and achieves higher performance across all tested image-based environments.
  • Figure 4: Success Rate on (a) AntMaze (b) AntMaze-Sparse and Reward on (c) AntGather using HIRO-G4RL, HIRO + High-level intrinsic, HIRO + Low-level intrinsic and HIRO. All curves have been smoothed equally for better visualization. The combination of high-level and low-level intrinsic rewards results in the highest success rates and fastest convergence.
  • Figure 5: Success Rate on (a) AntMaze (b) AntMaze-Sparse and Reward on (c) AntGather using HRAC-G4RL, HRAC + High-level intrinsic, HRAC + Low-level intrinsic and HRAC. All curves have been smoothed equally for better visualization. The combination of high-level and low-level intrinsic rewards results in the highest success rates and fastest convergence.
  • ...and 14 more figures