Scaling Up without Fading Out: Goal-Aware Sparse GNN for RL-based Generalized Planning

Sangwoo Jeon, Juchul Shin, Gyeong-Tae Kim, YeonJe Cho, Seongwoo Kim

TL;DR

This work tackles the scalability bottlenecks of RL-based generalized planning in PDDL domains by introducing a sparse, goal-aware GNN representation. It combines sparse local graph connectivity, goal-aware node embeddings, and action embeddings optimized with PPO, complemented by curriculum learning to scale to large grid worlds. Empirical results on drone-inspired grid domains show improved memory efficiency, faster and more stable learning, and strong generalization to unseen, larger instances, with GBFS-GNN inference validating scalability to 25×25 grids. The findings suggest a practical path toward deploying RL-based generalized planners in realistic, large-scale symbolic domains and motivate future integration with BDI-based autonomous drone frameworks.

Abstract

Generalized planning using deep reinforcement learning (RL) combined with graph neural networks (GNNs) has shown promising results in various symbolic planning domains described by PDDL. However, existing approaches typically represent planning states as fully connected graphs, leading to a combinatorial explosion in edge information and substantial sparsity as problem scales grow, especially evident in large grid-based environments. This dense representation results in diluted node-level information, exponentially increases memory requirements, and ultimately makes learning infeasible for larger-scale problems. To address these challenges, we propose a sparse, goal-aware GNN representation that selectively encodes relevant local relationships and explicitly integrates spatial features related to the goal. We validate our approach by designing novel drone mission scenarios based on PDDL within a grid world, effectively simulating realistic mission execution environments. Our experimental results demonstrate that our method scales effectively to larger grid sizes previously infeasible with dense graph representations and substantially improves policy generalization and success rates. Our findings provide a practical foundation for addressing realistic, large-scale generalized planning tasks.
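To make the scaling argument concrete, the following minimal sketch counts directed edges for a fully connected graph over grid cells and for a locally connected one. The 4-neighbour adjacency rule and the function names `dense_edge_count` and `sparse_grid_edges` are illustrative assumptions; the abstract only states that the sparse representation encodes relevant local relationships, not the exact connectivity used.

```python
def dense_edge_count(n_cells: int) -> int:
    """Directed edges in a fully connected graph without self-loops."""
    return n_cells * (n_cells - 1)


def sparse_grid_edges(width: int, height: int) -> list:
    """Directed edges between 4-neighbour cells (assumed adjacency rule).

    Cells are indexed row-major: cell (r, c) has index r * width + c.
    """
    edges = []
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:   # right neighbour, both directions
                edges += [(i, i + 1), (i + 1, i)]
            if r + 1 < height:  # lower neighbour, both directions
                edges += [(i, i + width), (i + width, i)]
    return edges


if __name__ == "__main__":
    for side in (5, 10, 25):
        n = side * side
        dense = dense_edge_count(n)
        sparse = len(sparse_grid_edges(side, side))
        print(f"{side}x{side}: dense={dense}, sparse={sparse}")
```

Under these assumptions a 25×25 grid has 390,000 dense edges but only 2,400 sparse ones, so in this simplified count the sparse edge set grows linearly with the number of cells while the dense one grows quadratically.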

Paper Structure

This paper contains 32 sections, 8 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Illustration of the symbolic-to-graph transformation process for the BlocksWorld environment. The environment state (left) shows blocks A, B, and C in a specific configuration (current state), while the goal state (right) defines the target stack arrangement. This state is encoded as a graph (right), where nodes represent the blocks and the global feature “hand-empty” is represented as Global (G). Edges encode binary predicates such as “on” and “clear,” while node features capture unary predicates like “on-table” or “holding.” The resulting graph structure provides a compact and expressive representation of symbolic planning states, serving as input to the GNN-based reinforcement learning pipeline.
  • Figure 2: An overview of the proposed reinforcement learning pipeline for symbolic generalized planning. The environment state and goal are defined in PDDL and converted into a sparse graph representation consisting of nodes (objects), edges (binary predicates), and global features (nullary predicates). Each node embeds both its current state and goal-related features, enabling goal-aware relational reasoning. Applicable actions are extracted by the PDDL engine and encoded through their symbolic effects to form action embeddings. The embedded graph state and action representations are processed by a GNN-based policy network, which outputs the selected action. This action updates the graph, forming a closed reinforcement learning loop.
  • Figure 3: Comparison of Full and Sparse Graph configurations in terms of training success rate (left) and evaluation episode mean reward (right) in the 5×5 grid setting.
  • Figure 4: Training success rate (left) and evaluation episode mean reward (right) for 5×5 (top row) and 10×10 (bottom row) grids in Droneworld_simple.
  • Figure 5: Training success rate (left) and evaluation episode mean reward (right) for curriculum learning vs. random-sized training.
  • ...and 3 more figures
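
The symbolic-to-graph encoding described in the captions of Figures 1 and 2 can be sketched roughly as follows: objects become nodes, unary predicates become node features, binary predicates become edges, and nullary predicates become global features, with each node carrying both current and goal features. This is a hedged illustration, not the authors' implementation. The `PlanningGraph` class, the `encode` function, the tuple-based atom format, and the `is_goal` flag on edges are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

# Predicate groups for a BlocksWorld-style domain (assumed for illustration).
UNARY = ("on-table", "holding")   # become node features
BINARY = ("on",)                  # become edges
NULLARY = ("hand-empty",)         # become global features


@dataclass
class PlanningGraph:
    nodes: dict     # object -> [current unary features..., goal unary features...]
    edges: list     # (source, target, predicate, is_goal)
    globals_: list  # one value per nullary predicate


def encode(objects, state, goal):
    """Encode ground atoms such as ("on", "C", "A") into a goal-aware graph."""
    nodes = {
        obj: [float((p, obj) in state) for p in UNARY]
        + [float((p, obj) in goal) for p in UNARY]
        for obj in objects
    }
    # Current-state edges first, then goal edges (hypothetical is_goal flag).
    edges = [(a[1], a[2], a[0], False) for a in sorted(state) if a[0] in BINARY]
    edges += [(a[1], a[2], a[0], True) for a in sorted(goal) if a[0] in BINARY]
    globals_ = [float((p,) in state) for p in NULLARY]
    return PlanningGraph(nodes, edges, globals_)


objects = ["A", "B", "C"]
state = {("on-table", "A"), ("on-table", "B"), ("on", "C", "A"), ("hand-empty",)}
goal = {("on", "A", "B"), ("on", "B", "C"), ("on-table", "C")}
graph = encode(objects, state, goal)
```

Here only predicates that actually hold produce edges, so the edge list stays proportional to the number of true binary atoms rather than to all object pairs, which is the property the sparse representation relies on.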