On Sample-Efficient Generalized Planning via Learned Transition Models

Nitin Gupta, Vishal Pallagani, John A. Aydin, Biplav Srivastava

TL;DR

The results show that learning explicit transition models yields higher out-of-distribution satisficing-plan success than direct action-sequence prediction in multiple domains, while achieving these gains with significantly fewer training instances and smaller models.

Abstract

Generalized planning studies the construction of solution strategies that generalize across families of planning problems sharing a common domain model, formally defined by a transition function $\gamma: S \times A \rightarrow S$. Classical approaches achieve such generalization through symbolic abstractions and explicit reasoning over $\gamma$. In contrast, recent Transformer-based planners, such as PlanGPT and Plansformer, largely cast generalized planning as direct action-sequence prediction, bypassing explicit transition modeling. While effective on in-distribution instances, these approaches typically require large datasets and model sizes, and often suffer from state drift in long-horizon settings due to the absence of explicit world-state evolution. In this work, we formulate generalized planning as a transition-model learning problem, in which a neural model explicitly approximates the successor-state function $\hat{\gamma} \approx \gamma$ and generates plans by rolling out symbolic state trajectories. Instead of predicting actions directly, the model autoregressively predicts intermediate world states, thereby learning the domain dynamics as an implicit world model. To study size-invariant generalization and sample efficiency, we systematically evaluate multiple state representations and neural architectures, including relational graph encodings. Our results show that learning explicit transition models yields higher out-of-distribution satisficing-plan success than direct action-sequence prediction in multiple domains, while achieving these gains with significantly fewer training instances and smaller models. This is an extended version of a short paper accepted at ICAPS 2026 under the same title.

Paper Structure (85 sections, 10 equations, 4 figures, 6 tables, 1 algorithm)

Figures (4)

  • Figure 1: State-Centric Generalized Planning Pipeline. From a symbolic planning instance $\Pi$, executable plans are generated using a learned transition model. (1) State Encoding: Symbolic state--goal pairs $(s_t, g)$ are mapped to fixed-dimensional embeddings $\phi(s_t)$ using either WL graph kernels or fixed-size factored vectors. (2) Transition Modeling: A parametric model (LSTM) or a non-parametric model (XGBoost) learns residual state transitions $\Delta_t$ to predict successor embeddings. (3) Neuro-Symbolic Plan Decoding: The predicted successor embedding $\hat{\phi}(s_{t+1})$ is matched against all valid symbolic successors $\mathrm{Succ}(s_t)$ induced by $\gamma$, and the nearest valid successor is selected to recover the executable action. This guarantees symbolic validity while enabling transition-model-based generalization.
  • Figure 2: Satisficing-plan success rates on the validation split across all domains, comparing PlanGPT, SATr, WL-based, and FSF baselines.
  • Figure 3: Satisficing-plan success rates on the interpolation split, comparing PlanGPT, SATr, WL-based, and FSF baselines for generalization to in-distribution problem instances.
  • Figure 4: Satisficing-plan success rates on the extrapolation split, comparing PlanGPT, SATr, WL-based, and FSF baselines for generalization to out-of-distribution problem instances.
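
The plan-decoding step described in the Figure 1 caption (match the predicted successor embedding against the valid symbolic successors and keep the nearest one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the binary atom-vector encoding is a stand-in for the paper's WL or fixed-size factored embeddings, the residual is written by hand rather than produced by an LSTM or XGBoost model, squared Euclidean distance is an assumed metric, and every name (`encode`, `decode_step`, the toy atoms and actions) is hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

State = FrozenSet[str]
Successor = Tuple[str, State]  # (action name, resulting state)


def encode(state: State, vocab: Sequence[str]) -> List[float]:
    """Assumed encoding phi(s): one 0/1 entry per atom in a fixed vocabulary."""
    return [1.0 if atom in state else 0.0 for atom in vocab]


def decode_step(
    state: State,
    predicted_delta: Sequence[float],
    successors: Sequence[Successor],
    vocab: Sequence[str],
) -> Successor:
    """Pick the valid successor whose embedding is nearest to phi(s_t) + delta_t.

    `successors` plays the role of Succ(s_t); it must be non-empty,
    otherwise min() raises ValueError (a dead-end state).
    """
    # Predicted successor embedding: current embedding plus the learned residual.
    target = [x + d for x, d in zip(encode(state, vocab), predicted_delta)]

    def sq_dist(candidate: Successor) -> float:
        _, next_state = candidate
        emb = encode(next_state, vocab)
        return sum((a - b) ** 2 for a, b in zip(emb, target))

    # Only symbolically valid successors are candidates, so the chosen
    # action is always executable even when the prediction is noisy.
    return min(successors, key=sq_dist)


# Toy instance: an agent at location a that can move to b or c.
vocab = ["at_a", "at_b", "at_c"]
s_t: State = frozenset({"at_a"})
succ = [
    ("move_a_b", frozenset({"at_b"})),
    ("move_a_c", frozenset({"at_c"})),
]
noisy_delta = [-0.9, 0.7, 0.2]  # hand-written stand-in for a model output

action, s_next = decode_step(s_t, noisy_delta, succ, vocab)
print(action, sorted(s_next))
```

Because the candidates are restricted to the successors induced by the symbolic transition function, an imprecise prediction can only lead to a wrong choice among valid actions, never to an invalid state. Repeating this step from each selected successor yields the rolled-out state trajectory.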