
State-Covering Trajectory Stitching for Diffusion Planners

Kyowoon Lee, Jaesik Choi

TL;DR

State-Covering Trajectory Stitching (SCoTS) addresses the data bottleneck of diffusion planners by generating reward-free, long-horizon trajectory augmentations that expand state coverage. It learns a temporal distance-preserving latent embedding to guide the stitching of short segments. It then iteratively selects segments that balance directional progress and novelty, and applies diffusion-based refinement to keep the stitched trajectories dynamically consistent. Empirical results on OGBench and other offline goal-conditioned RL (GCRL) benchmarks show that SCoTS-augmented data markedly improves long-horizon planning and generalization, and ablations confirm that both diffusion refinement and novelty-based exploration are necessary. The approach demonstrates the practical value of trajectory-level data augmentation for robust, scalable diffusion-based planning in complex environments, with implications for robotics and offline RL applications.

Abstract

Diffusion-based generative models are emerging as powerful tools for long-horizon planning in reinforcement learning (RL), particularly with offline datasets. However, their performance is fundamentally limited by the quality and diversity of training data. This often restricts their generalization to tasks outside their training distribution or longer planning horizons. To overcome this challenge, we propose State-Covering Trajectory Stitching (SCoTS), a novel reward-free trajectory augmentation method that incrementally stitches together short trajectory segments, systematically generating diverse and extended trajectories. SCoTS first learns a temporal distance-preserving latent representation that captures the underlying temporal structure of the environment, then iteratively stitches trajectory segments guided by directional exploration and novelty to effectively cover and expand this latent space. We demonstrate that SCoTS significantly improves the performance and generalization capabilities of diffusion planners on offline goal-conditioned benchmarks requiring stitching and long-horizon reasoning. Furthermore, augmented trajectories generated by SCoTS significantly improve the performance of widely used offline goal-conditioned RL algorithms across diverse environments.
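The "temporal distance-preserving" property means that the latent distance ‖φ(s) − φ(s′)‖ should track how many steps it takes to get from s to s′. The sketch below is a simplified stand-in, not the training objective used in SCoTS. It assumes a linear map φ(s) = s·W and regresses latent distance onto the step offset between two states sampled from the same trajectory. A within-trajectory offset only upper-bounds the shortest temporal distance when the data are suboptimal, so the paper's actual objective may well differ. All function names here are hypothetical.

```python
import numpy as np


def sample_pairs(trajectories, num_pairs, max_offset, rng):
    """Sample (s_i, s_j, j - i) with i < j from within a single trajectory."""
    xs, ys, ds = [], [], []
    for _ in range(num_pairs):
        traj = trajectories[int(rng.integers(len(trajectories)))]
        i = int(rng.integers(len(traj) - 1))
        # j lies at most max_offset steps after i and stays inside the trajectory.
        j = int(rng.integers(i + 1, min(i + max_offset, len(traj) - 1) + 1))
        xs.append(traj[i])
        ys.append(traj[j])
        ds.append(j - i)
    return np.array(xs), np.array(ys), np.array(ds, dtype=float)


def embedding_loss(W, x, y, d):
    """Mean squared gap between latent distance ||(x - y) W|| and step offset d."""
    return float(np.mean((np.linalg.norm((x - y) @ W, axis=1) - d) ** 2))


def fit_linear_embedding(x, y, d, latent_dim=2, lr=1e-2, steps=2000, seed=0):
    """Fit the hypothetical linear embedding phi(s) = s @ W by gradient descent.

    lr may need tuning for data with large state differences or long offsets.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(x.shape[1], latent_dim))
    delta = x - y
    for _ in range(steps):
        u = delta @ W                   # latent differences phi(x) - phi(y)
        n = np.linalg.norm(u, axis=1)   # latent distances
        r = n - d                       # residuals against the step offsets
        # Gradient of mean(r^2): (2/N) * sum_i (r_i / n_i) * delta_i^T u_i
        grad = 2.0 / len(d) * delta.T @ ((r / np.maximum(n, 1e-8))[:, None] * u)
        W -= lr * grad
    return W
```

A linear φ only keeps the sketch dependency-free. A neural encoder trained on the same residual would be the natural next step for real observations.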


Paper Structure

This paper contains 42 sections, 14 equations, 14 figures, 12 tables, and 1 algorithm.

Figures (14)

  • Figure 1: Improved generalization with SCoTS. (a) Examples from the training dataset, illustrating limited coverage. (b) Plans generated by Hierarchical Diffuser (HD) [chen2024simple], which fail to generalize to these out-of-distribution tasks because the training data cover too little of the state space. (c) Plans generated by HD trained on SCoTS-augmented data, demonstrating significantly improved trajectory stitching and generalization to unseen tasks. Each color corresponds to one of 10 plans generated by the planner.
  • Figure 1: Quantitative results on locomotion tasks in OGBench. Results are averaged over 5 random seeds, each with 50 episodes per task. Standard deviations are reported after the $\pm$ sign.
  • Figure 2: Overview of the SCoTS stitching process. (a) Temporal Distance-Preserving Search: Given the currently composed trajectory (red), we identify candidate segments (gray) by searching in a latent space learned to preserve temporal distances. Candidates are selected based on proximity to the endpoint of the current trajectory in latent space. (b) Exploratory Segment Selection: Among the retrieved candidate segments, we select the segment (blue) that best balances directional progress toward a randomly sampled latent direction and novelty relative to previously visited states in latent space. (c) Diffusion-based Stitching Refinement: To ensure smooth transitions, a diffusion model refines the stitching point between segments, generating dynamically consistent trajectories.
  • Figure 3: Effect of the novelty score on trajectory stitching. Trajectory stitching examples in the PointMaze-Giant-Stitch environment. The original dataset (Stitch) consists of short segments spanning at most four maze cells. Different colors represent trajectories generated from distinct latent exploration directions $z$.
  • Figure 4: SCoTS enables long-horizon planning. We visualize trajectories generated by a diffusion planner trained on SCoTS-augmented data, evaluated on two challenging AntMaze datasets: Explore (top) and Stitch (bottom). The original Stitch dataset contains trajectories limited to four maze cells per segment, which necessitates extensive stitching, whereas the Explore dataset comprises low-quality trajectories with large action noise. Despite these constraints, SCoTS augmentation allows the planner to synthesize trajectories that connect the specified start and goal and substantially surpass the original data in both horizon and quality.
  • ...and 9 more figures
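The three stages in the Figure 2 caption can be sketched as a greedy loop over segments that have already been embedded in the latent space. This is a minimal illustration under several assumptions the captions do not confirm:

- candidates are retrieved with a fixed latent-distance `radius`;
- novelty is the distance from a candidate's final latent state to the nearest already-visited latent state;
- each segment is used at most once;
- progress and novelty are added with a weight `novelty_weight`.

Stage (c) is left as plain concatenation because the diffusion refinement model is not described here. `stitch_segments` and its parameters are hypothetical.

```python
import numpy as np


def _segment_score(seg, visited, z_dir, novelty_weight):
    """(b) Directional progress along z_dir plus weighted novelty (assumed form)."""
    progress = float((seg[-1] - seg[0]) @ z_dir)
    # Novelty: distance from the segment endpoint to the closest visited latent state.
    novelty = float(np.min(np.linalg.norm(visited - seg[-1], axis=1)))
    return progress + novelty_weight * novelty


def stitch_segments(segments, z_dir, num_steps, start_idx=0,
                    radius=0.5, novelty_weight=1.0):
    """Greedy latent-space stitching in the spirit of Figure 2 (hypothetical sketch).

    segments: list of (T_i, D) arrays holding the latent states phi(s) of each segment.
    z_dir:    exploration direction in latent space (unit vector).
    Returns the chosen segment indices and the concatenated latent path.
    """
    starts = np.stack([seg[0] for seg in segments])
    chosen = [start_idx]
    visited = segments[start_idx].copy()  # latent states covered so far
    for _ in range(num_steps):
        end = segments[chosen[-1]][-1]
        # (a) Retrieve segments whose first latent state lies near the current endpoint.
        dists = np.linalg.norm(starts - end, axis=1)
        dists[chosen] = np.inf  # assumption: never reuse a segment
        candidates = np.flatnonzero(dists <= radius)
        if candidates.size == 0:
            break  # nothing reachable from the current endpoint
        scores = [_segment_score(segments[i], visited, z_dir, novelty_weight)
                  for i in candidates]
        best = int(candidates[int(np.argmax(scores))])
        # (c) The paper refines the junction with a diffusion model; this sketch
        #     simply appends the raw segment instead.
        chosen.append(best)
        visited = np.vstack([visited, segments[best]])
    path = np.concatenate([segments[i] for i in chosen], axis=0)
    return chosen, path
```

In use, one would sample a random unit direction per rollout, for example `z = rng.normal(size=D); z /= np.linalg.norm(z)`, and run the loop once per direction. This mirrors Figure 3, where each color comes from a distinct latent direction. Setting `novelty_weight=0` gives a purely directional variant, which is the kind of contrast Figure 3 examines.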