PegasusFlow: Parallel Rolling-Denoising Score Sampling for Robot Diffusion Planner Flow Matching

Lei Ye, Haibo Gao, Peng Xu, Zhelin Zhang, Junqi Shan, Ao Zhang, Wei Zhang, Ruyi Zhou, Zongquan Deng, Liang Ding

TL;DR

PegasusFlow is a hierarchical rolling-denoising framework that samples trajectory score gradients directly and in parallel from environmental interaction, removing the need for expert data.

Abstract

Diffusion models offer powerful generative capabilities for robot trajectory planning, yet their practical deployment on robots is hindered by a critical bottleneck: a reliance on imitation learning from expert demonstrations. This paradigm is often impractical for specialized robots where data is scarce and creates an inefficient, theoretically suboptimal training pipeline. To overcome this, we introduce PegasusFlow, a hierarchical rolling-denoising framework that enables direct and parallel sampling of trajectory score gradients from environmental interaction, completely bypassing the need for expert data. Our core innovation is a novel sampling algorithm, Weighted Basis Function Optimization (WBFO), which leverages spline basis representations to achieve superior sample efficiency and faster convergence compared to traditional methods like MPPI. The framework is embedded within a scalable, asynchronous parallel simulation architecture that supports massively parallel rollouts for efficient data collection. Extensive experiments on trajectory optimization and robotic navigation tasks demonstrate that our approach, particularly Action-Value WBFO (AVWBFO) combined with a reinforcement learning warm-start, significantly outperforms baselines. In a challenging barrier-crossing task, our method achieved a 100% success rate and was 18% faster than the next-best method, validating its effectiveness for complex terrain locomotion planning. https://masteryip.github.io/pegasusflow.github.io/
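
The abstract names MPPI as the traditional sampling baseline that WBFO is compared against. As a point of reference, the following is a minimal, generic sketch of an MPPI-style update: perturb a nominal action sequence with Gaussian noise, score each candidate, and move the nominal sequence along the exponentially weighted average of the noise. It is not the authors' WBFO or AVWBFO algorithm. The function name `mppi_update`, the toy quadratic cost, and all hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np


def mppi_update(mean, cost_fn, num_samples=256, sigma=0.5, temperature=1.0, rng=None):
    """One generic MPPI-style update of a nominal action sequence.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample Gaussian perturbations around the current nominal sequence.
    noise = rng.normal(0.0, sigma, size=(num_samples,) + mean.shape)
    candidates = mean + noise
    # Evaluate every candidate (a real planner would run simulator rollouts here).
    costs = np.array([cost_fn(c) for c in candidates])
    # Softmin weights; subtracting the minimum cost avoids numerical underflow.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # Shift the nominal sequence by the weighted average perturbation.
    return mean + np.tensordot(weights, noise, axes=1)


# Toy problem (assumed): track a constant target action over a 10-step horizon.
target = np.ones(10)


def toy_cost(actions):
    return float(np.sum((actions - target) ** 2))


rng = np.random.default_rng(0)
plan = np.zeros(10)
initial_cost = toy_cost(plan)
for _ in range(30):
    plan = mppi_update(plan, toy_cost, rng=rng)
final_cost = toy_cost(plan)
print(f"cost before: {initial_cost:.3f}, cost after: {final_cost:.3f}")
```

In this sketch every time step of the action sequence is perturbed independently. The abstract states that WBFO instead uses spline basis representations, which is the source of its claimed sample-efficiency advantage over this kind of update.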

Paper Structure

This paper contains 23 sections, 15 equations, 7 figures, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of PegasusFlow. (a) Visualization of the rollouts of an environment within a prediction horizon. (b, c) The algorithm can optimize the robot control trajectory to navigate complex environments with barriers and gaps.
  • Figure 2: Framework of PegasusFlow. (a) RL-style observation and reward setup. (b) Rollout environments for trajectory score gradient sampling. (c) Noise sampling schema (Section \ref{subsection:noise_sampling_schema}). (d) WBFO for trajectory optimization (Section \ref{subsection:wbfo}). (e) Main environments that interact with the simulation. (f) The sampled trajectory score gradient can be used for flow matching training of the diffusion policy. (g) Optimal control problem and sampling-based score gradient estimation.
  • Figure 3: Weighted Basis Function Optimization. (a) Sampled action trajectories with step-wise rewards. (b) Basis functions of the bundle of the sampled trajectories. (c) Schematic diagram of the WBFO process, illustrating the mapping from trajectory rewards to node weights via basis functions.
  • Figure 4: Robotic tasks in PegasusFlow. (a) Hexapod timber piles navigation. (b) Franka arm collision avoidance planning. (c) Quadruped walking. (d) Hexapod confined space navigation. Videos can be found in the supplementary material or at https://masteryip.github.io/pegasusflow.github.io/ .
  • Figure 5: Optimization performance comparison. (a) 2D Navigation (trajectory planning problem). (b) Inverted Pendulum (optimal control problem). The x-axis shows the number of samples used; the y-axis shows the final cost after the optimization iterations.
  • ...and 2 more figures