XFlowMP: Task-Conditioned Motion Fields for Generative Robot Planning with Schrodinger Bridges

Khang Nguyen, Minh Nhat Vu

TL;DR

XFlowMP introduces a task-conditioned generative motion planner based on Schrödinger bridges that transports noise to expert demonstrations while conditioning on start–goal configurations. By coupling high-order velocity, acceleration, and jerk fields with score-based guidance, it yields collision-free, dynamically-feasible trajectories with strong task adaptability. Empirical results on RobotPointMass, LASA Handwriting, and real Kinova Gen3 experiments show improved distributional alignment (lower MMD), smoother trajectories, and reduced energy, alongside reliable planning feasibility and competitive inference times. The work demonstrates a scalable framework for integrating task semantics into low-level motion generation with practical impact for real-world robotic planning.

Abstract

Generative robotic motion planning requires not only the synthesis of smooth and collision-free trajectories but also feasibility across diverse tasks and dynamic constraints. Prior planning methods, both traditional and generative, often struggle to incorporate high-level semantics with low-level constraints, especially the nexus between task configurations and motion controllability. In this work, we present XFlowMP, a task-conditioned generative motion planner that models robot trajectory evolution as entropic flows bridging stochastic noise and expert demonstrations via Schrödinger bridges, given the queried task configuration. Specifically, our method leverages Schrödinger bridges as conditional flow matching coupled with a score function to learn motion fields with high-order dynamics while encoding start-goal configurations, enabling the generation of collision-free and dynamically-feasible motions. In our evaluations, XFlowMP achieves up to 53.79% lower maximum mean discrepancy, 36.36% smoother motions, and 39.88% lower energy consumption compared with the next-best baseline on the RobotPointMass benchmark, and also reduces short-horizon planning time by 11.72%. On long-horizon motions in the LASA Handwriting dataset, our method produces trajectories with 1.26% lower maximum mean discrepancy, 3.96% greater smoothness, and 31.97% lower energy. We further demonstrate the practicality of our method on the Kinova Gen3 manipulator, executing planned motions and confirming its robustness in real-world settings.
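
The abstract reports distributional alignment as maximum mean discrepancy (MMD) between generated and expert trajectories. As a rough illustration of what such a metric measures, the sketch below computes a biased estimate of squared MMD with a Gaussian (RBF) kernel over flattened trajectories. The kernel choice, the `bandwidth` value, and the function names `rbf_kernel` and `mmd_squared` are assumptions made for this example; the exact evaluation protocol is not stated in this section.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2ab.
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two sample sets.

    x, y: arrays of shape (num_trajectories, horizon, dim); each trajectory
    is flattened into one vector before the kernel is applied (an assumption).
    """
    x = x.reshape(len(x), -1)
    y = y.reshape(len(y), -1)
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expert = rng.standard_normal((64, 4, 2))      # stand-in for expert demos
    generated = rng.standard_normal((64, 4, 2))   # stand-in for planner output
    print(mmd_squared(expert, generated, bandwidth=2.0))
```

A lower value indicates that the two sets of trajectories are harder to tell apart under the chosen kernel, which is the sense in which "lower MMD" signals closer agreement with the expert distribution.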

Paper Structure

This paper contains 24 sections, 19 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: Overview with $\mathbf{X}$FlowMP: The task-conditioned motion planner, $\Upsilon_{\theta}$, learns from diverse experts to generate trajectories based on task inputs. From an initial noise distribution $\pi_{0}$, the learned model with Schrödinger bridges is able to produce multiple valid solutions $\pi_{1}^{(i)}$, while being initialization-free, contextualized, dynamically-feasible, collision-free, and scalable. Thus, the generated trajectories can be executed on robots across a variety of tasks.
  • Figure 2: Methodology of $\mathbf{X}$FlowMP: Given expert demonstrations of motions in the maze environment, $\mathrm{X}$FlowMP contextualizes those motions by their start-goal pairs (●, ✖) as tasks. During training, the learnable parameters transport initial noise distributions to match expert trajectories through Schrödinger bridges, ensuring that the generated trajectories remain consistent with task inputs. Specifically, the motion field, comprising velocity, acceleration, and jerk parameters, learns to generate high-order motions along the evolution horizon from $0$ to $1$. Meanwhile, the score function estimates the underlying task-dependent gradients that guide the flows toward expert distributions. The corresponding velocity (blue and orange) and acceleration (green and yellow) profiles of the paths are also shown. At $t = 1.00$, $\mathrm{X}$FlowMP generates motions that are collision-free and dynamically-feasible while maintaining semantic consistency with task conditions.
  • Figure 3: Qualitative Comparisons of Trajectories Generated by Baselines and $\mathbf{X}$FlowMP: (a) In the RobotPointMass environment, $\mathrm{X}$FlowMP produces smooth, collision-free paths that closely follow expert demonstrations, while baselines often yield discontinuous or collision-prone trajectories. (b) On the LASA Handwriting dataset, $\mathrm{X}$FlowMP captures the structure of complex patterns, aligning well with expert motions. In contrast, baseline methods tend to generate incorrect/distorted shapes or fail to preserve curvature. Generally, $\mathrm{X}$FlowMP exhibits a strong ability to contextualize while producing smooth, collision-free, and expert-like trajectories.
  • Figure 4: Real-Robot Demonstration: Using $\mathrm{X}$FlowMP, the Kinova Gen3 manipulator is able to generate smooth, collision-free, and dynamically-feasible motions to reach the LEGO block on the shelf, given the positions of the robot and the target object as the task.
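
The Figure 2 caption describes training as transporting noise to expert trajectories through Schrödinger bridges, with a motion field and a score function learned along the horizon from $0$ to $1$. One common way to build regression targets for such a setup is to sample points on a Brownian bridge between a noise sample and an expert sample, then use the bridge's closed-form conditional flow and conditional score as targets. The sketch below shows only that generic construction. It is not taken from the paper: the function name `bridge_targets`, the diffusion scale `sigma`, and the array shapes are assumptions, and the task conditioning and the acceleration and jerk terms mentioned in the caption are omitted.

```python
import numpy as np


def bridge_targets(x0, x1, t, sigma, rng):
    """Sample x_t on a Brownian bridge from x0 to x1 and return targets.

    x0: noise samples, shape (batch, dim)
    x1: expert samples, shape (batch, dim)
    t:  evolution times strictly inside (0, 1), shape (batch,)
    sigma: diffusion scale of the bridge (hypothetical hyperparameter)
    """
    t = t[:, None]
    mean = (1.0 - t) * x0 + t * x1             # bridge mean at time t
    std = sigma * np.sqrt(t * (1.0 - t))       # bridge standard deviation
    xt = mean + std * rng.standard_normal(x0.shape)
    # Conditional flow (velocity) target of the Gaussian bridge marginal.
    flow = (1.0 - 2.0 * t) / (2.0 * t * (1.0 - t)) * (xt - mean) + (x1 - x0)
    # Conditional score target: gradient of log N(xt; mean, std^2 I).
    score = (mean - xt) / std**2
    return xt, flow, score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal((8, 2))         # stand-in for pi_0 samples
    expert = rng.standard_normal((8, 2)) + 4.0  # stand-in for expert samples
    times = rng.uniform(0.05, 0.95, size=8)
    xt, flow, score = bridge_targets(noise, expert, times, 0.5, rng)
    print(xt.shape, flow.shape, score.shape)
```

In a training loop, two networks would typically be regressed onto `flow` and `score` at the sampled `(xt, t)` pairs, with the start-goal task passed as an extra input. Keeping `t` away from the endpoints avoids division by zero in both targets.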