Motion Manifold Flow Primitives for Task-Conditioned Trajectory Generation under Complex Task-Motion Dependencies
Yonghyeon Lee, Byeongho Lee, Seungyeon Kim, Frank C. Park
TL;DR
The paper tackles the challenge of generating diverse, task-conditioned trajectories from limited demonstrations, especially under complex task-motion dependencies induced by language. It introduces Motion Manifold Flow Primitives (MMFP), which decouples manifold learning from conditional density modeling and uses flow matching in the latent space of a learned trajectory manifold to capture these dependencies. The method encodes text as a task parameter $c$ via a text encoder and a learned embedding, then evolves latent codes $z$ with a conditioned velocity field $v_s(z,c)$ and decodes them into high-dimensional trajectories. With only a few demonstrations, MMFP outperforms diffusion-based and previous manifold-based approaches on SE(3) pouring and 7-DoF waving tasks, yielding accurate, diverse, and smooth global trajectories suitable for real robots. Future work could extend the approach to visual inputs and continuous-time representations.
Abstract
Effective movement primitives should be capable of encoding and generating a rich repertoire of trajectories (typically collected from human demonstrations) conditioned on task-defining parameters such as vision or language inputs. Recent methods based on the motion manifold hypothesis, which assumes that a set of trajectories lies on a lower-dimensional nonlinear subspace, address challenges such as limited dataset size and the high dimensionality of trajectory data. However, they often struggle to capture complex task-motion dependencies, i.e., cases where motion distributions shift drastically with task variations. To address this, we introduce Motion Manifold Flow Primitives (MMFP), a framework that decouples the training of the motion manifold from task-conditioned distributions. Specifically, we employ flow matching models, state-of-the-art conditional deep generative models, to learn task-conditioned distributions in the latent coordinate space of the learned motion manifold. Experiments are conducted on language-guided trajectory generation tasks, where many-to-many text-motion correspondences introduce complex task-motion dependencies, highlighting MMFP's superiority over existing methods.
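The sampling pipeline summarized in the TL;DR (draw a latent code, evolve it with the conditioned velocity field $v_s(z,c)$, decode it into a trajectory) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: $s$ is treated as a flow time in $[0, 1]$, the velocity field is a hand-written straight-line flow toward the task embedding instead of a trained network, the decoder is a random linear map, and the names `velocity`, `decode`, `sample_trajectory`, and the chosen dimensions are hypothetical.

```python
import numpy as np

# Hypothetical sizes: latent dimension, number of time steps, pose dimension.
LATENT_DIM, T, D = 2, 50, 3

rng = np.random.default_rng(0)
# Stand-in for a trained manifold decoder (assumption: a fixed random linear map).
W_dec = rng.normal(size=(T * D, LATENT_DIM))


def velocity(s, z, c):
    """Toy stand-in for a learned v_s(z, c).

    It is the velocity of the straight-line path z_s = (1 - s) z_0 + s c,
    so every sample is pulled to the single point c. A learned field would
    instead transport noise to a whole task-conditioned distribution.
    """
    return (c - z) / (1.0 - s)


def decode(z):
    """Map a latent code to a trajectory of shape (T, D)."""
    return (W_dec @ z).reshape(T, D)


def sample_trajectory(c, n_steps=20):
    """Integrate dz/ds = v_s(z, c) from s = 0 to s = 1 with explicit Euler steps."""
    z = rng.normal(size=LATENT_DIM)  # z_0 drawn from a standard normal prior
    ds = 1.0 / n_steps
    for k in range(n_steps):
        z = z + ds * velocity(k * ds, z, c)
    return z, decode(z)


# Hypothetical task embedding, e.g. the output of a text encoder.
c = np.array([1.0, -0.5])
z1, traj = sample_trajectory(c)
print(traj.shape)  # (50, 3)
```

Because the flow runs in the low-dimensional latent space, the integration loop only touches vectors of size `LATENT_DIM`; the high-dimensional trajectory appears once, at the final decoding step. In this toy setup the last Euler step lands on `c` (up to floating-point error), which is a property of the hand-written field and not a claim about the trained model.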
