Generative Predictive Control: Flow Matching Policies for Dynamic and Difficult-to-Demonstrate Tasks

Vince Kurtz, Joel W. Burdick

TL;DR

This work tackles the challenge of controlling fast, nonlinear robot dynamics without relying on extensive expert demonstrations. It introduces Generative Predictive Control (GPC), which trains a flow-matching policy to approximate the target distribution of sampling-based predictive control (SPC). Training alternates between collecting data with SPC and fitting the policy, and the trained policy in turn supplies better samples for later rounds of SPC. The approach yields high-frequency, temporally consistent control and demonstrates robustness through risk-aware domain randomization, while exposing scalability limits on very large systems such as humanoid standup. GPC thus offers a supervised-learning path toward generalist, fast-reacting policies that combines generative modeling with predictive control.
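
The sampling step that GPC builds on can be illustrated with a toy example. The sketch below is a minimal illustration, not the authors' implementation: it assumes an MPPI-style exponential weighting of sampled action sequences, and the double-integrator dynamics, the cost weights, and the names `rollout_cost` and `spc_step` are hypothetical choices made for this example.

```python
import math
import random


def rollout_cost(x0, v0, actions, dt=0.1):
    """Total cost of one action sequence on a toy 1-D double integrator.

    The dynamics and cost weights are hypothetical, chosen only for illustration.
    """
    x, v, cost = x0, v0, 0.0
    for u in actions:
        v += u * dt  # acceleration command changes velocity
        x += v * dt  # velocity changes position
        cost += x * x + 0.1 * v * v + 0.01 * u * u
    return cost


def spc_step(x0, v0, mean, num_samples=64, sigma=0.5, temperature=1.0, seed=0):
    """One sampling-based predictive control update (MPPI-style, an assumption)."""
    rng = random.Random(seed)
    # Sample candidate action sequences around the current mean.
    samples = [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(num_samples)]
    costs = [rollout_cost(x0, v0, s) for s in samples]
    # Low-cost samples get exponentially larger weights; subtracting the
    # minimum cost keeps the exponentials numerically well behaved.
    best = min(costs)
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    new_mean = [
        sum(w * s[k] for w, s in zip(weights, samples)) / total
        for k in range(len(mean))
    ]
    return new_mean, samples, costs
```

Calling `spc_step(1.0, 0.0, [0.0] * 10)` returns an updated mean action sequence together with the raw samples and their costs. In the GPC cycle summarized above, a trained flow-matching policy would supply some of the candidate sequences; the purely Gaussian proposal here is only a stand-in.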

Abstract

Generative control policies have recently unlocked major progress in robotics. These methods produce action sequences via diffusion or flow matching, with training data provided by demonstrations. But existing methods come with two key limitations: they require expert demonstrations, which can be difficult to obtain, and they are limited to relatively slow, quasi-static tasks. In this paper, we leverage a tight connection between sampling-based predictive control and generative modeling to address each of these issues. In particular, we introduce generative predictive control, a supervised learning framework for tasks with fast dynamics that are easy to simulate but difficult to demonstrate. We then show how trained flow-matching policies can be warm-started at inference time, maintaining temporal consistency and enabling high-frequency feedback. We believe that generative predictive control offers a complementary approach to existing behavior cloning methods, and hope that it paves the way toward generalist policies that extend beyond quasi-static demonstration-oriented tasks.
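
The warm-starting idea can be sketched on a toy problem. The code below is an illustration under stated assumptions, not the paper's procedure: it assumes the policy is sampled by Euler-integrating a velocity field from flow time 0 to 1, and that a warm start shifts the previous action sequence by one step, blends it with noise, and integrates only from an intermediate time `t_start`. The closed-form `toy_velocity` stands in for a trained network, and all function names are hypothetical.

```python
import random


def shift_sequence(prev_actions):
    """Drop the action that was just applied and repeat the last one."""
    return prev_actions[1:] + prev_actions[-1:]


def toy_velocity(u, t, target):
    """Stand-in for a learned velocity field: flows straight toward `target`."""
    return [(g - x) / (1.0 - t) for x, g in zip(u, target)]


def integrate_flow(u, target, t_start=0.0, num_steps=10):
    """Euler-integrate du/dt = v(u, t) from t_start to 1."""
    dt = (1.0 - t_start) / num_steps
    for k in range(num_steps):
        t = t_start + k * dt
        v = toy_velocity(u, t, target)
        u = [x + dt * dx for x, dx in zip(u, v)]
    return u


def warm_start(prev_actions, t_start, rng):
    """Blend the shifted previous plan with noise at flow time t_start."""
    shifted = shift_sequence(prev_actions)
    return [(1.0 - t_start) * rng.gauss(0.0, 1.0) + t_start * a for a in shifted]
```

In this sketch, a larger `t_start` keeps the initial sample closer to the previous plan and shortens the integration, which is the intuition behind temporal consistency and faster feedback. The toy field always converges to `target`, so the example shows the mechanics only, not the behavior of a trained policy.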

Paper Structure

This paper contains 19 sections, 1 theorem, 30 equations, 5 figures, 3 tables, and 1 algorithm.

Key Result

Proposition 1

The score of the noised target distribution eq:noised_target is given by

Figures (5)

  • Figure 1: Generative predictive control is a supervised learning framework for dynamic tasks that are difficult to demonstrate but easy to simulate. First, we generate training data with sampling-based predictive control [li2024drop, williams2016aggressive, howell2022predictive], leveraging advances in massively parallel GPU simulation [mjx, Genesis, makoviychuk2021isaac]. We then use this data to train a flow matching policy, which in turn provides additional high-quality samples. This results in better training data for subsequent iterations, in a virtuous cycle.
  • Figure 2: Systems used to evaluate GPC performance in simulation, from left to right: inverted pendulum, cart-pole, double cart-pole, push-T, planar walker, luffing crane, humanoid standup.
  • Figure 3: Closed-loop double cart-pole performance with and without warm-starts. The warm-started policy (right) produces smooth actions and is able to successfully balance. Without warm-starts (left), actions jitter between modes, and the robot fails to balance.
  • Figure 4: Average cost per time step for SPC, GPC (policy alone), and GPC+ (policy + sampling). Black bars indicate standard deviation over 100 ten-second simulations from randomized initial conditions. Applying the GPC policy directly provides performance on par with or better than SPC in all cases except humanoid standup. GPC+ meets or exceeds the performance of the other methods across all examples.
  • Figure 5: Training curves showing the average cost $J$, percent of states in which the flow-matching policy generated the best action sequence, and the loss $\mathcal{L}_{GPC}$ from three random seeds. GPC is able to leverage the training stability of supervised learning while avoiding the need for demonstrations.
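
The supervised training signal behind a flow-matching policy can be sketched as follows. This is a generic conditional flow-matching regression target with a linear interpolation path, written as an assumption; the paper's loss $\mathcal{L}_{GPC}$ may weight or condition samples differently, and the function names are hypothetical.

```python
import random


def flow_matching_pair(action_seq, rng):
    """Build one regression example from a target action sequence.

    Returns the interpolated input, the flow time, and the velocity
    a network would be trained to predict at that input.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in action_seq]
    t = rng.random()  # flow time in [0, 1)
    u_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action_seq)]
    target_velocity = [a - n for n, a in zip(noise, action_seq)]
    return u_t, t, target_velocity


def mean_squared_error(predicted, target):
    """Per-example regression loss between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)
```

Because each example is a plain input and target pair, training reduces to ordinary regression over action sequences produced by SPC, which is consistent with the training stability the Figure 5 caption attributes to supervised learning.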

Theorems & Definitions (3)

  • Proposition 1
  • Remark 1
  • Proof