
Unified Generation-Refinement Planning: Bridging Guided Flow Matching and Sampling-Based MPC for Social Navigation

Kazuki Mizuta, Karen Leung

TL;DR

The paper tackles the problem of safe, real-time robot planning in dynamic human environments by bridging learning-based trajectory generation with optimization-based constraint enforcement. It introduces a reward-guided conditional flow matching (CFM) model that produces multimodal trajectory priors and couples it with model predictive path integral (MPPI) control. The result is a bidirectional loop: CFM priors guide MPPI refinement, and MPPI solutions warm-start subsequent CFM generation. Key contributions include (i) the integration of CFM with MPPI; (ii) reward-guided adaptation of CFM without retraining; (iii) mode-selective MPPI, which preserves multimodal options; and (iv) empirical validation in social navigation showing improved safety, goal attainment, and real-time performance. By combining expressive generative priors with explicit constraint handling, the approach offers a practical path to robust social navigation and real-time deployment in crowded environments.

Abstract

Planning safe and effective robot behavior in dynamic, human-centric environments remains a core challenge because a planner must handle multimodal uncertainty, adapt in real time, and ensure safety. Optimization-based planners offer explicit constraint handling, but their performance depends on the quality of their initialization. Learning-based planners better capture multimodal solutions but struggle to enforce constraints such as safety. In this paper, we introduce a unified generation-refinement framework that bridges learning and optimization by combining a novel reward-guided conditional flow matching (CFM) model with model predictive path integral (MPPI) control. Our key innovation is a bidirectional information exchange: samples from the reward-guided CFM model provide informed priors for MPPI refinement, while the optimal trajectory from MPPI warm-starts the next CFM generation. Using autonomous social navigation as a motivating application, we demonstrate that our approach can flexibly adapt to dynamic environments and satisfy safety requirements in real time.
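The MPPI refinement step can be illustrated with a minimal sketch. It perturbs a nominal control sequence (the warm start, which in this framework would come from a generative prior), scores each rollout, and moves the nominal sequence toward the exponentially cost-weighted average of the perturbations. The single-integrator dynamics, the cost terms, and all parameter values (`K`, `sigma`, `lam`, `dt`, `radius`) are assumptions chosen for illustration, not the paper's settings.

```python
import numpy as np

def rollout(x0, U, dt):
    """Single-integrator rollout x_{t+1} = x_t + dt * u_t (assumed dynamics)."""
    return x0 + dt * np.cumsum(U, axis=-2)

def cost(X, U, goal, obstacle, radius):
    """Hypothetical cost: terminal goal distance + collision penalty + control effort."""
    goal_cost = np.linalg.norm(X[..., -1, :] - goal, axis=-1)
    inside = np.linalg.norm(X - obstacle, axis=-1) < radius
    collision = 100.0 * np.sum(inside, axis=-1)
    effort = 0.01 * np.sum(U ** 2, axis=(-2, -1))
    return goal_cost + collision + effort

def mppi_weights(S, lam):
    """Normalized weights exp(-S / lam); shifting by min(S) avoids underflow."""
    w = np.exp(-(S - S.min()) / lam)
    return w / w.sum()

def mppi_step(x0, U_nom, goal, obstacle, rng, radius=0.5, K=256,
              sigma=0.3, lam=1.0, dt=0.1):
    """One MPPI update around a nominal control sequence U_nom of shape (T, 2)."""
    eps = sigma * rng.standard_normal((K,) + U_nom.shape)   # (K, T, 2) perturbations
    U = U_nom[None] + eps
    S = cost(rollout(x0, U, dt), U, goal, obstacle, radius)  # (K,) rollout costs
    w = mppi_weights(S, lam)
    return U_nom + np.einsum("k,ktd->td", w, eps)            # cost-weighted average

# Usage: refine a straight-line warm start that passes near a pedestrian.
rng = np.random.default_rng(0)
x0, goal, obstacle = np.zeros(2), np.array([3.0, 0.0]), np.array([1.5, 0.2])
U = np.tile([1.0, 0.0], (30, 1))
for _ in range(10):
    U = mppi_step(x0, U, goal, obstacle, rng)
```

Because the update averages perturbations, samples that pass on opposite sides of an obstacle can partially cancel out. This is one intuition for why a good prior, or a mode-selective variant that refines candidates separately, can help.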

Paper Structure

This paper contains 17 sections, 1 theorem, 11 equations, 14 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

Given a base velocity field $v_\theta(\mathbf{z}_\tau,\mathbf{c},\tau)$ and a differentiable reward function $R(\mathbf{z}_1)$, the guided velocity field $\tilde{v}_\theta(\mathbf{z}_\tau,\mathbf{c},\tau)$ is given by […], where $\lambda_\text{guide}$ is a hyperparameter controlling guidance strength.
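The effect of reward guidance on a frozen flow model can be sketched on a toy problem. The exact expression in Proposition 1 is not reproduced in this summary, so the sketch assumes a common gradient-guidance form, $\tilde{v} = v + \lambda_\text{guide}\nabla R(\hat{\mathbf{z}}_1)$, where $\hat{\mathbf{z}}_1 = \mathbf{z}_\tau + (1-\tau)\,v$ is a one-step estimate of the endpoint. The base field is the exact marginal velocity of a linear path from $\mathcal{N}(0, I)$ to two point masses (hypothetical "pass left" and "pass right" endpoints), standing in for a trained network. The reward $R(\mathbf{z}_1) = -\tfrac{1}{2}\lVert \mathbf{z}_1 - \text{target}\rVert^2$ is likewise an assumption.

```python
import numpy as np

MODES = np.array([[1.0, 1.0], [1.0, -1.0]])  # hypothetical pass-left / pass-right endpoints
TARGET = MODES[0]                            # hypothetical reward prefers the first mode

def base_velocity(z, tau):
    """Exact marginal velocity of z_tau = (1-tau) z0 + tau z1, z0 ~ N(0, I), z1 in MODES."""
    s = 1.0 - tau
    d2 = np.sum((z[:, None, :] - tau * MODES[None]) ** 2, axis=-1)  # (N, M)
    logp = -0.5 * d2 / s**2                                         # log posterior over modes
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return (p @ MODES - z) / s                                      # (E[z1 | z_tau] - z) / s

def reward_grad(z1):
    """Gradient of R(z1) = -0.5 * ||z1 - TARGET||^2."""
    return TARGET - z1

def sample(n, lam_guide=0.0, steps=100, seed=0):
    """Euler integration of the (optionally guided) flow from tau = 0 to tau = 1."""
    z = np.random.default_rng(seed).standard_normal((n, 2))
    for k in range(steps):
        tau = k / steps
        v = base_velocity(z, tau)
        z1_hat = z + (1.0 - tau) * v                 # one-step endpoint estimate
        z = z + (v + lam_guide * reward_grad(z1_hat)) / steps
    return z

unguided = sample(2000, lam_guide=0.0)
guided = sample(2000, lam_guide=2.0)
frac_unguided = np.mean(unguided[:, 1] > 0)  # share of samples ending in the first mode
frac_guided = np.mean(guided[:, 1] > 0)
```

Without guidance, roughly half the samples end in each mode; with guidance, most shift to the rewarded mode while the model itself is unchanged. This illustrates the retraining-free adaptation described above, under the stated assumptions.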

Figures (14)

  • Figure 1: Overview of the proposed unified planning framework for dynamic environments: At each planning step, our conditional flow matching (CFM) model generates context-aware and multimodal trajectory candidates guided by a reward function. Promising candidates are selected, then refined, and the best trajectory is selected and executed. The optimal trajectory warm-starts the next CFM generation for the next planning step.
  • Figure 2: A safety-guided conditional flow matching (CFM) model generates diverse trajectories as priors for model predictive control (MPC), which in turn warm-starts the next CFM sampling step.
  • Figure 3: Gaussian.
  • Figure 4: Diffusion.
  • Figure 5: CFM.
  • ...and 9 more figures
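The per-step loop described in Figure 1 (generate candidates, select promising ones, refine them, execute the best, and warm-start the next generation) can be sketched as a skeleton with pluggable components. The function `plan_step` and the toy `generate`, `refine`, and `reward` stand-ins are hypothetical. In particular, "mode-selective" refinement is modeled here simply as keeping the top-k candidates by reward and refining each one independently, which is an assumption and not necessarily the paper's selection rule.

```python
import numpy as np
from typing import Callable, Optional

def plan_step(generate: Callable, refine: Callable, reward: Callable,
              warm_start: Optional[np.ndarray] = None,
              n_samples: int = 64, top_k: int = 4) -> np.ndarray:
    """One generation-refinement step (hypothetical interface)."""
    candidates = generate(n_samples, warm_start)              # (n, T, d) trajectory priors
    top = np.argsort(reward(candidates))[::-1][:top_k]        # most promising candidates
    refined = np.stack([refine(candidates[i]) for i in top])  # refine each one separately
    return refined[np.argmax(reward(refined))]                # best trajectory is executed

# Toy stand-ins (hypothetical): noisy generator, endpoint-pulling refiner.
rng = np.random.default_rng(0)
T, goal = 20, np.array([5.0, 0.0])

def generate(n, warm_start):
    base = np.linspace([0.0, 0.0], goal, T) if warm_start is None else warm_start
    return base[None] + 0.5 * rng.standard_normal((n, T, 2))

def refine(traj):
    # Stand-in for MPPI: shift the trajectory so its endpoint moves halfway to the goal.
    return traj + 0.5 * (goal - traj[-1]) * np.linspace(0.0, 1.0, T)[:, None]

def reward(trajs):
    return -np.linalg.norm(trajs[..., -1, :] - goal, axis=-1)  # negative terminal distance

plan = None
for _ in range(3):
    plan = plan_step(generate, refine, reward, warm_start=plan)  # best plan warm-starts next step
```

Refining each selected candidate separately, rather than averaging across them, keeps distinct options (for example, passing a pedestrian on the left or the right) intact until the final selection.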

Theorems & Definitions (2)

  • Proposition 1: Reward-guided CFM Velocity Field
  • proof