Unified Generation-Refinement Planning: Bridging Guided Flow Matching and Sampling-Based MPC for Social Navigation
Kazuki Mizuta, Karen Leung
TL;DR
The paper tackles safe, real-time robot planning in dynamic human environments by bridging learning-based trajectory generation with optimization-based constraint enforcement. It introduces a reward-guided conditional flow matching (CFM) model that produces multimodal trajectory priors and couples it with model predictive path integral (MPPI) control. The two form a bidirectional loop: CFM priors guide MPPI refinement, and MPPI solutions warm-start the next CFM generation. Key contributions are (i) the integration of CFM with MPPI, (ii) reward-guided adaptation of CFM without retraining, (iii) mode-selective MPPI that preserves multimodal options, and (iv) empirical validation in social navigation showing improved safety, goal attainment, and real-time performance. By combining expressive generative priors with explicit constraint handling, the approach offers a practical route to robust social navigation that can be deployed in real time in crowded environments.
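The TL;DR names MPPI refinement and a mode-selective variant without giving details. The following is a minimal, stdlib-only sketch under our own assumptions: a 2D single-integrator robot, a hand-written cost (terminal goal distance, a hinge penalty inside a pedestrian's radius, small control effort), and hand-made control sequences standing in for CFM samples. The mode-selective step is one plausible reading, not the paper's algorithm. Each prior is perturbed and refined separately with the standard MPPI exponential weighting, and the lowest-cost refined mode is kept, so a pass-left prior and a pass-right prior are never averaged into a trajectory through the pedestrian. All function names and parameter values are hypothetical.

```python
import math
import random

Control = tuple[float, float]
Point = tuple[float, float]


def rollout(start: Point, controls: list[Control], dt: float) -> list[Point]:
    """Integrate an assumed 2D single integrator p[k+1] = p[k] + dt * u[k]."""
    x, y = start
    path = [(x, y)]
    for ux, uy in controls:
        x += dt * ux
        y += dt * uy
        path.append((x, y))
    return path


def cost(path: list[Point], controls: list[Control], goal: Point,
         obstacle: Point, radius: float) -> float:
    """Hypothetical cost: terminal goal distance + collision hinge + control effort."""
    c = math.hypot(path[-1][0] - goal[0], path[-1][1] - goal[1])
    for px, py in path:
        d = math.hypot(px - obstacle[0], py - obstacle[1])
        if d < radius:
            c += 100.0 * (radius - d)  # heavy penalty inside the pedestrian's radius
    c += 0.01 * sum(ux * ux + uy * uy for ux, uy in controls)
    return c


def mppi_update(samples: list[list[Control]], costs: list[float], lam: float) -> list[Control]:
    """Standard MPPI step: average samples with weights exp(-(S_i - S_min) / lam)."""
    s_min = min(costs)  # shifting by the minimum gives the best sample weight 1, avoiding all-zero weights
    weights = [math.exp(-(s - s_min) / lam) for s in costs]
    z = sum(weights)
    horizon = len(samples[0])
    return [
        (sum(w * seq[k][0] for w, seq in zip(weights, samples)) / z,
         sum(w * seq[k][1] for w, seq in zip(weights, samples)) / z)
        for k in range(horizon)
    ]


def mode_selective_mppi(priors, start, goal, obstacle, radius, dt=0.1,
                        n_samples=64, sigma=0.3, lam=1.0, seed=0):
    """Refine each prior (mode) separately, then keep the lowest-cost refined mode."""
    rng = random.Random(seed)
    best_seq, best_cost = None, math.inf
    for prior in priors:
        # Gaussian perturbations around this mode only, so modes are never mixed.
        samples = [[(ux + rng.gauss(0.0, sigma), uy + rng.gauss(0.0, sigma))
                    for ux, uy in prior] for _ in range(n_samples)]
        costs = [cost(rollout(start, s, dt), s, goal, obstacle, radius) for s in samples]
        refined = mppi_update(samples, costs, lam)
        refined_cost = cost(rollout(start, refined, dt), refined, goal, obstacle, radius)
        if refined_cost < best_cost:
            best_seq, best_cost = refined, refined_cost
    return best_seq, best_cost


if __name__ == "__main__":
    # Hand-made stand-ins for CFM samples: pass left, pass right, go straight.
    priors = [
        [(1.0, 0.8)] * 10 + [(1.0, -0.8)] * 10,
        [(1.0, -0.8)] * 10 + [(1.0, 0.8)] * 10,
        [(1.0, 0.0)] * 20,
    ]
    seq, c = mode_selective_mppi(priors, (0.0, 0.0), (2.0, 0.0), (1.0, 0.0), 0.3)
    print(f"refined cost: {c:.3f}")
```

Refining each mode separately costs one MPPI pass per prior. The temperature `lam` controls how sharply the weights concentrate on the cheapest samples within a mode.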
Abstract
Planning safe and effective robot behavior in dynamic, human-centric environments remains a core challenge because planners must handle multimodal uncertainty, adapt in real time, and ensure safety. Optimization-based planners offer explicit constraint handling, but their performance depends on the quality of their initialization. Learning-based planners better capture multimodal solutions but struggle to enforce constraints such as safety. In this paper, we introduce a unified generation-refinement framework that bridges learning and optimization by combining a novel reward-guided conditional flow matching (CFM) model with model predictive path integral (MPPI) control. Our key innovation is a bidirectional information exchange: samples from the reward-guided CFM model provide informed priors for MPPI refinement, while the optimal trajectory from MPPI warm-starts the next CFM generation. Using autonomous social navigation as a motivating application, we demonstrate that our approach can flexibly adapt to dynamic environments and satisfy safety requirements in real time.
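The abstract states that the MPPI solution warm-starts the next CFM generation but does not say how. A minimal sketch under two assumptions of ours: the warm start is the previous MPPI control sequence shifted by one step (common receding-horizon practice), and it enters the generator by being placed at an intermediate time `t0` on a linear probability path, so sampling integrates from `t0` to 1 instead of from pure noise. The learned velocity field is not shown, and the names `shift_warm_start` and `partially_noised_start` are hypothetical.

```python
import random

Control = tuple[float, float]


def shift_warm_start(u_opt: list[Control]) -> list[Control]:
    """Receding-horizon shift: drop the control just executed, repeat the last one."""
    if not u_opt:
        raise ValueError("u_opt must be non-empty")
    return u_opt[1:] + [u_opt[-1]]


def partially_noised_start(warm: list[Control], t0: float,
                           rng: random.Random) -> list[Control]:
    """Place the warm start at time t0 on the assumed linear path
    x_t = (1 - t) * x0 + t * x1, with x0 ~ N(0, I) and x1 the warm start.

    A flow-matching sampler would then integrate its (learned, not shown)
    velocity field from t = t0 to t = 1 instead of from t = 0.
    """
    if not 0.0 <= t0 <= 1.0:
        raise ValueError("t0 must lie in [0, 1]")
    return [((1.0 - t0) * rng.gauss(0.0, 1.0) + t0 * ux,
             (1.0 - t0) * rng.gauss(0.0, 1.0) + t0 * uy)
            for ux, uy in warm]


if __name__ == "__main__":
    u_opt = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]  # made-up previous MPPI solution
    warm = shift_warm_start(u_opt)
    x_t0 = partially_noised_start(warm, t0=0.7, rng=random.Random(0))
    print(warm)
    print(x_t0)
```

A larger `t0` keeps the new sample closer to the previous plan, favoring temporal consistency; a smaller `t0` leaves the generator more room to switch modes. The value 0.7 is arbitrary.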
