Crowd-FM: Learned Optimal Selection of Conditional Flow Matching-generated Trajectories for Crowd Navigation

Antareep Singha, Laksh Nanwani, Mathai Mathew P., Samkit Jain, Phani Teja Singamaneni, Arun Kumar Singh, K. Madhava Krishna

TL;DR

Crowd-FM tackles safe, human-like local planning in dense crowds by learning a distribution of collision-free trajectories conditioned on sensor context. It combines Conditional Flow Matching (CFM), which produces diverse trajectory primitives, with a learned scoring function that selects human-like options; inference-time cost guidance steers generation, and a projection-based optimizer refines the results. CFM alone achieves higher success rates than strong learning-based baselines, and with refinement the approach outperforms expensive optimization-based planners, while the scoring function reduces human-likeness error compared with hand-crafted costs. Real-world experiments on a Husky and an autonomous wheelchair demonstrate real-time feasibility on resource-constrained platforms.
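
The inference-time cost guidance mentioned above can be pictured as adding a collision-cost gradient to the learned velocity field while integrating the flow ODE. The following is a minimal sketch under stated assumptions: Euler integration over waypoint trajectories, a hinge-squared distance penalty, and a guidance term of the form -λ∇J. The paper's exact cost, guidance weighting, and trajectory parameterization may differ, and all function names here are hypothetical. The toy field stands in for the trained network, and the prior sample is fixed at zero so the example is deterministic (a real sampler would draw it from a Gaussian).

```python
import numpy as np


def collision_cost(traj, obstacles, radius):
    """Hinge-squared penalty: sum over waypoints/obstacles of max(0, radius - d)^2.

    traj: (N, T, 2) batch of waypoint trajectories; obstacles: (M, 2) centers.
    Returns one cost per trajectory, shape (N,).
    """
    diff = traj[:, :, None, :] - obstacles[None, None, :, :]   # (N, T, M, 2)
    dist = np.linalg.norm(diff, axis=-1)                       # (N, T, M)
    return np.sum(np.maximum(0.0, radius - dist) ** 2, axis=(1, 2))


def collision_cost_grad(traj, obstacles, radius):
    """Analytic gradient of collision_cost with respect to each waypoint."""
    diff = traj[:, :, None, :] - obstacles[None, None, :, :]    # (N, T, M, 2)
    dist = np.linalg.norm(diff, axis=-1, keepdims=True) + 1e-9  # avoid divide-by-zero
    violation = np.maximum(0.0, radius - dist)                  # penetration depth
    # d/dx (r - d)^2 = -2 (r - d) (x - o) / d, nonzero only inside the radius
    return np.sum(-2.0 * violation * diff / dist, axis=2)       # (N, T, 2)


def guided_euler_sample(velocity_field, x0, obstacles, radius=0.5,
                        guidance_scale=1.0, steps=20):
    """Integrate dx/dt = v(x, t) - guidance_scale * grad(cost) from t=0 to t=1."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_field(x, t)
        # Hypothetical guidance rule: push the flow down the collision-cost gradient.
        v = v - guidance_scale * collision_cost_grad(x, obstacles, radius)
        x = x + dt * v
    return x


# Toy stand-in for the learned field (hypothetical): it pulls every sample
# along a straight line toward a fixed target trajectory that grazes an obstacle.
target = np.stack([np.linspace(0.0, 4.0, 5), np.zeros(5)], axis=-1)[None]  # (1, 5, 2)


def toy_field(x, t):
    return (target - x) / (1.0 - t)


obstacles = np.array([[2.0, 0.1]])
x0 = np.zeros_like(target)

plain = guided_euler_sample(toy_field, x0, obstacles, guidance_scale=0.0)
guided = guided_euler_sample(toy_field, x0, obstacles, guidance_scale=1.0)
print(collision_cost(plain, obstacles, 0.5), collision_cost(guided, obstacles, 0.5))
```

With guidance switched off the sample lands exactly on the colliding target; with it on, the waypoint nearest the obstacle is nudged away, so its collision cost drops while the rest of the trajectory is unchanged.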

Abstract

Safe and computationally efficient local planning for mobile robots in dense, unstructured human crowds remains a fundamental challenge. Moreover, making robot trajectories resemble human motion increases the robot's acceptance in human environments. In this paper, we present Crowd-FM, a learning-based approach that addresses both safety and human-likeness. Our approach has two novel components. First, we train a Conditional Flow Matching (CFM) policy on a dataset of optimally controlled trajectories to learn a set of collision-free primitives from which the robot can choose in any given scenario. Because the chosen optimal control solver can generate multi-modal collision-free trajectories, the CFM policy can learn a diverse set of maneuvers. Second, we learn a score function on a dataset of human demonstration trajectories that assigns a human-likeness score to each flow primitive. At inference time, the optimal trajectory is the primitive with the highest score. Our approach improves on the state of the art: the CFM policy alone achieves collision-free navigation with a higher success rate than existing learning-based baselines. Furthermore, when augmented with inference-time refinement, our approach can outperform even expensive optimization-based planning approaches. Finally, we validate that our scoring network can select trajectories closer to the expert data than a manually designed cost function does.
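
The two learned components in the abstract can be sketched as a training target for the flow policy and a selection rule over its samples. This sketch assumes the common straight-line (rectified-flow / optimal-transport) conditional path, where x_t = (1 - t) x_0 + t x_1 and the regression target is x_1 - x_0; the abstract does not state which probability path Crowd-FM uses. The scoring network is replaced by a fixed score array, and all names are hypothetical.

```python
import numpy as np


def cfm_training_pair(x1, rng):
    """Build one CFM regression example from a batch of expert trajectories.

    x1: (B, D) flattened trajectory parameters (e.g. Bernstein coefficients).
    Assumes the straight-line path x_t = (1 - t) x0 + t x1 with target u_t = x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)        # samples from the Gaussian prior
    t = rng.uniform(size=(x1.shape[0], 1))    # one time per example, in [0, 1)
    xt = (1.0 - t) * x0 + t * x1              # point on the probability path
    ut = x1 - x0                              # constant velocity along that path
    return x0, xt, t, ut


def cfm_loss(v_pred, ut):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean(np.sum((v_pred - ut) ** 2, axis=-1)))


def select_best(trajectories, scores):
    """Pick the primitive with the highest human-likeness score."""
    idx = int(np.argmax(scores))
    return trajectories[idx], idx


rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 6))              # stand-in for dataset samples
x0, xt, t, ut = cfm_training_pair(x1, rng)
print(cfm_loss(ut, ut))                       # a perfect predictor has zero loss

candidates = rng.standard_normal((3, 6))      # stand-in for CFM samples
scores = np.array([0.2, 1.7, -0.4])           # stand-in for scoring-network output
best, idx = select_best(candidates, scores)
print(idx)                                    # -> 1
```

In training, a network v(x_t, t, context) would be regressed onto `ut` with `cfm_loss`; at inference, its integrated samples are ranked by the scoring network and the argmax is executed.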

Paper Structure

This paper contains 23 sections, 13 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Crowd-FM is a long-horizon local planner capable of rapidly generating batches of collision-free trajectories. The method takes as input 2D point cloud data, dynamic obstacle positions and velocities, and the heading-to-goal angle. A Conditional Flow Matching model is trained to generate trajectories conditioned on this sensor data, and the trajectories are then refined using an optimizer [rastgar2023priestprojectionguidedsamplingbased] to meet kinodynamic constraints. A separate Scoring Function, trained on human expert trajectories, enables selection of human-like trajectories from those generated by CFM.
  • Figure 2: The Conditional Flow Matching model learns a multi-modal distribution of collision-free trajectories parameterized by Bernstein coefficients, which gives the reconstructed trajectories higher continuity and differentiability. The model is conditioned on a Transformer-based input-space encoder that captures the environmental context at every timestep. Inference-time integration is implicitly guided by a collision cost term that encourages collision-free generations. Finally, the generated trajectories are refined with a single optimization step [rastgar2023priestprojectionguidedsamplingbased] to satisfy kinodynamic constraints.
  • Figure 3: The learned scoring function uses an input-space representation similar to that of the Flow model, borrowing its input encoders: the Point Cloud, Dynamic Obstacles, and Goal Encoders. In addition, the Flow-generated trajectories are encoded and tokenized as input to a Transformer Encoder, whose output is passed to the Scoring Head to produce a raw score for each generated trajectory.
  • Figure 5: Effect of Inference-time Cost Guidance. The left image shows trajectories generated by Crowd-FM with cost guidance; the right image shows trajectories generated without it. Trajectories generated with collision cost guidance are more controlled near obstacles than those generated without it. (Tests were performed without optimizer refinement.)
  • Figure 6: Grouped bar plot showing the HLPs for trajectories selected by the Scoring Function (blue) and by the Cost Function (orange).
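
Figure 2 describes trajectories parameterized by Bernstein coefficients. A minimal sketch of the reconstruction step, assuming a planar trajectory x(τ) = Σ c_i B_{i,n}(τ) with B_{i,n}(τ) = C(n, i) τ^i (1 - τ)^(n - i) over normalized time τ ∈ [0, 1], mapped onto a planning horizon of `horizon` seconds. The degree, sample count, and function names are illustrative assumptions, not the paper's settings. Velocities follow from the standard derivative identity dx/dτ = n Σ (c_{i+1} - c_i) B_{i,n-1}(τ), which is why this parameterization yields smooth, differentiable trajectories.

```python
import numpy as np
from math import comb


def bernstein_basis(n, taus):
    """(len(taus), n+1) matrix with entries C(n, i) tau^i (1 - tau)^(n - i)."""
    taus = np.asarray(taus, dtype=float)[:, None]
    i = np.arange(n + 1)[None, :]
    binom = np.array([comb(n, k) for k in range(n + 1)], dtype=float)[None, :]
    return binom * taus ** i * (1.0 - taus) ** (n - i)


def reconstruct(coeffs, num_samples=50):
    """coeffs: (N, n+1, 2) Bernstein coefficients -> (N, num_samples, 2) positions."""
    n = coeffs.shape[1] - 1
    taus = np.linspace(0.0, 1.0, num_samples)
    return np.einsum("ti,nid->ntd", bernstein_basis(n, taus), coeffs)


def reconstruct_velocity(coeffs, horizon, num_samples=50):
    """Velocities in physical time t = tau * horizon, via the degree-(n-1) basis."""
    n = coeffs.shape[1] - 1
    taus = np.linspace(0.0, 1.0, num_samples)
    diffs = n * (coeffs[:, 1:] - coeffs[:, :-1])       # (N, n, 2) control-point deltas
    return np.einsum("ti,nid->ntd", bernstein_basis(n - 1, taus), diffs) / horizon


# Evenly spaced coefficients along the x-axis give a straight line at constant speed.
coeffs = np.stack([np.arange(6.0), np.zeros(6)], axis=-1)[None]   # (1, 6, 2), degree 5
pos = reconstruct(coeffs, num_samples=11)
vel = reconstruct_velocity(coeffs, horizon=5.0, num_samples=11)
print(pos[0, :, 0])
print(vel[0, :, 0])
```

Because the curve starts at the first coefficient and ends at the last, a generative model that outputs coefficients directly controls the trajectory endpoints while the basis guarantees smoothness in between.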