GenPlanner: From Noise to Plans -- Emergent Reasoning in Flow Matching and Diffusion Models

Agnieszka Polowczyk, Alicja Polowczyk, Michał Wieczorek

TL;DR

This paper proposes GenPlanner, a path-planning approach based on diffusion models and flow matching, with two variants: DiffPlanner and FlowPlanner. The approach significantly outperforms a baseline CNN model, and FlowPlanner in particular maintains high performance even with a limited number of generation steps.

Abstract

Path planning in complex environments is one of the key problems of artificial intelligence because it requires a simultaneous understanding of the geometry of the space and the global structure of the problem. In this paper, we explore the potential of generative models as planning and reasoning mechanisms. We propose GenPlanner, an approach based on diffusion models and flow matching, along with two variants: DiffPlanner and FlowPlanner. We apply these generative models to finding and generating correct paths in mazes. Trajectory generation is conditioned on a multi-channel input that describes the structure of the environment, including an obstacle map and the start and destination points. Unlike standard methods, our models generate trajectories iteratively, starting from random noise and gradually transforming it into a correct solution. Our experiments show that the proposed approach significantly outperforms the baseline CNN model. In particular, FlowPlanner demonstrates high performance even with a limited number of generation steps.
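
As a minimal sketch of the multi-channel condition mentioned above, the snippet below stacks an obstacle map with one-hot maps for the start and destination cells. The three-channel layout, the name `build_condition`, and the nested-list representation are assumptions made for illustration; the exact encoding used by GenPlanner is not specified in this summary.

```python
def build_condition(walls, start, goal):
    """Return a hypothetical 3-channel condition: [walls, start one-hot, goal one-hot].

    walls: H x W nested list with 1 for an obstacle and 0 for free space.
    start, goal: (row, col) tuples.
    """
    h, w = len(walls), len(walls[0])

    def one_hot(cell):
        # A map that is 1.0 at the given cell and 0.0 everywhere else.
        return [[1.0 if (i, j) == cell else 0.0 for j in range(w)]
                for i in range(h)]

    wall_channel = [[float(v) for v in row] for row in walls]
    return [wall_channel, one_hot(start), one_hot(goal)]


# Usage on a tiny 2 x 2 maze with a single wall cell.
cond = build_condition([[0, 1], [0, 0]], start=(0, 0), goal=(1, 1))
```

Keeping the walls, start, and goal in separate channels lets a convolutional network read all three at every pixel position.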

Paper Structure

This paper contains 12 sections, 9 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Qualitative comparison of ground-truth paths and FlowPlanner generations on grid mazes of different sizes ($8\times8$, $16\times16$, $32\times32$, $48\times48$). FlowPlanner generates trajectories consistent with the reference paths and remains correct even at larger grid sizes.
  • Figure 2: Failure case of logical and spatial reasoning in a vision–language model. Given a maze image with start and goal locations, the VLM (Qwen) generates a plausible sequence of directional moves, yet the resulting path is invalid. This example highlights the difficulty of current VLMs in performing precise logical reasoning and grid-based path planning.
  • Figure 3: Overview of FlowPlanner training and inference. FlowPlanner treats path planning as a denoising process. A U-Net conditioned on the walls and the start and goal locations predicts the velocity at each timestep, enabling iterative refinement from random noise to a correct route; pixels with a value > 0 are treated as path. DiffPlanner follows a similar scheme, except that the U-Net predicts noise instead of a velocity vector.
  • Figure 4: Intermediate $\hat{x}_{0,t}$ estimates for diffusion (top) and flow (bottom) models. For DiffPlanner, the path structure forms gradually over successive denoising steps, whereas FlowPlanner produces coherent trajectory structures already at early integration steps, because the predicted velocity field directly guides the sample toward the final solution.
  • Figure 5: Qualitative comparison of paths generated by CNN, DiffPlanner, and FlowPlanner on $48\times48$ mazes.
  • ...and 2 more figures
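
The sampling loop described in the captions for Figures 3 and 4 can be sketched as a simple Euler integration of a velocity field, followed by thresholding at 0. This is an illustrative sketch, not the authors' implementation. It assumes that t = 0 is noise and t = 1 is the clean path, that the clean sample is estimated as x + (1 - t) v, and that an oracle velocity function built from a known target stands in for the trained U-Net. The names `euler_sample`, `make_oracle_velocity`, and `binarize` are hypothetical.

```python
import random


def make_oracle_velocity(target):
    """Stand-in for the trained U-Net: the ideal velocity toward a known target.

    target: H x W nested list of 0/1; it is mapped to -1/+1 so that path pixels
    end up with a value > 0. A real model would predict v from (x, t, cond).
    """
    signed = [[1.0 if v else -1.0 for v in row] for row in target]

    def velocity(x, t, cond):
        # On a straight noise-to-data path, v = (x_1 - x_t) / (1 - t).
        return [[(signed[i][j] - x[i][j]) / (1.0 - t)
                 for j in range(len(x[0]))] for i in range(len(x))]

    return velocity


def binarize(x):
    """Pixels with a value > 0 are treated as path."""
    return [[1 if v > 0 else 0 for v in row] for row in x]


def euler_sample(velocity_fn, cond, shape, num_steps, seed=0):
    """Integrate dx/dt = v(x, t, cond) from t = 0 to t = 1 with Euler steps.

    Returns the final sample and the clean-sample estimate made at each step.
    """
    rng = random.Random(seed)
    h, w = shape
    x = [[rng.gauss(0.0, 1.0) for _ in range(w)] for _ in range(h)]
    dt = 1.0 / num_steps
    estimates = []
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t, cond)
        # Assumed estimate of the clean sample from the current state.
        estimates.append([[x[i][j] + (1.0 - t) * v[i][j] for j in range(w)]
                          for i in range(h)])
        x = [[x[i][j] + dt * v[i][j] for j in range(w)] for i in range(h)]
    return x, estimates


# Usage: a 4 x 4 toy path; cond is unused by the oracle and passed as None.
target = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 1, 1, 1], [0, 0, 0, 1]]
sample, estimates = euler_sample(make_oracle_velocity(target), None, (4, 4), num_steps=8)
path = binarize(sample)
```

With the oracle velocity, the clean-sample estimate already matches the target at the first step. This is an idealized version of the behavior the Figure 4 caption attributes to FlowPlanner; a learned velocity field would only approximate it.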