KnowDiffuser: A Knowledge-Guided Diffusion Planner with LM Reasoning and Prior-Informed Trajectory Initialization

Fan Ding, Xuewen Luo, Fengze Yang, Bo Yu, HwaHui Tew, Ganesh Krishnasamy, Junn Yong Loo

TL;DR

KnowDiffuser is a knowledge-guided motion planning framework that tightly integrates the semantic understanding of language models with the generative power of diffusion models, yielding a robust and interpretable planner that bridges the semantic-to-physical gap in autonomous driving (AD) systems.

Abstract

Recent advancements in Language Models (LMs) have demonstrated strong semantic reasoning capabilities, enabling their application in high-level decision-making for autonomous driving (AD). However, LMs operate over discrete token spaces and lack the ability to generate continuous, physically feasible trajectories required for motion planning. Meanwhile, diffusion models have proven effective at generating reliable and dynamically consistent trajectories, but often lack semantic interpretability and alignment with scene-level understanding. To address these limitations, we propose KnowDiffuser, a knowledge-guided motion planning framework that tightly integrates the semantic understanding of language models with the generative power of diffusion models. The framework employs a language model to infer context-aware meta-actions from structured scene representations, which are then mapped to prior trajectories that anchor the subsequent denoising process. A two-stage truncated denoising mechanism refines these trajectories efficiently, preserving both semantic alignment and physical feasibility. Experiments on the nuPlan benchmark demonstrate that KnowDiffuser significantly outperforms existing planners in both open-loop and closed-loop evaluations, establishing a robust and interpretable framework that effectively bridges the semantic-to-physical gap in AD systems.
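
The pipeline described in the abstract (meta-action, then prior trajectory, then truncated denoising) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the meta-action names, the straight-line prior, the linear noise schedule, the DDIM-style deterministic update, and above all `stand_in_denoiser`, which replaces the learned network with a closed-form oracle so the example runs without training. It shows only a single truncated reverse pass and does not attempt to reproduce the paper's two-stage schedule.

```python
import math
import random

def prior_trajectory(meta_action, horizon=8, speed=5.0, dt=0.5):
    """Map a discrete meta-action to a coarse (x, y) prior (hypothetical mapping)."""
    lateral = {"keep_lane": 0.0, "change_left": 3.5, "change_right": -3.5}[meta_action]
    return [(speed * dt * k, lateral * k / horizon) for k in range(1, horizon + 1)]

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bar, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar

def noise_to_step(traj, t, alpha_bar, rng):
    """Forward-diffuse a trajectory to step t: x_t = sqrt(ab)*x_0 + sqrt(1-ab)*eps."""
    s, n = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [(s * x + n * rng.gauss(0, 1), s * y + n * rng.gauss(0, 1)) for x, y in traj]

def stand_in_denoiser(x_t, t, alpha_bar, reference):
    """Placeholder for a learned noise predictor: the noise that maps `reference` to x_t."""
    s, n = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [((x - s * rx) / n, (y - s * ry) / n)
            for (x, y), (rx, ry) in zip(x_t, reference)]

def truncated_denoise(x_t, start_step, alpha_bar, eps_model):
    """Deterministic DDIM-style reverse pass from start_step down to step 0."""
    x = x_t
    for t in range(start_step, -1, -1):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[t - 1] if t > 0 else 1.0
        eps = eps_model(x, t)
        nxt = []
        for (px, py), (ex, ey) in zip(x, eps):
            # Estimate the clean point, then step to the previous noise level.
            x0x = (px - math.sqrt(1.0 - a_t) * ex) / math.sqrt(a_t)
            x0y = (py - math.sqrt(1.0 - a_t) * ey) / math.sqrt(a_t)
            nxt.append((math.sqrt(a_prev) * x0x + math.sqrt(1.0 - a_prev) * ex,
                        math.sqrt(a_prev) * x0y + math.sqrt(1.0 - a_prev) * ey))
        x = nxt
    return x

rng = random.Random(0)
alpha_bar = make_alpha_bar(100)
prior = prior_trajectory("change_left")            # meta-action assumed to come from the LM
reference = [(x, y + 0.2) for x, y in prior]       # made-up target near the prior
start_step = 20                                    # truncation: start at step 20, not 99
x_start = noise_to_step(prior, start_step, alpha_bar, rng)
refined = truncated_denoise(
    x_start, start_step, alpha_bar,
    lambda x, t: stand_in_denoiser(x, t, alpha_bar, reference),
)
```

Because the reverse pass starts from a lightly noised prior instead of pure Gaussian noise, it runs 21 steps here instead of the full 100. In a real planner, `stand_in_denoiser` would be replaced by a trained, scene-conditioned network, and `start_step` would trade fidelity to the prior against freedom to refine it.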

Paper Structure

This paper contains 17 sections, 15 equations, 1 figure, 3 tables, and 2 algorithms.

Figures (1)

  • Figure 1: Comparative Overview of Planning Frameworks. (a) LM-based Planner: Predicts future ego trajectories via semantic reasoning, offering interpretability but lacking low-level control precision. (b) Diffusion Trajectory Generator: Generates physically plausible trajectories through iterative denoising but lacks semantic understanding. (c) KnowDiffuser: Combines LM-driven high-level meta-action planning with two-step diffusion denoising initialized from prior trajectories, enabling semantically grounded yet physically feasible motion generation.