Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges

Kemal Oksuz, Alexandru Buburuzan, Anthony Knittel, Yuhan Yao, Puneet K. Dokania

TL;DR

This review tackles the challenge of applying foundation models to trajectory planning in autonomous driving by introducing a hierarchical taxonomy that separates FM-tailored approaches from FM-guided ones. It systematically analyzes 37 methods, examines data and code openness, and provides practical guidelines for data curation, model design, and fine-tuning to help practitioners tailor FMs for driving tasks. The work highlights both the promise and the practical constraints of FM-based trajectory planning, including inference costs, robustness, and the sim-to-real gap, while calling for standardized benchmarks to evaluate reasoning and planning capabilities. Overall, it offers a structured, actionable framework to advance FM-enabled trajectory planning and identifies critical open issues for future research.

Abstract

The emergence of multi-modal foundation models has markedly transformed the technology for autonomous driving, shifting away from conventional and mostly hand-crafted design choices towards unified, foundation-model-based approaches, capable of directly inferring motion trajectories from raw sensory inputs. This new class of methods can also incorporate natural language as an additional modality, with Vision-Language-Action (VLA) models serving as a representative example. In this review, we provide a comprehensive examination of such methods through a unifying taxonomy to critically evaluate their architectural design choices, methodological strengths, and their inherent capabilities and limitations. Our survey covers 37 recently proposed approaches that span the landscape of trajectory planning with foundation models. Furthermore, we assess these approaches with respect to the openness of their source code and datasets, offering valuable information to practitioners and researchers. We provide an accompanying webpage that catalogs the methods based on our taxonomy, available at: https://github.com/fiveai/FMs-for-driving-trajectories
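When a language-capable FM emits its trajectory as text, the waypoints must be decoded and sanity-checked before a controller can use them. This is only one possible output form, and the formats used by the surveyed methods vary. The sketch below assumes a hypothetical answer format of `(x, y)` waypoints in metres in the ego frame. `parse_waypoints`, `validate`, `horizon` and `max_step` are illustrative names, not taken from any surveyed method. Rejecting malformed or implausible outputs is one simple guard against the robustness issues noted above.

```python
import math
import re
from typing import List, Tuple

# Hypothetical output format (assumption, not from any surveyed method):
# the FM answers with (x, y) waypoints in metres in the ego frame, e.g.
# "... Trajectory: [(0.0, 1.5), (0.1, 3.0), (0.2, 4.5)]".
_WAYPOINT = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_waypoints(text: str) -> List[Tuple[float, float]]:
    """Extract every numeric (x, y) pair from a free-form FM text response."""
    return [(float(x), float(y)) for x, y in _WAYPOINT.findall(text)]


def validate(waypoints: List[Tuple[float, float]], horizon: int, max_step: float) -> bool:
    """Reject outputs with the wrong number of waypoints or implausible jumps."""
    if len(waypoints) != horizon:
        return False
    prev = (0.0, 0.0)  # ego vehicle is assumed to be at the origin at t = 0
    for x, y in waypoints:
        # Distance travelled between consecutive waypoints must stay below max_step.
        if math.hypot(x - prev[0], y - prev[1]) > max_step:
            return False
        prev = (x, y)
    return True


if __name__ == "__main__":
    response = "The light is green, so I keep my lane. Trajectory: [(0.0, 1.5), (0.1, 3.0), (0.2, 4.5)]"
    wps = parse_waypoints(response)
    print(wps, validate(wps, horizon=3, max_step=2.0))
```

Returning a boolean rather than raising keeps the check cheap to run on every model response; a real system would pair it with a fallback planner.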

Paper Structure

This paper contains 23 sections, 2 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: GPT-4o's responses to driving-related prompts in three different scenarios.
  • Figure 2: Different ways in which FMs help trajectory planning. While (a–d) are examples of FMs tailored for trajectory planning, (e) is an example of an FM guiding trajectory planning. The input image is from nuScenes [nuscenes], the question is from DriveLM-nuScenes [drivelm], and the user instruction is from the SimLingo dataset [SimLingo].
  • Figure 3: Trajectory planning methods. (a) Modular approaches use explicit interfaces between different modules. (b) End-to-End (E2E) models replace explicit interfaces with latent ones, allowing all modules to be jointly differentiable. (c) FM-based methods that follow the typical pipeline. The text output of the FM can also be used, e.g., for reasoning. In this illustration, we simplify the pipelines to give a high-level overview of how these models work; in practice, they can include a different number of components and more complex connections across modules. Input images are taken from nuScenes [nuscenes].
  • Figure 4: Taxonomy of trajectory planning methods that use, or receive help from, FMs.
  • Figure 5: Formulations of the subcategories in our taxonomy. (a–f) FMs tailored for trajectory planning. Specifically, (a–c) focus solely on trajectory planning. These methods either have no text prompt $\mathbf{T}$ or a fixed one; hence $\mathbf{T}$ is optional. (d–f) are the subcategories that provide additional capabilities. $\mathbf{O_{traj}}$ is shown without for clarity, and $^*$ indicates that $\mathbf{O_{traj}}$ can also be obtained using different forms of . (g) FMs guiding trajectory planning via knowledge distillation.
  • ...and 6 more figures