
DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving

Dingrui Wang, Marc Kaufeld, Johannes Betz

TL;DR

Corner-case reasoning in autonomous driving is addressed by DualAD, a dual-layer architecture that couples a rule-based bottom planner with an upper-layer LLM that reasons on a text-encoded description of the scenario. A rule-based text encoder converts surrounding agents and ego-vehicle states into descriptive text, enabling the LLM to intervene and adjust speed in dangerous situations without replacing the core controllers. The paper introduces the text-encoder design, demonstrates substantial gains in reactive planning (R-CLS) and competitive gains in non-reactive planning (NR-CLS) across hard benchmarks, and shows that performance improves as LLM strength increases. This framework offers a practical, scalable path to imbue existing planners with human-like reasoning while maintaining efficiency and safety in autonomous driving.
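To make the idea of a rule-based text encoder concrete, here is a minimal Python sketch. It assumes each agent's state has already been expressed relative to the ego reference path as a longitudinal gap `s`, a lateral offset `d`, and a speed `v`. The class `AgentState`, the functions `describe_agent` and `encode_scene`, the 1.75 m lane half-width, and the sentence wording are all hypothetical choices for illustration, not the rules DualAD actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AgentState:
    """Hypothetical agent state relative to the ego reference path:
    s = longitudinal gap [m] (positive = ahead of the ego),
    d = lateral offset [m] (positive = left), v = speed [m/s]."""
    agent_id: str
    s: float
    d: float
    v: float


def describe_agent(agent: AgentState, ego_speed: float,
                   lane_half_width: float = 1.75) -> str:
    """Turn one agent's state into a sentence using fixed rules.

    Thresholds and wording are illustrative assumptions, not DualAD's rules."""
    # Longitudinal relation to the ego vehicle.
    position = "ahead of" if agent.s >= 0 else "behind"
    # Lateral relation: on the ego path, or to its left/right.
    if abs(agent.d) <= lane_half_width:
        lane = "in your lane"
    elif agent.d > 0:
        lane = "to your left"
    else:
        lane = "to your right"
    # Positive closing speed means the gap between agent and ego is shrinking.
    closing_speed = ego_speed - agent.v if agent.s >= 0 else agent.v - ego_speed
    if closing_speed > 0.5:
        trend = f"closing in at {closing_speed:.1f} m/s"
    elif closing_speed < -0.5:
        trend = f"pulling away at {-closing_speed:.1f} m/s"
    else:
        trend = "keeping a constant distance"
    return (f"Vehicle {agent.agent_id} is {abs(agent.s):.1f} m {position} you, "
            f"{lane}, moving at {agent.v:.1f} m/s and {trend}.")


def encode_scene(agents: list[AgentState], ego_speed: float) -> str:
    """Join an ego sentence and one sentence per agent into a prompt."""
    lines = [f"You are driving at {ego_speed:.1f} m/s along your reference path."]
    # Nearest agents first, so the most relevant ones lead the description.
    for agent in sorted(agents, key=lambda a: abs(a.s)):
        lines.append(describe_agent(agent, ego_speed))
    return "\n".join(lines)
```

For example, `encode_scene([AgentState("A1", 12.0, 0.3, 5.0)], ego_speed=10.0)` yields the ego sentence followed by "Vehicle A1 is 12.0 m ahead of you, in your lane, moving at 5.0 m/s and closing in at 5.0 m/s." Relative quantities such as the closing speed are computed by the rules rather than left for the LLM to infer from raw coordinates.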

Abstract

We present a novel autonomous driving framework, DualAD, designed to imitate human reasoning during driving. DualAD comprises two layers: a rule-based motion planner at the bottom layer that handles routine driving tasks requiring minimal reasoning, and an upper layer featuring a rule-based text encoder that converts driving scenarios from absolute states into text descriptions. This text is then processed by a large language model (LLM) to make driving decisions. The upper layer intervenes in the bottom layer's decisions when potential danger is detected, mimicking human reasoning in critical situations. Closed-loop experiments demonstrate that DualAD, using a zero-shot pre-trained model, significantly outperforms rule-based motion planners that lack reasoning abilities. Our experiments also highlight the effectiveness of the text encoder, which considerably enhances the model's scenario understanding. Additionally, the integrated DualAD model improves with stronger LLMs, indicating the framework's potential for further enhancement. Code and benchmarks are available at github.com/TUM-AVS/DualAD.
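The way the upper layer intervenes in the bottom layer's decisions can be sketched as follows. This assumes the bottom layer emits a target speed and the LLM answers in a fixed, hypothetical format (`KEEP`, `BRAKE`, or `LIMIT <m/s>`). The names `PlannerOutput`, `parse_decision`, and `dual_layer_step`, the `llm` callable, and the time-to-collision gate that decides when danger is "detected" are illustrative assumptions; the paper's actual trigger and prompt format may differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class PlannerOutput:
    """Speed command from the rule-based bottom layer (hypothetical interface)."""
    target_speed: float  # m/s
    hard_brake: bool = False


def parse_decision(reply: str) -> tuple[str, float | None]:
    """Extract an action from the LLM reply.

    Assumed reply format: 'KEEP', 'BRAKE', or 'LIMIT <speed in m/s>'.
    Anything unparseable falls back to 'KEEP'."""
    text = reply.strip().upper()
    if text.startswith("BRAKE"):
        return "BRAKE", None
    match = re.match(r"LIMIT\s+([0-9]+(?:\.[0-9]+)?)", text)
    if match:
        return "LIMIT", float(match.group(1))
    return "KEEP", None


def dual_layer_step(bottom: PlannerOutput, scene_text: str, min_ttc: float,
                    llm: Callable[[str], str],
                    ttc_threshold: float = 4.0) -> PlannerOutput:
    """Let the upper layer override the bottom planner only when needed."""
    # Routine driving: no potential danger, so the bottom layer's plan stands
    # and the LLM is not queried. The TTC gate is an assumption.
    if min_ttc >= ttc_threshold:
        return bottom
    action, limit = parse_decision(llm(scene_text))
    if action == "BRAKE":
        return PlannerOutput(target_speed=0.0, hard_brake=True)
    if action == "LIMIT" and limit is not None:
        # A speed limit can only lower the bottom layer's target, never raise it.
        return PlannerOutput(target_speed=min(bottom.target_speed, limit))
    return bottom
```

With a stub such as `llm = lambda prompt: "LIMIT 5"`, a bottom-layer target of 12 m/s and a minimum time-to-collision of 2 s produce a target of 5 m/s, while a time-to-collision of 10 s returns the bottom layer's plan unchanged. Clamping with `min` keeps the upper layer corrective: it can only make the vehicle more cautious, adjusting speed limits or braking rather than replacing the motion planner.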

Paper Structure (16 sections, 14 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: DualAD is a dual-layer autonomous driving framework that imitates human cognitive processes during driving. The lower layer is responsible for reference path and motion planning, while the upper layer is the reasoning module, which dynamically checks for potential danger in the surroundings and adjusts speed limits or even applies hard braking in critical scenarios.
  • Figure 2: Illustration of the reasoning processes for image and text modalities. If the Platonic Representation Hypothesis huh2024prh holds, the reasoning result for an image and the reasoning results for different types of text descriptions of that image should be equivalent.
  • Figure 3: The process of converting a driving scenario into a text description. The agents in the scenario are first transformed from Cartesian coordinates (in the local frame) to Frenet coordinates using the reference path of the Motion Planner. Then, a rule-based system describes every agent in this local view according to its state.
  • Figure 4: An example of the text description of states for a vehicle.
  • Figure 5: Performance comparison between DualAD and Lattice-IDM on an example scenario simulated in a reactive environment.
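The first step of the Figure 3 pipeline, transforming agents from Cartesian coordinates to Frenet coordinates along the motion planner's reference path, can be sketched as a projection onto a polyline. The function name `cartesian_to_frenet`, the piecewise-linear path, and the left-positive sign convention for the lateral offset are assumptions made for this example; a real planner may use a smoother spline path.

```python
from __future__ import annotations

import math


def cartesian_to_frenet(path: list[tuple[float, float]],
                        x: float, y: float) -> tuple[float, float]:
    """Project (x, y) onto a polyline reference path.

    Returns (s, d): arc length along the path to the closest projection
    point, and signed lateral offset (positive = left of driving direction).
    The piecewise-linear path is an assumption of this sketch."""
    best = (math.inf, 0.0, 0.0)  # (squared distance, s, d)
    s_start = 0.0                # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicate waypoints
        # Parameter of the orthogonal projection, clamped to the segment.
        t = ((x - x0) * dx + (y - y0) * dy) / seg_len**2
        t = max(0.0, min(1.0, t))
        px, py = x0 + t * dx, y0 + t * dy
        dist_sq = (x - px) ** 2 + (y - py) ** 2
        if dist_sq < best[0]:
            # The sign of the 2D cross product separates left (+) from right (-).
            cross = dx * (y - y0) - dy * (x - x0)
            d = math.copysign(math.sqrt(dist_sq), cross)
            best = (dist_sq, s_start + t * seg_len, d)
        s_start += seg_len
    return best[1], best[2]
```

For the path `[(0, 0), (10, 0), (10, 10)]`, the point `(5, 2)` maps to `s = 5.0, d = 2.0` (2 m left of the first segment), and `(12, 5)` maps to `s = 15.0, d = -2.0` (2 m right of the second segment, which heads in the +y direction). Once every agent has an `(s, d)` pair, rules such as "ahead in your lane" or "to your right" reduce to simple comparisons.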