Can LLMs plan paths in the real world?

Wanyi Chen, Meng-Wen Su, Nafisa Mehjabin, Mary L. Cummings

TL;DR

This paper evaluates whether current LLMs can reliably plan real-world vehicle paths by testing GPT-4, Gemini, and Mistral on six scenarios spanning turn-by-turn (TbT) navigation and vision-and-language navigation (VLN). All three models made pervasive errors, including route discontinuities, incorrect directions, and failure to reach the destination, indicating that LLMs are not ready for real-world navigation tasks. The authors argue for mechanisms that enable reality checks, improved in-context transparency, and a shift toward smaller, more specialized models to mitigate risk and improve reliability. The work underscores the gap between promising capabilities in controlled settings and the precision required for safe real-world navigation, with implications for automotive deployment strategies and model design.

Abstract

As large language models (LLMs) are increasingly integrated into vehicle navigation systems, understanding their path-planning capability is crucial. We tested three LLMs on six real-world path-planning scenarios across varied settings and difficulty levels. Our experiments showed that all LLMs made numerous errors in all scenarios, revealing that they are unreliable path planners. We suggest that future work focus on implementing mechanisms for reality checks, enhancing model transparency, and developing smaller models.

Paper Structure

This paper contains 19 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Visualization of paths for the suburban TbT scenario. LLM-generated paths are marked in red. Discontinuities are marked in green.
  • Figure 2: Visualization of paths for the medium VLN scenario. LLM-generated paths are marked in red. Discontinuities are marked in green.
  • Figure 3: Urban TbT Table (Total distance Waze: 21 miles, Waze turn #: 12)
  • Figure 4: Suburban TbT Table (Total distance Waze: 30.2 miles, Waze turn #: 18)
  • Figure 5: Rural TbT Table (Total distance Waze: 291.4 miles, Waze turn #: 14)
  • ...and 3 more figures