Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning

Hourui Deng, Hongjie Zhang, Jie Ou, Chaosheng Feng

TL;DR

The paper tackles spatial reasoning and long-term path planning in large language models (LLMs) by introducing S2RCQL, which combines Spatial-to-Relational Transformation with Curriculum Q-Learning. The Spatial-to-Relational step converts maze coordinates into an explicit entity-relation graph, so the LLM reasons over connectivity rather than raw positions. Curriculum Q-Learning pairs a reverse curriculum with LLM-guided action selection to reduce context inconsistency and dead-end exploration. Experiments with ERNIE-Bot 4.0 show improvements of roughly 23–40% in both success and optimality rates over advanced prompt-engineering baselines, and ablations indicate that both the relational representation and curriculum learning contribute. The approach offers a practical route to more reliable LLM-based path planning in embodied AI settings, and it leaves open how to improve reverse curriculum generation and how to scale to larger, more complex environments.
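The Spatial-to-Relational idea can be illustrated with a minimal sketch. It assumes a hypothetical character-grid maze (`#` for walls, `.` for open cells) and hypothetical node names `N0`, `N1`, ...; the helper names `maze_to_relations` and `relations_to_prompt` are not from the paper, which uses LLMs together with Python code for this step.

```python
from typing import Dict, List

# Hypothetical maze encoding: '#' is a wall, '.' is an open cell.
GRID = ["..#",
        ".#.",
        "..."]


def maze_to_relations(grid: List[str]) -> Dict[str, List[str]]:
    """Name every open cell and list the named cells it can reach in one step."""
    open_cells = [(r, c) for r, row in enumerate(grid)
                  for c, ch in enumerate(row) if ch != "#"]
    # Assumed naming scheme: N0, N1, ... in row-major order.
    names = {cell: f"N{i}" for i, cell in enumerate(open_cells)}
    relations: Dict[str, List[str]] = {name: [] for name in names.values()}
    for (r, c), name in names.items():
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
            neighbour = (r + dr, c + dc)
            if neighbour in names:
                relations[name].append(names[neighbour])
    return relations


def relations_to_prompt(relations: Dict[str, List[str]]) -> str:
    """Render the graph as plain sentences with no coordinates in them."""
    return "\n".join(f"{node} is connected to {', '.join(nbs)}."
                     for node, nbs in relations.items() if nbs)


print(relations_to_prompt(maze_to_relations(GRID)))
```

The prompt text produced this way mentions only which nodes are connected, which is the property the TL;DR attributes to the relational representation.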

Abstract

Spatial reasoning in Large Language Models (LLMs) is the foundation for embodied intelligence. However, even in simple maze environments, LLMs still struggle with long-term path planning, primarily because of spatial hallucination and the context inconsistency hallucination that arises in long-term reasoning. To address this challenge, this study proposes an innovative model, Spatial-to-Relational Transformation and Curriculum Q-Learning (S2RCQL). To address the spatial hallucination of LLMs, we propose the Spatial-to-Relational approach, which transforms spatial prompts into entity relations and paths representing entity relation chains. This approach fully taps the potential of LLMs in terms of sequential thinking. Building on this, we design a path-planning algorithm based on Q-learning to mitigate the context inconsistency hallucination, which enhances the reasoning ability of LLMs. Using the Q-value of each state-action pair as auxiliary information for prompts, we correct the hallucinations of LLMs, thereby guiding LLMs to learn the optimal path. Finally, we propose a reverse curriculum learning technique based on LLMs to further mitigate the context inconsistency hallucination. By reducing task difficulty, LLMs can rapidly accumulate successful experiences and leverage them to tackle more complex tasks. We performed comprehensive experiments based on Baidu's self-developed LLM, ERNIE-Bot 4.0. The results showed that our S2RCQL achieved a 23%–40% improvement in both success and optimality rates compared with advanced prompt engineering.
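As a rough illustration of using Q-values as auxiliary prompt information, the sketch below pairs the standard tabular Q-learning update with an ε-greedy choice. Everything here is an assumption for illustration: the function names, the hint wording, the hyperparameters `alpha` and `gamma`, and the `policy` callable, which stands in for an LLM call. Which ε-greedy branch consults the LLM in the paper is not stated in this summary.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

QTable = Dict[Tuple[str, str], float]


def q_update(q: QTable, state: str, action: str, reward: float,
             next_state: str, next_actions: List[str],
             alpha: float = 0.5, gamma: float = 0.9) -> None:
    """Standard tabular Q-learning update (alpha and gamma are assumed values)."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def q_hint(q: QTable, state: str, actions: List[str]) -> str:
    """Format the current Q-values as text that could be appended to a prompt."""
    return "; ".join(f"move to {a}: Q={q[(state, a)]:.2f}" for a in actions)


def choose_action(q: QTable, state: str, actions: List[str], epsilon: float,
                  rng: random.Random,
                  policy: Callable[[str, List[str], str], str]) -> str:
    """Epsilon-greedy: explore at random, otherwise defer to the policy.

    `policy` is a placeholder for an LLM that receives the Q-value hint.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return policy(state, actions, q_hint(q, state, actions))


q: QTable = defaultdict(float)
q_update(q, "A", "B", reward=1.0, next_state="B", next_actions=[])
print(q_hint(q, "A", ["B", "C"]))
```

Passing the policy as a callable keeps the sketch runnable without any model access; a real setup would replace it with a function that builds the prompt and parses the model's reply.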

Paper Structure (14 sections, 1 equation, 9 figures, 2 tables, 1 algorithm)

This paper contains 14 sections, 1 equation, 9 figures, 2 tables, 1 algorithm.

Figures (9)

  • Figure 1: An example of a maze. Solving this maze path-planning task is challenging for both CoT and Rememberer, a method that combines LLMs with RL.
  • Figure 2: An overview of our approach. First, we convert arbitrary text maze descriptions into entity relations using LLMs and Python code. Then, we combine Q-learning and LLMs to select actions through $\epsilon$-greedy with reverse curriculum learning.
  • Figure 3: This module can process any maze map description and convert it into a relational network.
  • Figure 4: Curricula generated by LLMs.
  • Figure 5: Comparison of the performance of S2RCQL and Rememberer.
  • ...and 4 more figures
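The reverse curriculum mentioned in the TL;DR and in the Figure 4 caption can be pictured as practising from start points near the goal before moving to distant ones. The sketch below is only a stand-in: the paper generates curricula with LLMs, whereas this example orders start nodes by breadth-first distance from the goal. The function name and the toy graph are hypothetical.

```python
from collections import deque
from typing import Dict, List


def reverse_curriculum(relations: Dict[str, List[str]], goal: str) -> List[str]:
    """Order start nodes from nearest to farthest from the goal (BFS distance)."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        node = queue.popleft()
        for neighbour in relations.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    # Easy tasks (short distance) come first; ties are broken by node name.
    return sorted((n for n in dist if n != goal), key=lambda n: (dist[n], n))


# Toy corridor A - B - C - D with the goal at D.
corridor = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(reverse_curriculum(corridor, "D"))
```

Each start node in the returned order defines one sub-task, so experience gathered on short tasks is available before the longer ones are attempted.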