LLM A*: Human in the Loop Large Language Models Enabled A* Search for Robotics

Hengjia Xiao; Peng Wang; Mingzhe Yu; Mattia Robbiani

TL;DR

The paper addresses path planning for mobile robots by introducing LLM A*, a human-in-the-loop framework that integrates traditional A* search with large language models. A two-level architecture combines a lower-level A* subgoal planner, which uses a self-adaptive environment value, with a higher-level LLM that selects subgoals based on exploration history, guided by prompts and human feedback. The approach defines a cost function that blends distance and environment awareness, uses an LLM-based initial reward, and adapts the environment value through a pixel-based auxiliary task, which keeps planning transparent and interactive. Experiments against A* and PPO on grid maps show reduced search complexity and near-optimal path lengths, with improved safety and explainability through human oversight. The work aims to broaden access to AI-driven planning through code-free interaction and human guidance.
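To make the two-level idea concrete, the following is a minimal sketch, not the authors' implementation. It shows standard grid A* with a Manhattan heuristic plus a helper that chains A* through a list of subgoals. The names `astar` and `plan_via_subgoals` are hypothetical, and the subgoal list is a hand-supplied stand-in for choices that the higher-level LLM would make; the paper's environment value, initial reward, and auxiliary task are not modelled here.

```python
import heapq


def astar(grid, start, goal):
    """Standard 4-connected A* on a grid of 0 (free) / 1 (obstacle).

    Returns (path, expanded): the list of cells from start to goal
    (or None if unreachable) and the set of cells popped from the queue.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    g_cost = {start: 0}
    parent = {start: None}
    expanded = set()
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell in expanded:
            continue  # stale queue entry for an already expanded cell
        expanded.add(cell)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], expanded
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                if g + 1 < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = g + 1
                    parent[nxt] = cell
                    heapq.heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt))
    return None, expanded


def plan_via_subgoals(grid, start, goal, subgoals):
    """Chain A* through hypothetical subgoals (stand-ins for LLM choices)."""
    path, searched = [start], set()
    for target in list(subgoals) + [goal]:
        leg, expanded = astar(grid, path[-1], target)
        searched |= expanded
        if leg is None:
            return None, searched
        path += leg[1:]  # drop the repeated junction cell
    return path, searched
```

The size of the returned `searched` set gives one simple way to compare search effort between plain A* and a subgoal-guided run on the same map.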

Abstract

This research focuses on how Large Language Models (LLMs) can help with (path) planning for mobile embodied agents such as robots, in a human-in-the-loop and interactive manner. A novel framework named LLM A*, which aims to leverage the commonsense of LLMs and the utility-optimal A*, is proposed to facilitate few-shot near-optimal path planning. Prompts are used for two main purposes: 1) to provide LLMs with essential information such as environments, costs, and heuristics; 2) to communicate human feedback on intermediate planning results to LLMs. This approach takes human feedback on board and renders the entire planning process transparent (akin to a 'white box') to humans. Moreover, it facilitates code-free path planning, thereby making artificial intelligence techniques more accessible and inclusive for communities less proficient in coding. Comparative analysis against A* and RL demonstrates that LLM A* searches a smaller space than A*, achieves paths comparable to those of A*, and outperforms RL. The interactive nature of LLM A* also makes it a promising tool for deployment in collaborative human-robot tasks. Code and supplementary materials can be found on GitHub: https://github.com/speedhawk/LLM-A-.
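The two prompt purposes described in the abstract can be illustrated with a small sketch. This is an assumption-laden example, not the prompts used in the paper: the function name `build_prompt`, its parameters, and all of the wording in the generated text are hypothetical.

```python
def build_prompt(environment, costs, heuristic, feedback=()):
    """Assemble a planning prompt from task facts and optional human feedback.

    All field names and wording here are hypothetical, not the paper's prompts.
    """
    # Purpose 1: give the LLM the essential task information.
    lines = [
        "Environment: " + environment,
        "Costs: " + costs,
        "Heuristic: " + heuristic,
    ]
    # Purpose 2: pass along human feedback on intermediate results.
    for i, note in enumerate(feedback, start=1):
        lines.append(f"Human feedback {i}: {note}")
    lines.append("Propose the next subgoal as a (row, column) pair.")
    return "\n".join(lines)
```

Calling `build_prompt` again with a longer `feedback` list after each intermediate result is one simple way to keep the human in the loop without writing any planning code.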

Paper Structure (17 sections, 10 equations, 4 figures, 1 table)

This paper contains 17 sections, 10 equations, 4 figures, 1 table.

Introduction
Related Work
LLM for Robotic Task Planning
Reinforcement Learning for Robotic Task Planning
The LLM A* Approach
A* Preliminary
Self-Adaption Auxiliary Task
LLM-Based Initial Reward
Experiments and Discussions
Setup
Evaluation Metrics
LLM A* Training and Session Design
RL Model Training
Main Results
Ablation Study
...and 2 more sections

Figures (4)

Figure 1: LLM-based path planning: (a) path planned by the LLM directly; (b) path planned by the proposed LLM A*. The initial and goal states are at the upper right and the lower left corners, respectively. The white and black tiles represent free spaces and obstacles. The red tiles form the final paths and the green tiles are the total searched tiles. The path planned directly by the LLM (results generated by GPT3.5-turbo) passes through obstacles, which is unacceptable in robotics.
Figure 2: Path planning results: (a), (b), (c), and (d) are results from A*, LLM A*, LLM Greedy, and PPO, respectively. The pink grids mark the initial states, while the yellow grids denote the goal states. White tiles depict free spaces, whereas black tiles represent obstacles. The green grids illustrate the search space. The final path is depicted by grids transitioning from blue to red, with color gradients aiding in visualising any back-and-forth movements within the paths.
Figure 3: Convergence of RL model training: (a) average steps per episode; (b) average scores achieved per episode.
Figure 4: Experimental results of A*, LLM A*, LLM Greedy, and PPO on the Aisle and Double Door environments with different sizes, i.e., $16\times16$ and $32\times32$.
