Research on Navigation Methods Based on LLMs
Anlong Zhang, Jianmin Ji
TL;DR
This paper addresses indoor navigation by using a Large Language Model (LLM) as the central controller within a function-calling framework. Navigation capabilities are decomposed into modular tools (localization, mapping, planning, motion control), and a structured system-prompt protocol with a chain-of-thought (CoT) safeguard governs how the LLM invokes them. The approach aims to improve semantic understanding and adaptability in dynamic environments. Experiments in PyBullet across living room, kitchen, and bedroom scenarios show that the LLM-driven navigator can match traditional path-planning performance while achieving substantially higher instruction-understanding success, with Phi-4-14B delivering strong results in local deployment. The findings suggest that LLMs can serve as reusable, adaptable components for embodied navigation, allowing extension to new tasks and hardware with minimal retraining.
Abstract
In recent years, indoor navigation has advanced considerably through the integration of Large Language Models (LLMs). Traditional navigation approaches that rely on pre-built maps or reinforcement learning generalize poorly and adapt only to a limited degree to dynamic environments. In contrast, LLMs offer a new paradigm for complex indoor navigation tasks by drawing on their semantic comprehension, reasoning capabilities, and zero-shot generalization. We propose an LLM-based navigation framework that uses function calling and positions the LLM as the central controller. Our method decomposes conventional navigation functions into modular, reusable LLM tools with extensible configurations. It is complemented by a systematically designed, transferable system prompt template and interaction workflow that can be adapted across different implementations. Experimental validation in PyBullet simulation environments across diverse scenarios demonstrates the potential and effectiveness of our approach, particularly in achieving context-aware navigation through dynamic tool composition.
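To make the function-calling idea concrete, the following is a minimal Python sketch of how navigation functions might be registered as tools and dispatched from JSON tool calls. It is an illustration under stated assumptions, not the authors' implementation: the tool names (`get_location`, `lookup_place`, `plan_path`, `follow_path`), the toy grid map, the call format, and the scripted call sequence standing in for the LLM are all hypothetical.

```python
import json
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, int]

# Hypothetical tool registry: maps a tool name to a Python callable.
TOOLS: Dict[str, Callable[..., object]] = {}


def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


# Toy stand-ins for robot state and a semantic map (assumed, not from the paper).
ROBOT_STATE = {"position": (0, 0)}
SEMANTIC_MAP = {"kitchen": (3, 2), "bedroom": (1, 4)}


@tool("get_location")
def get_location() -> Point:
    return ROBOT_STATE["position"]


@tool("lookup_place")
def lookup_place(name: str) -> Point:
    return SEMANTIC_MAP[name]


@tool("plan_path")
def plan_path(start, goal) -> List[Point]:
    # Placeholder planner: walk along x, then along y, on an obstacle-free grid.
    x, y = start
    gx, gy = goal
    path = [(x, y)]
    while x != gx:
        x += 1 if gx > x else -1
        path.append((x, y))
    while y != gy:
        y += 1 if gy > y else -1
        path.append((x, y))
    return path


@tool("follow_path")
def follow_path(path) -> Point:
    # Placeholder motion control: jump to the final waypoint.
    ROBOT_STATE["position"] = tuple(path[-1])
    return ROBOT_STATE["position"]


def dispatch(call_json: str):
    """Execute one tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


def run_scripted_episode(target: str) -> Point:
    """Replay the call sequence an LLM controller might emit for 'go to <target>'."""
    start = dispatch(json.dumps({"name": "get_location"}))
    goal = dispatch(json.dumps({"name": "lookup_place", "arguments": {"name": target}}))
    path = dispatch(json.dumps({"name": "plan_path", "arguments": {"start": start, "goal": goal}}))
    return dispatch(json.dumps({"name": "follow_path", "arguments": {"path": path}}))
```

In a real system the scripted sequence in `run_scripted_episode` would be replaced by tool calls generated by the LLM, and each tool result would be fed back to the model before it chooses the next call. Keeping the registry separate from the dispatcher is one way to let new tools be added without changing the control loop.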
