Advances in Embodied Navigation Using Large Language Models: A Survey
Jinzhou Lin, Han Gao, Xuxiang Feng, Rongtao Xu, Changwei Wang, Man Zhang, Li Guo, Shibiao Xu
TL;DR
This survey examines how large language models (LLMs) can enable robust embodied navigation, dissecting how they support grounded language understanding and few-shot planning within multimodal perception loops. It catalogs state-of-the-art LLM-based navigation architectures, contrasts them with non-LLM vision-and-language navigation (VLN) baselines, and analyzes common datasets and evaluation metrics, including Success weighted by Path Length (SPL). Key contributions include a comprehensive review of LLM-based approaches, a dataset-oriented analysis highlighting strengths and gaps, and a discussion of challenges and future directions such as multimodal fusion, memory, and standardized benchmarks. The findings underscore the potential of LLMs to enhance navigation through sophisticated reasoning and semantic understanding, while also emphasizing practical constraints such as computation, data quality, and real-time latency. Overall, the work provides a structured guide for researchers to design, compare, and benchmark LLM-based embodied navigation systems intended for real-world use.
Abstract
In recent years, the rapid advancement of Large Language Models (LLMs) such as the Generative Pre-trained Transformer (GPT) has attracted increasing attention due to their potential in a variety of practical applications. The integration of LLMs with embodied intelligence has emerged as a significant area of focus. Among the many applications of LLMs, navigation tasks are particularly noteworthy because they demand a deep understanding of the environment and quick, accurate decision-making. LLMs can augment embodied intelligence systems with sophisticated environmental perception and decision-making support, leveraging their robust language and image-processing capabilities. This article offers an exhaustive summary of the symbiosis between LLMs and embodied intelligence, with a focus on navigation. It reviews state-of-the-art models and research methodologies, and assesses the advantages and disadvantages of existing embodied navigation models and datasets. Finally, based on current research, the article elucidates the role of LLMs in embodied intelligence and forecasts future directions in the field. A comprehensive list of the studies covered in this survey is available at https://github.com/Rongtao-Xu/Awesome-LLM-EN.
