
Large language model-based task planning for service robots: A review

Shaohan Bian, Ying Zhang, Guohui Tian, Zhiqiang Miao, Edmond Q. Wu, Simon X. Yang, Changchun Hua

TL;DR

This paper surveys the integration of large language models (LLMs) into service-robot task planning, addressing the challenges of planning in unstructured domestic environments. It develops a modality-centric taxonomy spanning text-based, vision-language, audio-based, and multimodal planning, and reviews foundational LLM techniques (pre-training, fine-tuning, retrieval-augmented generation, and prompting) alongside their robotic applications. Key contributions include structuring the literature around input modalities, analyzing core hurdles such as perception gaps, real-time constraints, and multimodal fusion, and proposing directions such as benchmarks and embodied intelligence to advance practical deployment. The work provides a consolidated reference for researchers and practitioners seeking to enhance the autonomy, safety, and adaptability of service robots through LLM-driven task planning.

Abstract

With the rapid advancement of large language models (LLMs) and robotics, service robots are increasingly becoming an integral part of daily life, offering a wide range of services in complex environments. To deliver these services intelligently and efficiently, robust and accurate task planning capabilities are essential. This paper presents a comprehensive overview of the integration of LLMs into service robotics, with a particular focus on their role in enhancing robotic task planning. First, the development and foundational techniques of LLMs, including pre-training, fine-tuning, retrieval-augmented generation (RAG), and prompt engineering, are reviewed. We then explore the application of LLMs as the cognitive core, the "brain", of service robots, discussing how LLMs contribute to improved autonomy and decision-making. Furthermore, recent advancements in LLM-driven task planning across various input modalities are analyzed, including text, visual, audio, and multimodal inputs. Finally, we summarize key challenges and limitations in current research and propose future directions to advance the task planning capabilities of service robots in complex, unstructured domestic environments. This review aims to serve as a valuable reference for researchers and practitioners in the fields of artificial intelligence and robotics.
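Among the foundational techniques listed above, retrieval-augmented generation and prompt engineering meet directly in task planning: relevant environment knowledge is retrieved and placed into the prompt before the LLM is asked for a plan. The review does not prescribe an implementation, so the following is only a minimal sketch under stated assumptions: the knowledge base, the stopword list, and the functions `retrieve` and `build_planning_prompt` are hypothetical, and a toy word-overlap score stands in for the embedding-based retrievers that real RAG systems typically use.

```python
import re

# Hypothetical household knowledge a service robot might retrieve from.
KNOWLEDGE_BASE = [
    "Cups are usually stored in the kitchen cabinet above the sink.",
    "The dishwasher is next to the refrigerator in the kitchen.",
    "Towels are kept in the bathroom closet.",
    "The living room sofa has a remote control on the side table.",
]

# Assumed stopword list so that words like "the" do not dominate the score.
STOPWORDS = {"a", "an", "the", "me", "from", "in", "is", "are", "to", "on", "of", "and"}


def tokenize(text: str) -> set[str]:
    """Lowercase content words; a toy stand-in for a real embedding model."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most content words with the query.

    sorted() is stable, so ties keep their original knowledge-base order.
    """
    q = tokenize(query)
    return sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]


def build_planning_prompt(instruction: str, docs: list[str], k: int = 2) -> str:
    """Assemble a retrieval-augmented prompt that asks for numbered actions."""
    context = "\n".join(f"- {d}" for d in retrieve(instruction, docs, k))
    return (
        "You are the task planner of a household service robot.\n"
        f"Relevant knowledge:\n{context}\n"
        f"Instruction: {instruction}\n"
        "Reply with a numbered list of primitive actions."
    )


if __name__ == "__main__":
    print(build_planning_prompt("Fetch a cup from the kitchen cabinet", KNOWLEDGE_BASE))
```

For the instruction "Fetch a cup from the kitchen cabinet", the two kitchen snippets are retrieved and the bathroom and living-room snippets are left out. The prompt string would then be sent to whatever LLM the robot uses; that call is deliberately not shown.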

Paper Structure

This paper contains 21 sections, 16 figures, 5 tables.

Figures (16)

  • Figure 1: The evolution of LLMs since 2019
  • Figure 2: LLM-based task planning for service robots. The robot acquires information in the form of text, video, audio, and other signals, and passes the processed inputs to the LLM, which plans detailed action sequences through natural language processing, task decomposition, and related techniques.
  • Figure 3: LLM-based task planning taxonomy for service robots. By input modality, it is divided into four categories: text input, visual input, audio input, and multimodal input.
  • Figure 4: Schematic diagram of the architecture of LLM-GROP.
  • Figure 5: Schematic diagram of the architecture of COWP, as presented in its original paper.
  • ...and 11 more figures
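The pipeline in Figure 2 and the taxonomy in Figure 3 can be illustrated together with a small sketch. The review describes the flow only at the level of a diagram, so everything below is an assumption for illustration. The names `Observation`, `taxonomy_category`, `plan`, and `fake_llm` are hypothetical. Perception is assumed to have already turned images and audio into text (captions, transcripts), which is only one of the designs the surveyed work covers. The LLM is passed in as a plain function so that no real model API is implied.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Modality(Enum):
    """Single-modality input types; mixing them yields the multimodal category."""
    TEXT = "text"
    VISUAL = "visual"
    AUDIO = "audio"


@dataclass
class Observation:
    modality: Modality
    content: str  # assumed: already converted to text (caption, transcript, ...)


def taxonomy_category(observations: list[Observation]) -> str:
    """Map a set of inputs to one of the four taxonomy categories of Figure 3."""
    if not observations:
        raise ValueError("at least one observation is required")
    kinds = {o.modality for o in observations}
    return next(iter(kinds)).value if len(kinds) == 1 else "multimodal"


def plan(observations: list[Observation], llm: Callable[[str], str]) -> list[str]:
    """Fuse tagged observations into one prompt, query the LLM, parse numbered steps."""
    lines = [f"[{o.modality.value}] {o.content}" for o in observations]
    lines.append("Decompose the task into numbered primitive actions.")
    reply = llm("\n".join(lines))
    steps = []
    for line in reply.splitlines():
        head, sep, tail = line.strip().partition(". ")
        if sep and head.isdigit():  # keep only lines shaped like "3. action"
            steps.append(tail.strip())
    return steps


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM; always returns the same plan."""
    return "1. navigate(kitchen)\n2. pick(cup)\n3. navigate(user)\n4. place(cup)"


if __name__ == "__main__":
    obs = [
        Observation(Modality.AUDIO, "user says: bring me a cup"),
        Observation(Modality.VISUAL, "a cup on the kitchen table"),
    ]
    print(taxonomy_category(obs))  # multimodal
    print(plan(obs, fake_llm))
```

Tagging each input with its modality before fusion keeps the prompt uniform across the four categories. Parsing only lines shaped like "N. action" is a simple guard against free-form chatter in the reply. A real system would additionally check each action against the robot's skill library before execution.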