
Interpreting and learning voice commands with a Large Language Model for a robot system

Stanislau Stankevich, Wojciech Dudek

TL;DR

The paper addresses intuitive voice-based robot interfaces by integrating a Large Language Model (LLM) with a ROS-based architecture to improve real-time request interpretation and decision-making. It introduces Rico, a TIAGo-based service robot, and a modular system (LangProc, CS, TaskER, TD) that dynamically expands its task knowledge by learning new intents and slots from dialogue. The approach demonstrates adaptive handling of requests and clarifying questions generated through GPT-4 prompts. It also acknowledges limitations, such as difficulty with tasks that have many parameters and occasional hallucinations. The work advances practical, adaptive voice interfaces for care robots and proposes future work on environment-aware modeling and LLM-guided planning to improve scalability and robustness.
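To make the idea of "learning new intents and slots from dialogue" concrete, here is a minimal sketch of a task-knowledge store that grows as the robot learns. It assumes the LLM step (GPT-4 in the paper) has already turned an utterance into an intent name and a dictionary of slot values. The names `Intent`, `TaskKnowledge`, `learn_intent`, and `interpret` are hypothetical and do not correspond to the paper's LangProc, CS, TaskER, or TD modules. The sketch only shows the decision the text describes: execute a known, fully specified request; ask a clarifying question when slots are missing; or ask the user to explain an unknown task.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """A task type the robot knows, with the slots (parameters) it needs."""
    name: str
    required_slots: list[str]


@dataclass
class TaskKnowledge:
    """Hypothetical stand-in for a task database that grows through dialogue."""
    intents: dict[str, Intent] = field(default_factory=dict)

    def learn_intent(self, name: str, required_slots: list[str]) -> None:
        # Register a new intent, or add newly learned slots to a known one.
        if name in self.intents:
            known = self.intents[name].required_slots
            for slot in required_slots:
                if slot not in known:
                    known.append(slot)
        else:
            self.intents[name] = Intent(name, list(required_slots))

    def interpret(self, name: str, slots: dict[str, str]) -> str:
        # Assumes `name` and `slots` were already extracted by an LLM prompt.
        if name not in self.intents:
            return f"I don't know how to '{name}' yet. Can you explain this task?"
        missing = [s for s in self.intents[name].required_slots if s not in slots]
        if missing:
            # Clarifying question listing the slots the request left out.
            return "Please specify: " + ", ".join(missing)
        return f"Executing '{name}' with {slots}"


if __name__ == "__main__":
    kb = TaskKnowledge()
    print(kb.interpret("bring", {"object": "cup"}))  # unknown task
    kb.learn_intent("bring", ["object", "destination"])
    print(kb.interpret("bring", {"object": "cup"}))  # missing slot
    print(kb.interpret("bring", {"object": "cup", "destination": "kitchen"}))
```

A real system would store this knowledge persistently and would let the LLM decide which slots a newly explained task needs. The dictionary-based store here is only an illustration of that flow.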

Abstract

Robots are increasingly common in industry and daily life, for example in nursing homes, where they can assist staff. A key challenge is developing intuitive interfaces that make communication with them easy. Large Language Models (LLMs) such as GPT-4 have enhanced robot capabilities by enabling real-time interaction and decision-making, which improves robots' adaptability and functionality. This project combines LLMs with databases to improve decision-making and to enable knowledge acquisition when interpreting requests.

Paper Structure

This paper contains 3 sections and 4 figures.

Figures (4)

  • Figure 1: Scenario with bringing an item
  • Figure 2: Rico robot
  • Figure 3: System architecture
  • Figure 4: Module interaction: unexpected question