
Integrating Disambiguation and User Preferences into Large Language Models for Robot Motion Planning

Mohammed Abugurain, Shinkyu Park

TL;DR

The paper addresses the challenge of translating natural-language robot navigation commands into reliable motion plans in the presence of ambiguity and user preferences. It proposes an end-to-end pipeline that first detects ambiguity using text-embedding-ada-002 features fed to a random forest, then uses GPT-4 to generate clarifying questions and produce disambiguated instructions, while a memory component stores user preferences for future interactions. Disambiguated instructions are translated into LTL specifications and solved via a planning approach based on a graph of convex sets, enabling precise, temporally aware navigation. The framework demonstrates improved ambiguity handling (reducing ambiguity likelihood by 36.8%) and high detection accuracy (0.85) compared to baselines, highlighting potential for safer, more personalized human-robot collaboration in complex environments.
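
To make the detection step concrete, here is a minimal sketch of the random-forest idea in standard-library Python: each tree is trained on a bootstrap sample using a random subset of features, and predictions are made by majority vote. It is an illustration under stated assumptions, not the authors' classifier: the trees are restricted to depth-1 stumps, the 2-D toy vectors stand in for the 1536-dimensional text-embedding-ada-002 embeddings, and the class `StumpForest`, the helper names, and the label convention (1 = ambiguous, 0 = clear) are hypothetical. A practical version would more likely use an existing library such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, int]  # (feature, threshold, label if <=, label if >)


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X: List[List[float]], y: List[int], features: Sequence[int]) -> Stump:
    """Choose the (feature, threshold) split with the lowest weighted Gini impurity."""
    majority = Counter(y).most_common(1)[0][0]

    def vote(labels: List[int]) -> int:
        # Fall back to the sample's majority label if one side of the split is empty.
        return Counter(labels).most_common(1)[0][0] if labels else majority

    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    _, f, t, left, right = best
    return f, t, vote(left), vote(right)


class StumpForest:
    """Random forest of depth-1 trees: bootstrap samples plus random feature subsets."""

    def __init__(self, n_trees: int = 25, max_features: int = 1, seed: int = 0):
        self.n_trees = n_trees
        self.max_features = max_features
        self.rng = random.Random(seed)
        self.stumps: List[Stump] = []

    def fit(self, X: List[List[float]], y: List[int]) -> "StumpForest":
        n, d = len(X), len(X[0])
        self.stumps = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # sample rows with replacement
            feats = self.rng.sample(range(d), min(self.max_features, d))
            self.stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
        return self

    def predict(self, x: List[float]) -> int:
        """Majority vote over all stumps."""
        votes = Counter(lo if x[f] <= t else hi for f, t, lo, hi in self.stumps)
        return votes.most_common(1)[0][0]


# Toy 2-D vectors standing in for embeddings (label 1 = ambiguous, 0 = clear).
X = [[0.10, 0.20], [0.15, 0.10], [0.20, 0.15], [0.12, 0.18],
     [0.80, 0.90], [0.85, 0.80], [0.90, 0.85], [0.82, 0.88]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = StumpForest(n_trees=25, seed=0).fit(X, y)
print(forest.predict([0.05, 0.05]), forest.predict([0.95, 0.95]))
```

Because the toy classes are separable on both features, every bootstrap sample that contains both classes yields a perfect split; real embeddings would call for deeper trees and many more features per split.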

Abstract

This paper presents a framework that interprets human navigation commands containing temporal elements and translates these natural language instructions directly into robot motion plans. Central to the framework is the use of Large Language Models (LLMs). To enhance the reliability of LLMs in the framework and improve user experience, we propose methods to resolve ambiguity in natural language instructions and to capture user preferences. The process begins with an ambiguity classifier that identifies potential uncertainties in the instructions. Ambiguous statements trigger a GPT-4-based mechanism that generates clarifying questions and incorporates the user's responses for disambiguation. In addition, the framework assesses and records user preferences for non-ambiguous instructions to improve future interactions. The final step translates the disambiguated instructions into a robot motion plan using Linear Temporal Logic (LTL). This paper details the development of this framework and evaluates its performance in various test scenarios.
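
The control flow described in the abstract can be outlined as follows. This is a hedged sketch, not the authors' implementation: the classifier and GPT-4 calls are passed in as plain functions (all names hypothetical), preferences are stored as free-text strings per user, and both the bounded clarification loop (`max_rounds`) and the choice to record preferences only once an instruction is classified as clear are assumptions. The returned instruction would then go to the LTL translation stage, which is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PreferenceMemory:
    """Per-user preference statements (hypothetical storage format)."""
    store: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, user: str, preference: str) -> None:
        prefs = self.store.setdefault(user, [])
        if preference not in prefs:  # avoid duplicate entries
            prefs.append(preference)

    def get(self, user: str) -> List[str]:
        return list(self.store.get(user, []))


def process_instruction(
    user: str,
    instruction: str,
    memory: PreferenceMemory,
    is_ambiguous: Callable[[str], bool],                 # e.g. the ambiguity classifier
    clarify: Callable[[str, List[str]], str],            # LLM: draft a clarifying question
    ask_user: Callable[[str], str],                      # collect the user's answer
    rewrite: Callable[[str, str, str], str],             # LLM: fold the answer into the instruction
    extract_preference: Callable[[str], Optional[str]],  # LLM: detect a preference, if any
    max_rounds: int = 3,                                 # assumed cap on clarification rounds
) -> str:
    """Return a disambiguated instruction for the LTL translation stage (not shown)."""
    rounds = 0
    while rounds < max_rounds and is_ambiguous(instruction):
        question = clarify(instruction, memory.get(user))  # stored preferences inform the question
        answer = ask_user(question)
        instruction = rewrite(instruction, question, answer)
        rounds += 1
    if not is_ambiguous(instruction):
        preference = extract_preference(instruction)
        if preference:
            memory.add(user, preference)
    return instruction


# Usage with stub functions in place of the classifier, GPT-4, and the user.
memory = PreferenceMemory()
result = process_instruction(
    "alice",
    "go to the table, avoiding the kitchen",
    memory,
    is_ambiguous=lambda s: "red table" not in s,
    clarify=lambda s, prefs: "Which table do you mean?",
    ask_user=lambda q: "the red one",
    rewrite=lambda s, q, a: s.replace("the table", "the red table"),
    extract_preference=lambda s: "avoid the kitchen" if "avoiding the kitchen" in s else None,
)
print(result, memory.get("alice"))
```

Injecting the model calls as functions keeps the orchestration testable without network access and makes it easy to swap the classifier or the LLM prompts independently.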

Paper Structure

This paper contains 9 sections, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Our proposed framework, consisting of a random forest classifier for detecting ambiguity in human instructions and GPT-4 for disambiguation and user preference identification. The framework integrates existing methods for translating the processed instructions into LTL specifications and for planning navigation paths based on these specifications.
  • Figure 2: Ambiguity likelihood before and after the disambiguation process
  • Figure 3: Dialog between a user and a robot in the disambiguation and user preference identification process.
  • Figure 4: Validation of the proposed framework for robot navigation. Each panel illustrates a distinct stage in the robot's navigation as it follows the user's instructions (see Fig. 3 for the instructions).
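
The LTL stage in Figure 1 relies on existing translation and planning methods. As a rough illustration of what an LTL specification for such an instruction can look like, the sketch below builds a common "visit in order, always avoid" template, F(g1 ∧ F g2) ∧ G ¬a, and checks it against discrete traces using finite-trace (LTLf-style) semantics. The template, the tuple encoding, the helper names, the region names, and the traces are all assumptions for illustration; the paper's planner computes continuous trajectories over a graph of convex sets rather than checking symbolic traces.

```python
from typing import List, Sequence, Set

# Formulas as nested tuples: ("ap", name), ("not", p), ("and", p, q), ("F", p), ("G", p).


def holds(phi: tuple, trace: Sequence[Set[str]], i: int = 0) -> bool:
    """Finite-trace (LTLf-style) satisfaction of phi at position i of trace."""
    op = phi[0]
    if op == "ap":
        return phi[1] in trace[i]
    if op == "not":
        return not holds(phi[1], trace, i)
    if op == "and":
        return holds(phi[1], trace, i) and holds(phi[2], trace, i)
    if op == "F":  # eventually: some position j >= i
        return any(holds(phi[1], trace, j) for j in range(i, len(trace)))
    if op == "G":  # always: every position j >= i
        return all(holds(phi[1], trace, j) for j in range(i, len(trace)))
    raise ValueError(f"unknown operator: {op!r}")


def sequence_avoid(goals: List[str], avoid: List[str]) -> tuple:
    """Build F(g1 & F(g2 & ... F gn)) & G !a for each avoided region a."""
    phi = ("F", ("ap", goals[-1]))
    for g in reversed(goals[:-1]):
        phi = ("F", ("and", ("ap", g), phi))
    for a in avoid:
        phi = ("and", phi, ("G", ("not", ("ap", a))))
    return phi


def to_str(phi: tuple) -> str:
    """Render a formula in a compact textual LTL syntax."""
    op = phi[0]
    if op == "ap":
        return phi[1]
    if op == "not":
        return "!" + to_str(phi[1])
    if op == "and":
        return f"({to_str(phi[1])} & {to_str(phi[2])})"
    return f"{op} {to_str(phi[1])}"  # F or G


# Hypothetical instruction: "go to the table, then the door, never entering the kitchen".
spec = sequence_avoid(["table", "door"], ["kitchen"])
good = [set(), {"table"}, set(), {"door"}]
wrong_order = [set(), {"door"}, {"table"}]
enters_kitchen = [{"table"}, {"kitchen"}, {"door"}]
print(to_str(spec))
print(holds(spec, good), holds(spec, wrong_order), holds(spec, enters_kitchen))
```

Swapping the order of the goals passed to `sequence_avoid` flips which of the first two traces satisfies the specification, which is the kind of temporal distinction the LTL stage is meant to preserve.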