Integrating Disambiguation and User Preferences into Large Language Models for Robot Motion Planning
Mohammed Abugurain, Shinkyu Park
TL;DR
The paper addresses the challenge of translating natural-language robot navigation commands into reliable motion plans in the presence of ambiguity and user preferences. It proposes an end-to-end pipeline that first detects ambiguity using text-embedding-ada-002 features fed to a random forest, then uses GPT-4 to generate clarifying questions and produce disambiguated instructions, while a memory component stores user preferences for future interactions. Disambiguated instructions are translated into Linear Temporal Logic (LTL) specifications and solved via a planning approach based on a graph of convex sets, enabling precise, temporally aware navigation. The framework demonstrates improved ambiguity handling (reducing ambiguity likelihood by $36.8\%$) and high detection accuracy ($0.85$) compared to baselines, highlighting potential for safer, more personalized human-robot collaboration in complex environments.
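The routing logic described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: `is_ambiguous` is a hypothetical keyword heuristic standing in for the embedding-plus-random-forest classifier, `clarify` fakes the GPT-4 question generation with a fixed template, and the names `Pipeline`, `handle`, and `VAGUE_WORDS` are invented for this example. Storing the raw instruction as a "preference" is likewise a simplification, and the LTL translation and planning stages are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for the ambiguity classifier. The paper uses
# embedding features and a random forest; this keyword check is only
# a placeholder so the control flow can run without external services.
VAGUE_WORDS = {"there", "somewhere", "nearby", "it"}


def is_ambiguous(instruction: str) -> bool:
    """Flag an instruction as ambiguous if it contains a vague word."""
    words = {w.strip(".,!?").lower() for w in instruction.split()}
    return bool(words & VAGUE_WORDS)


@dataclass
class Pipeline:
    # Callback that shows a question to the user and returns the answer.
    ask_user: Callable[[str], str]
    # Simplified memory component: a list of recorded instructions.
    preferences: List[str] = field(default_factory=list)

    def clarify(self, instruction: str) -> str:
        """Placeholder for the LLM step: ask one question, merge the answer."""
        question = f"Which location do you mean in: '{instruction}'?"
        answer = self.ask_user(question)
        return f"{instruction} (clarified: {answer})"

    def handle(self, instruction: str) -> str:
        """Route an instruction: disambiguate it, or record it in memory."""
        if is_ambiguous(instruction):
            return self.clarify(instruction)
        self.preferences.append(instruction)
        return instruction
```

For instance, with `ask_user=lambda q: "the kitchen"`, the instruction "Go there after the lab" takes the clarification branch, whereas "Visit the lab, then the office" passes through unchanged and is appended to `preferences`.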
Abstract
This paper presents a framework that interprets human navigation commands containing temporal elements and translates these natural language instructions directly into robot motion plans. Central to our framework is the use of Large Language Models (LLMs). To enhance the reliability of LLMs in the framework and improve user experience, we propose methods to resolve ambiguity in natural language instructions and to capture user preferences. The process begins with an ambiguity classifier that identifies potential uncertainties in the instructions. Ambiguous statements trigger a GPT-4-based mechanism that generates clarifying questions and incorporates user responses for disambiguation. For non-ambiguous instructions, the framework also assesses and records user preferences, enhancing future interactions. The last part of this process is the translation of disambiguated instructions into a robot motion plan using Linear Temporal Logic. This paper details the development of this framework and the evaluation of its performance in various test scenarios.
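To make the final translation step concrete, consider a hypothetical instruction (not taken from the paper): "Visit the kitchen and then the office, and never enter the stairwell." Assuming atomic propositions $\mathit{kitchen}$, $\mathit{office}$, and $\mathit{stairwell}$ that hold when the robot is inside the corresponding region, one standard Linear Temporal Logic encoding is:

$$\varphi = \Diamond\big(\mathit{kitchen} \wedge \Diamond\,\mathit{office}\big) \wedge \Box\,\neg\,\mathit{stairwell}$$

Here $\Diamond$ ("eventually") captures the ordered visits and $\Box$ ("always") captures the avoidance constraint. The region names are assumptions chosen for illustration only.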
