A Model-Agnostic Approach for Semantically Driven Disambiguation in Human-Robot Interaction
Fethiye Irmak Dogan, Maithili Patel, Weiyu Liu, Iolanda Leite, Sonia Chernova
TL;DR
This work tackles ambiguous instructions given to robots in large, shared spaces by introducing a model-agnostic, semantically driven clarification framework. It combines knowledge embeddings from custom semantic encoders or large language models (LLMs) with information-theoretic clarification questions and an iterative inference process that first determines the object’s room and then its specific location. Across pre-studies and a user experiment with 713 expressions, the approach consistently improves first-attempt (HIT@1) predictions and proves robust across diverse embedding backbones, including LLMs. The findings suggest that semantically grounded clarifications can substantially reduce the search space and enhance real-time object retrieval in household settings, with potential extensions to other domains and modalities.
Abstract
Ambiguities are inevitable in human-robot interaction, especially when a robot follows user instructions in a large, shared space. If a user asks the robot to find an object in a home environment with underspecified instructions, the object could be in multiple locations depending on factors the instruction leaves out. A bowl, for instance, might be in the kitchen cabinet or on the dining room table, depending on whether it is clean or dirty, full or empty, and which other objects are around it. Previous works on object search have assumed that the queried object is immediately visible to the robot or have predicted object locations using one-shot inferences, which are likely to fail for ambiguous or partially understood instructions. This paper addresses these gaps and presents a novel model-agnostic approach that leverages semantically driven clarifications to help the robot locate queried objects in fewer attempts. Specifically, we leverage different knowledge embedding models, and when ambiguities arise, we propose an informative clarification method that follows an iterative prediction process. Our user experiment shows that the approach is applicable to different custom semantic encoders as well as LLMs, and that informative clarifications improve performance, enabling the robot to locate objects on its first attempt. The user experiment data is publicly available at https://github.com/IrmakDogan/ExpressionDataset.
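
The room-then-location idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the entropy threshold, and the probabilities are all assumptions chosen for illustration, and a real system would obtain the location scores from a knowledge embedding model. The sketch asks a room-level clarification only when the predicted distribution over rooms is uncertain (high Shannon entropy), then predicts the most likely location given the answer.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def room_marginals(location_probs):
    """Sum P(room, location) over locations to obtain P(room)."""
    rooms = defaultdict(float)
    for (room, _location), p in location_probs.items():
        rooms[room] += p
    return dict(rooms)


def condition_on_room(location_probs, room):
    """Keep only candidates in the confirmed room and renormalise."""
    kept = {k: p for k, p in location_probs.items() if k[0] == room}
    total = sum(kept.values())
    if total == 0:
        raise ValueError(f"no candidate locations in room {room!r}")
    return {k: p / total for k, p in kept.items()}


def predict(location_probs, ask_room, entropy_threshold=0.9):
    """Return the most likely (room, location) pair.

    ask_room is a hypothetical callback standing in for the user: it
    receives the candidate rooms and returns the one the user confirms.
    entropy_threshold is an assumed tuning parameter, not a value from
    the paper.
    """
    rooms = room_marginals(location_probs)
    if entropy(rooms.values()) > entropy_threshold:
        # The room is ambiguous, so ask before committing to a location.
        confirmed = ask_room(sorted(rooms))
        location_probs = condition_on_room(location_probs, confirmed)
    return max(location_probs, key=location_probs.get)


# Made-up scores for the instruction "bring me the bowl".
bowl_probs = {
    ("kitchen", "cabinet"): 0.35,
    ("kitchen", "sink"): 0.15,
    ("dining room", "table"): 0.40,
    ("living room", "coffee table"): 0.10,
}

# A stand-in user who answers that the bowl is in the kitchen.
guess = predict(bowl_probs, ask_room=lambda rooms: "kitchen")
```

With these made-up numbers, a one-shot prediction would pick the dining room table, the single most likely location. Because the distribution over rooms is uncertain, the sketch asks first and, once the user confirms the kitchen, predicts the kitchen cabinet instead.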
