A Model-Agnostic Approach for Semantically Driven Disambiguation in Human-Robot Interaction

Fethiye Irmak Dogan; Maithili Patel; Weiyu Liu; Iolanda Leite; Sonia Chernova

TL;DR

This work tackles ambiguous user instructions in human-robot interaction in large, shared spaces by introducing a model-agnostic, semantically driven clarification framework. It combines knowledge embeddings from custom semantic encoders or large language models with an information-theoretic selection of informative clarifications and an iterative inference process that first determines the object's room and then its specific location. Across pre-studies and a user experiment with 713 expressions, the approach consistently improves first-attempt (HIT@1) predictions and proves robust across diverse embedding backbones, including LLMs. The findings suggest that semantically grounded clarifications can substantially reduce the search space and enhance real-time object retrieval in household settings, with potential extensions to other domains and modalities.
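HIT@1 above is the standard top-1 hit rate: the fraction of queries whose top-ranked prediction is the correct location. A minimal sketch of the metric follows; the function name `hit_at_k` and the example rankings are illustrative assumptions, not data or code from the paper.

```python
from typing import Sequence


def hit_at_k(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of queries whose true location is among the top-k predictions."""
    if len(ranked_predictions) != len(targets):
        raise ValueError("each query needs exactly one target")
    if not targets:
        raise ValueError("need at least one query")
    hits = sum(
        target in preds[:k]
        for preds, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)


# Hypothetical ranked outputs for four queries (most likely location first).
ranked = [
    ["kitchen cabinet", "dining table"],
    ["dining table", "kitchen cabinet"],
    ["sink", "kitchen cabinet"],
    ["kitchen cabinet", "sink"],
]
truth = ["kitchen cabinet", "kitchen cabinet", "sink", "sink"]

hit1 = hit_at_k(ranked, truth, k=1)  # first and third queries are top-1 hits
hit2 = hit_at_k(ranked, truth, k=2)  # every target is within the top two
```

A higher HIT@1 means the robot more often goes to the right place on its first attempt, which is the quantity the clarifications are meant to improve.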

Abstract

Ambiguities are inevitable in human-robot interaction, especially when a robot follows user instructions in a large, shared space. For example, if a user asks the robot to find an object in a home environment with underspecified instructions, the object could be in multiple locations depending on missing factors. A bowl, for instance, might be in the kitchen cabinet or on the dining room table, depending on whether it is clean or dirty, full or empty, and which other objects are around it. Previous works on object search have assumed that the queried object is immediately visible to the robot or have predicted object locations using one-shot inferences, which are likely to fail for ambiguous or partially understood instructions. This paper addresses these gaps and presents a novel model-agnostic approach that leverages semantically driven clarifications to enhance the robot's ability to locate queried objects in fewer attempts. Specifically, we leverage different knowledge embedding models, and when ambiguities arise, we propose an informative clarification method that follows an iterative prediction process. Our user experiment shows that the approach is applicable to different custom semantic encoders as well as LLMs, and that informative clarifications improve performance, enabling the robot to locate objects on its first attempt. The user experiment data is publicly available at https://github.com/IrmakDogan/ExpressionDataset.

Paper Structure (20 sections, 3 equations, 3 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Method
Knowledge Embedding Models
Custom Semantic Encoders
Large Language Models
Informative Clarification
Inference Phase and Iterative Prediction
Pre-Study
Analyzing the Knowledge Embeddings
Training Custom Semantic Encoders
Confidence Values and Thresholds
User Experiment
Experiment Data
Experiment Procedure
...and 5 more sections

Figures (3)

Figure 1: An overview of our method. On the left-hand side, when the user asks the robot to find the wine glass, the robot identifies the input query as ambiguous, then asks for informative semantic properties of the object (in green) and makes the output predictions (in blue) with the gathered new features. To achieve this, on the right-hand side, the initial query is forwarded to the knowledge embedding model to first find the room of the object. Once the initial query is identified as ambiguous, informative clarifications are asked to obtain further knowledge, which is provided to the knowledge embedding model as additional features. After the object's most probable room is inferred, this feature is also given to the embedding, which is then queried to predict the most likely location of the object (iterative predictions).
Figure 2: The flowchart summarising our overall approach using custom semantic encoders.
Figure 3: The scenes from a home video clip simulating the robot's object search for given user object descriptions in the pre-study. Given the complexity of the environment, people's descriptions included ambiguities and missing information.
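The flow described in the Figure 1 caption (detect an ambiguous query, ask an informative clarification, infer the room, then infer the location) can be sketched roughly as below. This is a toy illustration under stated assumptions, not the authors' implementation: small lookup tables stand in for the knowledge embedding model, a fixed confidence threshold flags ambiguity, and expected entropy reduction is used as one common information-theoretic criterion for choosing the question. All names, probabilities, and the threshold value are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Dist = Dict[str, float]
# question -> (probability of each answer, room distribution given each answer)
Questions = Dict[str, Tuple[Dist, Dict[str, Dist]]]


def entropy(dist: Dist) -> float:
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def is_ambiguous(dist: Dist, threshold: float) -> bool:
    """Treat a query as ambiguous when the top prediction is not confident enough."""
    return max(dist.values()) < threshold


def expected_information_gain(
    prior: Dist, answer_probs: Dist, posteriors: Dict[str, Dist]
) -> float:
    """Prior entropy minus the answer-weighted entropy of the posteriors."""
    expected_posterior = sum(
        answer_probs[a] * entropy(posteriors[a]) for a in answer_probs
    )
    return entropy(prior) - expected_posterior


def pick_clarification(prior: Dist, questions: Questions) -> str:
    """Choose the question expected to reduce uncertainty the most."""
    return max(
        questions,
        key=lambda q: expected_information_gain(prior, *questions[q]),
    )


def locate(
    room_prior: Dist,
    questions: Questions,
    user_answers: Dict[str, str],
    locations_by_room: Dict[str, Dist],
    threshold: float = 0.8,
) -> Tuple[str, str, List[str]]:
    """Predict the room first, then the location within that room."""
    room_dist, asked = room_prior, []
    # Simplification: ask at most one clarification. A real system would feed
    # the answer back to the embedding model and re-check for ambiguity.
    if is_ambiguous(room_dist, threshold) and questions:
        q = pick_clarification(room_dist, questions)
        room_dist = questions[q][1][user_answers[q]]
        asked.append(q)
    room = max(room_dist, key=room_dist.get)
    location_dist = locations_by_room[room]
    return room, max(location_dist, key=location_dist.get), asked


# Hypothetical numbers for the query "find the bowl".
prior = {"kitchen": 0.5, "dining room": 0.5}
questions: Questions = {
    "is it clean?": (
        {"yes": 0.5, "no": 0.5},
        {
            "yes": {"kitchen": 0.9, "dining room": 0.1},
            "no": {"kitchen": 0.1, "dining room": 0.9},
        },
    ),
    "what color is it?": (
        {"white": 0.5, "blue": 0.5},
        {"white": dict(prior), "blue": dict(prior)},  # color says nothing about the room
    ),
}
locations = {
    "kitchen": {"cabinet": 0.7, "sink": 0.3},
    "dining room": {"dining table": 0.8, "sideboard": 0.2},
}
answers = {"is it clean?": "yes", "what color is it?": "white"}

room, location, asked = locate(prior, questions, answers, locations)
```

In this toy setup the cleanliness question is selected because its answer shifts the room distribution, while the color question leaves it unchanged and so has zero expected gain. Conditioning the location prediction on the inferred room mirrors the room-then-location order in the caption.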
