REALM: Real-Time Estimates of Assistance for Learned Models in Human-Robot Interaction
Michael Hagenow, Julie A. Shah
TL;DR
REALM is an online framework for selecting a human assistance mechanism in real time. It compares the expected post-intervention action-space entropy of each mechanism (no input, discrete choices, corrections, teleoperation) using rollouts from a stochastic policy such as a diffusion model. The method defines $h(\mathbf{A}_t|m)$ for each mechanism $m$ and computes a penalized value $V(m|\mathbf{A}_{\tau})$ that balances information gain against human effort, so the robot requests input only when it meaningfully reduces uncertainty. Validation includes a simulated 2D Uncerpentine task and a preliminary human-robot study in tabletop manipulation, showing accurate mechanism identification, reduced user input, and favorable user preferences without compromising task performance. The framework integrates with emergent learning models and offers a practical pathway to more efficient, input-aware human-robot collaboration, with open-source tooling for replication.
Abstract
A variety of mechanisms (i.e., input types) for real-time human interaction can facilitate effective human-robot teaming. For example, previous works have shown how teleoperation, corrective, and discrete (i.e., preference over a small number of choices) input can enable robots to complete complex tasks. However, few previous works have looked at combining these mechanisms, and in particular at opportunities for a robot to estimate and elicit the most effective form of assistance given its understanding of a task. In this paper, we propose a method for estimating the value of different human assistance mechanisms based on the action uncertainty of a robot policy. Our key idea is to construct mathematical expressions for the expected post-interaction differential entropy (i.e., uncertainty) of a stochastic robot policy, which lets us compare the expected value of different interactions. Because each type of human input requires a different level of human involvement, we demonstrate how differential entropy estimates can be combined with a likelihood penalization approach to balance the information gained from feedback against the level of input required. We show, through both simulation and a robot user study, that our approach interfaces with emergent learning models (e.g., a diffusion model) to produce accurate assistance value estimates. Our user study results indicate that the proposed approach can enable task completion with minimal human feedback for uncertain robot behaviors.
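To make the key idea concrete, the following is a minimal Python sketch of comparing mechanisms by expected post-interaction entropy. It is not the paper's formulation: the Gaussian entropy estimate, the additive `cost` term (a simple stand-in for the likelihood penalization, whose exact form is not given here), the mode labels, and all function names are assumptions made for illustration. Action samples stand in for rollouts from a stochastic policy.

```python
import numpy as np

def gaussian_entropy(samples, jitter=1e-6):
    """Differential entropy (nats) of a Gaussian fitted to samples of shape (n, d).
    Assumption: a Gaussian fit is only a crude proxy for the true policy entropy."""
    samples = np.atleast_2d(samples)
    d = samples.shape[1]
    cov = np.atleast_2d(np.cov(samples, rowvar=False)) + jitter * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def expected_post_choice_entropy(samples, labels):
    """Expected entropy after a discrete choice: the human picks one mode,
    so we average the within-mode entropies weighted by mode frequency."""
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        group = samples[labels == k]
        total += (len(group) / len(samples)) * gaussian_entropy(group)
    return total

def select_mechanism(post_entropy, prior_entropy, cost):
    """Hypothetical penalized value: entropy reduction minus a per-mechanism cost."""
    values = {m: prior_entropy - h - cost[m] for m, h in post_entropy.items()}
    return max(values, key=values.get), values

# Illustrative bimodal action samples (e.g., pass an obstacle on the left or right).
rng = np.random.default_rng(0)
left = rng.normal(loc=[-1.0, 0.0], scale=0.1, size=(250, 2))
right = rng.normal(loc=[1.0, 0.0], scale=0.1, size=(250, 2))
samples = np.vstack([left, right])
labels = samples[:, 0] > 0  # assumed mode assignment; a real system might cluster

prior = gaussian_entropy(samples)
post = {"none": prior, "discrete": expected_post_choice_entropy(samples, labels)}
best, values = select_mechanism(post, prior, cost={"none": 0.0, "discrete": 0.5})
print(best, values)
```

In this toy case the policy is split between two tight modes, so a single discrete choice removes most of the uncertainty and outweighs its assumed cost. Raising the cost of `"discrete"` far enough makes `"none"` the preferred option, which mirrors the trade-off between information gain and human effort described above.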
