Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration
Debasmita Ghose, Oz Gitelson, Marynel Vazquez, Brian Scassellati
TL;DR
This work tackles open-ended goal inference in human–robot collaboration by fusing natural language preferences with observed actions. It introduces BALI, a Bidirectional Action-Language Inference framework that combines a receding-horizon planner with a modular question-asking mechanism, enabling robots to infer unbounded goals and request clarifications only when beneficial. The approach demonstrates stability and accuracy gains over action-only, language-only, and dialog-based baselines in both simulated and real-world collaborative cooking tasks, including strong performance in closed settings with goal and policy banks. By grounding open-ended goals in both action evidence and language signals, BALI achieves faster convergence and fewer mistakes, highlighting a practical path toward robust open-world HRC with minimal human interruption.
Abstract
To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action-Language Inference), a goal-prediction method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with inferred goals. We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and unbounded. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.
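
The question-asking rule in the abstract (ask only when the expected information gain from the answer outweighs the cost of interruption) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small, finite set of candidate goals, whereas BALI targets open-ended goals, and it assumes the interruption cost is expressed in bits so it can be compared directly with an entropy reduction. The names `expected_information_gain`, `should_ask`, and the example goals and question are hypothetical.

```python
import math

def entropy(belief):
    """Shannon entropy (bits) of a belief dict mapping goal -> probability."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def posterior(belief, likelihood, answer):
    """Bayes update: P(goal | answer) is proportional to P(answer | goal) * P(goal)."""
    unnorm = {g: belief[g] * likelihood[g].get(answer, 0.0) for g in belief}
    z = sum(unnorm.values())
    # If the answer is impossible under every goal, keep the prior unchanged.
    return {g: v / z for g, v in unnorm.items()} if z > 0 else dict(belief)

def expected_information_gain(belief, likelihood):
    """Expected entropy reduction from asking one question.

    likelihood[g][a] is the assumed probability that the human gives
    answer a when their true goal is g.
    """
    answers = {a for dist in likelihood.values() for a in dist}
    expected_posterior_entropy = 0.0
    for a in answers:
        # Marginal probability of hearing answer a under the current belief.
        p_a = sum(belief[g] * likelihood[g].get(a, 0.0) for g in belief)
        if p_a > 0:
            expected_posterior_entropy += p_a * entropy(posterior(belief, likelihood, a))
    return entropy(belief) - expected_posterior_entropy

def should_ask(belief, likelihood, interruption_cost):
    """Ask only if the expected gain exceeds the (assumed, in bits) cost."""
    return expected_information_gain(belief, likelihood) > interruption_cost

# Hypothetical question: "Will the dish be served hot?"
informative = {"soup": {"yes": 1.0}, "salad": {"no": 1.0}}
uninformative = {"soup": {"yes": 0.5, "no": 0.5}, "salad": {"yes": 0.5, "no": 0.5}}

uncertain = {"soup": 0.5, "salad": 0.5}    # robot cannot tell the goals apart
confident = {"soup": 0.99, "salad": 0.01}  # observed actions already favor soup

ask_when_uncertain = should_ask(uncertain, informative, interruption_cost=0.5)
ask_when_confident = should_ask(confident, informative, interruption_cost=0.5)
```

Under these assumptions the robot asks when its belief is split evenly, because the answer is worth one full bit, and stays quiet once observed actions have already made one goal dominant, because the same question would then yield far less than the assumed cost of 0.5 bits.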
