Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration

Debasmita Ghose, Oz Gitelson, Marynel Vazquez, Brian Scassellati

TL;DR

This work tackles open-ended goal inference in human–robot collaboration by fusing natural language preferences with observed actions. It introduces BALI, a Bidirectional Action-Language Inference framework that combines a receding-horizon planner with a modular question-asking mechanism, enabling robots to infer unbounded goals and request clarifications only when beneficial. The approach demonstrates stability and accuracy gains over action-only, language-only, and dialog-based baselines in both simulated and real-world collaborative cooking tasks, including strong performance in closed settings with goal and policy banks. By grounding open-ended goals in both action evidence and language signals, BALI achieves faster convergence and fewer mistakes, highlighting a practical path toward robust open-world HRC with minimal human interruption.

Abstract

To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action-Language Inference) for goal prediction, a method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with inferred goals. We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and unbounded. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.
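The question-asking rule described above (ask only when the expected information gain from the answer outweighs the cost of interruption) can be illustrated with a small sketch. This is not the authors' implementation: the discrete belief over goals, the `answer_likelihood` table, the function names, and the choice to express the interruption cost in bits are all assumptions made for illustration.

```python
import math

def entropy(belief):
    """Shannon entropy in bits of a dict mapping goal -> probability."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def expected_information_gain(belief, answer_likelihood):
    """Expected entropy reduction from asking one question.

    answer_likelihood[answer][goal] is an assumed model of P(answer | goal).
    """
    prior_entropy = entropy(belief)
    expected_posterior_entropy = 0.0
    for likelihood in answer_likelihood.values():
        # Marginal probability of hearing this answer under the current belief.
        p_answer = sum(likelihood[g] * belief[g] for g in belief)
        if p_answer == 0:
            continue
        # Bayes update of the goal belief given this answer.
        posterior = {g: likelihood[g] * belief[g] / p_answer for g in belief}
        expected_posterior_entropy += p_answer * entropy(posterior)
    return prior_entropy - expected_posterior_entropy

def should_ask(belief, answer_likelihood, interruption_cost):
    """Ask only if the expected gain (bits) exceeds the hypothetical cost (bits)."""
    return expected_information_gain(belief, answer_likelihood) > interruption_cost

# Hypothetical question "Are you making a salad?" that perfectly separates two goals.
question = {
    "yes": {"salad": 1.0, "soup": 0.0},
    "no": {"salad": 0.0, "soup": 1.0},
}
uncertain = {"salad": 0.5, "soup": 0.5}
confident = {"salad": 0.99, "soup": 0.01}

ask_when_uncertain = should_ask(uncertain, question, interruption_cost=0.5)
ask_when_confident = should_ask(confident, question, interruption_cost=0.5)
```

In this toy setting, the question is worth asking when the belief is split evenly between the two goals, but not when the robot is already nearly certain, because the answer would then remove very little uncertainty.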

Paper Structure

This paper contains 29 sections, 3 equations, 3 figures, 1 algorithm.

Figures (3)

  • Figure 1: BALI for goal prediction: Human preferences and actions are summarized (orange) and used for Goal Inference (green) to update an estimate of plausible goals. The Ask Question module (purple) triggers clarifications when goal uncertainty exceeds a threshold. The planner (pink) expands action sequences under receding-horizon planning: at each non-leaf node, the Valid Action Filter (blue) prunes infeasible actions (shown greyed out). The cost function module (yellow) computes an attractor-field cost that links actions to goals and guides the search toward trajectories aligned with plausible human goals and preferences.
  • Figure 2: Results for the open case (a–c) and closed case (d–f). (a,d) show inference timing: maroon = time until first correct guess, yellow = period of instability until the last incorrect guess, green = stable correct phase. A cross marks the first correct guess ($\downarrow$ = better) and a circle marks the last incorrect guess ($\downarrow$ = better). (b,e) report Top-1 (yellow) and Top-3 (green) goal prediction accuracy ($\uparrow$ = better). (c,f) show average mistakes or extra steps ($\downarrow$ = better). Means and standard deviations of metrics are computed over 967 preference combinations. Standard deviations for the first correct guess and the last incorrect guess are omitted from the plots for clarity.
  • Figure 3: Real World Case Study