
Enhancing Goal Inference via Correction Timing

Anjiabei Wang, Shuangge Wang, Tesca Fitzgerald

TL;DR

We investigate the value of correction timing as a signal for robot learning in three potential applications: identifying features of a robot's motion that may prompt people to correct it, quickly inferring the final goal of a human's correction from the timing and initial direction of their correction motion, and learning more precise constraints for task objectives.

Abstract

Corrections offer a natural modality for people to provide feedback to a robot, by (i) intervening in the robot's behavior when they believe the robot is failing (or will fail) the task objectives and (ii) modifying the robot's behavior to successfully fulfill the task. Each correction offers information on what the robot should and should not do, since the corrected behavior is more aligned with task objectives than the original behavior. Most prior work on learning from corrections interprets a correction either as a new demonstration (consisting of the modified robot behavior) or as a preference (for the modified trajectory over the robot's original behavior). However, this overlooks an essential element of correction feedback: the human's decision to intervene in the robot's behavior in the first place. This decision can be influenced by multiple factors, including the robot's task progress, alignment with human expectations, dynamics, motion legibility, and optimality. In this work, we investigate whether the timing of this decision can offer a useful signal for inferring these task-relevant influences. In particular, we investigate three potential applications for this learning signal: (1) identifying features of a robot's motion that may prompt people to correct it, (2) quickly inferring the final goal of a human's correction based on the timing and initial direction of their correction motion, and (3) learning more precise constraints for task objectives. Our results indicate that correction timing improves learning in the first two of these applications. Overall, our work provides new insights on the value of correction timing as a signal for robot learning.
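To make application (2) more concrete, here is a minimal sketch of Boltzmann-style goal inference from the onset of a correction. It is not the authors' method: the candidate goal set, the rationality parameter `beta`, the cosine-alignment score, and the timing term weighted by `gamma` are all hypothetical choices made for illustration. The sketch assumes the correction's start position, its initial direction, and the fraction of the robot's trajectory completed when the correction began are available.

```python
import math


def goal_posterior(start_pos, direction, t_frac, goals, robot_goal_idx,
                   beta=5.0, gamma=2.0):
    """Hypothetical posterior over candidate goals at the start of a correction.

    start_pos: position where the human's correction begins
    direction: initial direction of the correction motion
    t_frac: fraction of the robot's trajectory completed at correction time, in [0, 1]
    goals: list of candidate goal positions
    robot_goal_idx: index of the goal the robot was heading toward
    beta, gamma: assumed weights for direction alignment and the timing term
    """
    d_norm = math.hypot(*direction)
    scores = []
    for i, goal in enumerate(goals):
        to_goal = [g - s for g, s in zip(goal, start_pos)]
        g_norm = math.hypot(*to_goal)
        # Cosine similarity between the initial correction direction and the goal direction.
        dot = sum(a * b for a, b in zip(to_goal, direction))
        score = beta * dot / (g_norm * d_norm + 1e-12)
        # Hypothetical timing term: an earlier correction is treated as stronger
        # evidence that the robot's current goal is wrong, so it is penalized more.
        if i == robot_goal_idx:
            score -= gamma * (1.0 - t_frac)
        scores.append(score)
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

For example, with goals at `(1, 0)`, `(0, 1)`, and `(-1, 0)`, a correction starting at the origin and pointing mostly along `+y` puts most of the probability on the second goal. Calling the function again with a larger `t_frac` raises the probability of the robot's original goal, which is the behavior the hypothetical timing term is meant to encode.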

Paper Structure

This paper contains 49 sections, 23 equations, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: A human intervenes during a robot’s pre-planned trajectory $\xi$, resulting in a corrected trajectory $c$. The timing information is represented by $t_c$, while the spatial information includes $c_p$, $c_p'$, and $c_l$.
  • Figure 2: Cumulative first-correction timing across trajectory completion. “Correction Ratio” denotes the proportion of first-correction events occurring up to each completion percentage.
  • Figure 3: F1 scores for the multi-feature model and Boltzmann baseline across correction timing percentages (portion of trajectory completed before the first correction). Statistical significance in all figures is marked as $^{*}p \leq 0.05$, $^{**}p \leq 0.01$, $^{***}p \leq 0.001$.
  • Figure 4: Predicted-to-actual correction ratio for the multi-feature model and Boltzmann baseline across correction timing percentages.
  • Figure 5: Mean absolute error (seconds) between predicted and true correction timing for the multi-feature model and Boltzmann baseline across correction timing percentages. Lower MAE indicates better accuracy.
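Two of the quantities in the captions above have direct definitions: the "Correction Ratio" of Figure 2 (the proportion of first-correction events occurring up to each completion percentage) and the mean absolute error in seconds of Figure 5. The sketch below shows one straightforward way to compute them; the function names and input layout are hypothetical and are not taken from the paper's code.

```python
def cumulative_correction_ratio(first_correction_pcts, thresholds):
    """Proportion of first-correction events at or before each completion percentage.

    first_correction_pcts: trajectory completion (0-100) at each trial's first correction
    thresholds: completion percentages at which to evaluate the ratio
    """
    n = len(first_correction_pcts)
    if n == 0:
        raise ValueError("need at least one first-correction event")
    # Empirical cumulative distribution of first-correction timing.
    return [sum(1 for c in first_correction_pcts if c <= t) / n for t in thresholds]


def mean_absolute_error(predicted_s, true_s):
    """Mean absolute error, in seconds, between predicted and true correction times."""
    if not predicted_s or len(predicted_s) != len(true_s):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted_s, true_s)) / len(true_s)
```

For instance, first corrections at 12%, 35%, 35%, 60%, and 90% completion give ratios of 0.2, 0.6, 0.8, and 1.0 at thresholds 25, 50, 75, and 100, and predicted times of 1.0, 2.5, and 4.0 s against true times of 1.5, 2.0, and 4.0 s give an MAE of about 0.33 s.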