How Much Progress Did I Make? An Unexplored Human Feedback Signal for Teaching Robots

Hang Yu, Qidi Fang, Shijie Fang, Reuben M. Aronson, Elaine Schaertl Short

TL;DR

This work introduces progress as a human teaching signal that encodes task completion on a $[0,100]$ scale, complementing demonstrations and scalar feedback in robot learning. Through two online studies with 76 participants and an in-person public-space study with 40 participants, progress is shown to indicate task completion, reflect the degree of completion, remain consistent across participants when demonstrations are sub-optimal, and incur no extra workload or time relative to scalar feedback. The authors demonstrate progress across simple and long-horizon tasks, provide a dataset of 40 non-expert demonstrations, and discuss implications for reward shaping and inverse reinforcement learning, including reward-hacking detection. Overall, progress offers a robust, informative signal for interactive learning and shows practical benefits for real-world demonstrations by non-experts. The work suggests that integrating progress into learning pipelines could improve robustness and data efficiency in human-in-the-loop robot training.

Abstract

Enhancing the expressiveness of human teaching is vital for both improving robots' learning from humans and the human-teaching-robot experience. In this work, we characterize and test a little-used teaching signal: progress, designed to represent the completion percentage of a task. We conducted two online studies with 76 crowd-sourced participants and one public space study with 40 non-expert participants to validate the capability of this progress signal. We find that progress indicates whether the task is successfully performed, reflects the degree of task completion, identifies unproductive but harmless behaviors, and is likely to be more consistent across participants. Furthermore, our results show that giving progress does not require extra workload and time. An additional contribution of our work is a dataset of 40 non-expert demonstrations from the public space study through an ice cream topping-adding task, which we observe to be multi-policy and sub-optimal, with sub-optimality not only from teleoperation errors but also from exploratory actions and attempts. The dataset is available at https://github.com/TeachingwithProgress/Non-Expert_Demonstrations.

Paper Structure (22 sections, 1 equation, 8 figures, 1 table)

Figures (8)

  • Figure 1: Public space study with an ice cream topping-adding task to collect demonstrations and progress from non-experts.
  • Figure 2: Online Study Setups. Three simple tasks for comparing the workload of giving progress, and a long-horizon task for comparing the applicability of progress.
  • Figure 3: Online Study I Results. The workload of giving progress does not differ from that of giving scalar feedback.
  • Figure 4: Online Study II Results. Progress and scalar feedback carry different information. Scalar feedback reflects the optimality of a trajectory, while progress reflects the degree of task completion. Progress is more consistent across participants than scalar feedback when the demonstration is non-optimal. Progress can also indicate whether the task was completed successfully, even if the robot made a minor mistake or made a mistake and then fixed it.
  • Figure 5: 3D visualization of 40 non-expert demonstration trajectories. The positions of objects are marked out with images. The blue trajectories are successful demonstrations and the orange trajectories are faulty demonstrations. Most of the demonstrations succeed, while the policies are diverse.
  • ...and 3 more figures