How Much Progress Did I Make? An Unexplored Human Feedback Signal for Teaching Robots
Hang Yu, Qidi Fang, Shijie Fang, Reuben M. Aronson, Elaine Schaertl Short
TL;DR
This work introduces progress, a human teaching signal that encodes task completion on a $[0,100]$ scale, to complement demonstrations and scalar feedback in robot learning. Across two online studies with 76 participants and an in-person public-space study with 40 participants, progress is shown to indicate whether a task was completed, reflect the degree of completion, identify unproductive but harmless behaviors, and incur no extra workload or time relative to traditional feedback. The authors evaluate progress on both simple and long-horizon tasks, provide a dataset of 40 non-expert demonstrations, and discuss implications for reward shaping and inverse reinforcement learning, including reward-hacking detection. Overall, the results suggest that progress is an informative signal for interactive learning from non-expert, real-world teaching, and that integrating it into learning pipelines could improve robustness and data efficiency in human-in-the-loop robot training.
Abstract
Enhancing the expressiveness of human teaching is vital both for improving robots' learning from humans and for improving the human experience of teaching robots. In this work, we characterize and test a little-used teaching signal: \textit{progress}, designed to represent the completion percentage of a task. We conducted two online studies with 76 crowd-sourced participants and one public-space study with 40 non-expert participants to validate the capability of this progress signal. We find that progress indicates whether the task is successfully performed, reflects the degree of task completion, identifies unproductive but harmless behaviors, and is likely to be more consistent across participants. Furthermore, our results show that giving progress does not require additional workload or time. An additional contribution of our work is a dataset of 40 non-expert demonstrations collected in the public-space study on an ice cream topping-adding task. We observe these demonstrations to be multi-policy and sub-optimal, with sub-optimality arising not only from teleoperation errors but also from exploratory actions and attempts. The dataset is available at https://github.com/TeachingwithProgress/Non-Expert_Demonstrations.
