Learning Human-Aware Robot Policies for Adaptive Assistance
Jason Qin, Shikun Ban, Wentao Zhu, Yizhou Wang, Dimitris Samaras
TL;DR
This work tackles reward misalignment in assistive robotics with a human-aware policy learning framework built on two modules: an anticipation module that forecasts human motion over the next $k$ steps, and a utility module that infers human preference weights online through interaction, without explicit queries. The problem is formulated as a two-agent Dec-POMDP in which the human reward includes both task and preference components while the robot optimizes only the task reward; the two modules aim to make assistance more adaptive, safer, and more personalized. Experiments across multiple tasks and robot embodiments show improved task success, efficiency, and user satisfaction, with strong generalization, and ablations support the contribution of each module. The authors provide code and demos for reproducibility.
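As a rough illustration of the reward structure described above, the misalignment can be written as follows. The notation is an assumption made here for exposition and is not taken from the paper: $r_{\text{task}}$ is the shared task reward, $r_{\text{pref}}^{(i)}$ are preference-related reward terms, $w_i$ are the human's preference weights, and the linear combination is only one plausible form.

$$
r^{H}(s, a^{H}, a^{R}) = r_{\text{task}}(s, a^{H}, a^{R}) + \sum_{i} w_i \, r_{\text{pref}}^{(i)}(s, a^{H}, a^{R}), \qquad r^{R}(s, a^{H}, a^{R}) = r_{\text{task}}(s, a^{H}, a^{R}).
$$

Under this sketch, the gap between $r^{H}$ and $r^{R}$ is exactly the preference term, and the utility module's role would be to estimate the unobserved weights $w_i$ from interaction.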
Abstract
Developing robots that can assist humans efficiently, safely, and adaptively is crucial for real-world applications such as healthcare. While previous work often assumes a centralized system for co-optimizing human-robot interactions, we argue that real-world scenarios are more complicated, because humans have individual preferences regarding how tasks are performed. Robots typically lack direct access to these implicit preferences, yet to provide effective assistance they must still recognize and adapt to the individual needs and preferences of different users. To address these challenges, we propose a novel framework in which robots infer human intentions and reason about human utilities through interaction. Our approach features two critical modules: the anticipation module is a motion predictor that captures the spatiotemporal relationship between the robot agent and the user agent in order to predict human behavior; the utility module infers the underlying human utility functions through progressive task demonstration sampling. Extensive experiments across various robot types and assistive tasks demonstrate that the proposed framework not only enhances task success and efficiency but also significantly improves user satisfaction, paving the way for more personalized and adaptive assistive robotic systems. Code and demos are available at https://asonin.github.io/Human-Aware-Assistance/.
