Learning Human-Aware Robot Policies for Adaptive Assistance

Jason Qin, Shikun Ban, Wentao Zhu, Yizhou Wang, Dimitris Samaras

TL;DR

This work tackles reward misalignment in assistive robotics by proposing a human-aware policy learning framework with two key modules: an anticipation module that forecasts human motion over the next $k$ steps, and a utility module that infers human preference weights online through interaction, without explicit queries. The problem is formulated as a two-agent Dec-POMDP in which the human reward includes both task and preference components while the robot initially optimizes only the task reward; closing this gap enables adaptive, safer, and more personalized assistance. Empirical results across multiple tasks and robot embodiments show improved task success, efficiency, and user satisfaction, with strong generalization, and ablations support the value of each module. The work advances practical human-robot collaboration by enabling online inference of human utilities and motion tendencies, and the authors provide code and demos for reproducibility.

Abstract

Developing robots that can assist humans efficiently, safely, and adaptively is crucial for real-world applications such as healthcare. While previous work often assumes a centralized system for co-optimizing human-robot interactions, we argue that real-world scenarios are much more complicated, as humans have individual preferences regarding how tasks are performed. Robots typically lack direct access to these implicit preferences. However, to provide effective assistance, robots must still be able to recognize and adapt to the individual needs and preferences of different users. To address these challenges, we propose a novel framework in which robots infer human intentions and reason about human utilities through interaction. Our approach features two critical modules: the anticipation module is a motion predictor that captures the spatial-temporal relationship between the robot agent and user agent, which contributes to predicting human behavior; the utility module infers the underlying human utility functions through progressive task demonstration sampling. Extensive experiments across various robot types and assistive tasks demonstrate that the proposed framework not only enhances task success and efficiency but also significantly improves user satisfaction, paving the way for more personalized and adaptive assistive robotic systems. Code and demos are available at https://asonin.github.io/Human-Aware-Assistance/.

Paper Structure

This paper contains 27 sections, 11 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: In our task scenario (demonstrated with the feeding example in the figure), the robot's initial objective is to achieve the basic goal of "feeding the food," represented by the Task Reward. However, as shown on the left, the human user also has a more nuanced Preference Reward, which is unknown to the robot, leading to a misalignment between the human and robot reward functions. To address this, we propose a novel framework, depicted on the right. Beyond learning to fulfill the basic task requirements, we introduce two additional modules to better model human behavior and preferences: a Motion Anticipation module for predicting the human's future motion and a Utility Inference module for estimating user preferences.
  • Figure 2: Overview of the proposed framework. Each agent receives an observation containing its own information and some critical information about the other agent. The human agent is controlled by an independent policy powered by RL algorithms. The robot agent consists of three parts: an RL backbone, an anticipation module, and a utility module. The anticipation module takes the past $k$ frames of joint information from both agents, $p_R^{t-k,t}$ and $p_{H}^{t-k,t}$, and predicts the human joint information for the next $k$ steps, $\hat{p}_{H}^{t,t+k}$. The utility module uses the interaction history to estimate a preference reward weight $\hat{w}_H$, which is used to compute an estimated preference reward $\hat{r}_{\text{pref}}$ that further guides robot policy learning.
  • Figure 3: Successful episodes of our method for each task scenario. The key frames are arranged sequentially from left to right, following the progression of the episode.
  • Figure 4: Training curves of baseline PPO, TD3 and our method in 4 different human preference settings (with robot Sawyer conducting the feeding task). Our method delivers generally superior performance throughout the training process.
  • Figure 5: An example procedure for the feeding task. Left: early stage, when the robot arm is far from the human body. Middle: intermediate stage, when the robot arm is closer to the human body. Right: late stage, when the robot arm touches the human body, completing the feeding.
  • ...and 2 more figures
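
The data flow described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`anticipate_constant_velocity`, `estimated_preference_reward`, `robot_reward`) are hypothetical, the constant-velocity extrapolation is a stand-in for the learned motion predictor, and the linear combination of $\hat{w}_H$ with per-step preference terms is an assumed form for $\hat{r}_{\text{pref}}$ that the text above does not confirm.

```python
from typing import List, Sequence

Frame = List[float]  # flattened joint coordinates for one time step (assumed layout)


def anticipate_constant_velocity(
    past_robot: Sequence[Frame], past_human: Sequence[Frame], k: int
) -> List[Frame]:
    """Stand-in for the anticipation module's interface.

    Takes past frames of both agents and returns k future human frames.
    NOT the learned predictor: it repeats the last observed per-step change
    of the human joints, and ignores past_robot (kept only to mirror the
    interface in the caption).
    """
    if len(past_human) < 2:
        raise ValueError("need at least two past human frames")
    last, prev = past_human[-1], past_human[-2]
    velocity = [a - b for a, b in zip(last, prev)]
    return [[x + step * v for x, v in zip(last, velocity)] for step in range(1, k + 1)]


def estimated_preference_reward(w_hat: Sequence[float], pref_terms: Sequence[float]) -> float:
    """Assumed linear form: r_pref_hat = sum_i w_hat[i] * pref_terms[i]."""
    if len(w_hat) != len(pref_terms):
        raise ValueError("w_hat and pref_terms must have the same length")
    return sum(w * c for w, c in zip(w_hat, pref_terms))


def robot_reward(task_reward: float, w_hat: Sequence[float], pref_terms: Sequence[float]) -> float:
    """Task reward plus the estimated preference reward (assumed additive)."""
    return task_reward + estimated_preference_reward(w_hat, pref_terms)


if __name__ == "__main__":
    # Two past frames with two coordinates each (toy numbers, not real data).
    future = anticipate_constant_velocity(
        past_robot=[[0.0, 0.0], [0.0, 0.0]],
        past_human=[[0.0, 0.0], [1.0, 2.0]],
        k=3,
    )
    print(future)
    print(robot_reward(1.0, w_hat=[0.5, 2.0], pref_terms=[-1.0, -0.25]))
```

In this sketch the robot's reward only differs from the plain task reward through $\hat{w}_H$, which matches the caption's description of the estimated weight guiding policy learning; how the weight itself is inferred from interaction histories is left out.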