Robustifying a Policy in Multi-Agent RL with Diverse Cooperative Behaviors and Adversarial Style Sampling for Assistive Tasks

Takayuki Osa, Tatsuya Harada

TL;DR

The paper addresses the fragility of caregiver policies in cooperative multi-agent RL when care-receiver behavior varies. It introduces a framework that (a) learns diverse care-receiver responses by maximizing the mutual information between a latent variable and state-action pairs, which yields latent-conditioned policies, and (b) uses adversarial style sampling to train the caregiver against worst-case care-receiver styles. The approach is implemented with PPO-based methods (LPPO) and evaluated on Assistive Gym tasks, where it is more robust than standard co-optimization baselines when the care-receiver's policy changes. The aim is a caregiver policy that transfers more safely across the varied care-receiver behaviors expected in real deployments.
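
The mutual-information objective in (a) is usually made tractable with a variational lower bound. The form below is the standard bound, shown as an illustration only; the symbols $q_\phi$ (a learned posterior over the latent variable) and the expectation notation are assumptions here, and the paper's exact objective may differ.

```latex
\begin{aligned}
I(\boldsymbol{z}; \boldsymbol{s}, \boldsymbol{a})
  &= H(\boldsymbol{z}) - H(\boldsymbol{z} \mid \boldsymbol{s}, \boldsymbol{a}) \\
  &= H(\boldsymbol{z}) + \mathbb{E}_{\boldsymbol{z}, \boldsymbol{s}, \boldsymbol{a}}
     \left[ \log p(\boldsymbol{z} \mid \boldsymbol{s}, \boldsymbol{a}) \right] \\
  &\geq H(\boldsymbol{z}) + \mathbb{E}_{\boldsymbol{z}, \boldsymbol{s}, \boldsymbol{a}}
     \left[ \log q_\phi(\boldsymbol{z} \mid \boldsymbol{s}, \boldsymbol{a}) \right]
\end{aligned}
```

The inequality holds because the gap between the last two lines is an expected KL divergence between $p(\boldsymbol{z} \mid \boldsymbol{s}, \boldsymbol{a})$ and $q_\phi(\boldsymbol{z} \mid \boldsymbol{s}, \boldsymbol{a})$, which is non-negative.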

Abstract

Autonomous assistance of people with motor impairments is one of the most promising applications of autonomous robotic systems. Recent studies have reported encouraging results using deep reinforcement learning (RL) in the healthcare domain. Previous studies showed that assistive tasks can be formulated as multi-agent RL, wherein there are two agents: a caregiver and a care-receiver. However, policies trained in multi-agent RL are often sensitive to the policies of other agents. As a result, a trained caregiver's policy may not work for different care-receivers. To alleviate this issue, we propose a framework that learns a robust caregiver's policy by training it against diverse care-receiver responses. In our framework, diverse care-receiver responses are learned autonomously through trial and error. In addition, to robustify the caregiver's policy, we propose a strategy for sampling a care-receiver's response in an adversarial manner during training. We evaluated the proposed method using tasks in Assistive Gym. We demonstrate that policies trained with a popular deep RL method are vulnerable to changes in the policies of other agents and that the proposed framework improves robustness against such changes.
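
The idea of sampling a care-receiver's response adversarially can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a finite set of candidate styles, a running estimate of the caregiver's return per style, and a softmax over negative returns so that styles the caregiver handles poorly are sampled more often. The names `adversarial_style_probs`, `StyleSampler`, `temperature`, and `momentum` are hypothetical.

```python
import numpy as np


def adversarial_style_probs(returns, temperature=1.0):
    """Softmax over negative returns: low-return styles get higher probability."""
    returns = np.asarray(returns, dtype=float)
    logits = -returns / temperature
    logits = logits - logits.max()  # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


class StyleSampler:
    """Tracks a running return estimate per style and samples hard styles more often.

    Assumption: styles form a small discrete set; the paper's latent variable
    and sampling rule may be different.
    """

    def __init__(self, styles, temperature=1.0, momentum=0.9, seed=0):
        self.styles = np.asarray(styles, dtype=float)
        self.returns = np.zeros(len(self.styles))
        self.temperature = temperature
        self.momentum = momentum
        self.rng = np.random.default_rng(seed)

    def sample(self):
        """Return (index, style vector) drawn from the adversarial distribution."""
        probs = adversarial_style_probs(self.returns, self.temperature)
        idx = int(self.rng.choice(len(self.styles), p=probs))
        return idx, self.styles[idx]

    def update(self, idx, episode_return):
        """Exponential moving average of the caregiver's return for one style."""
        m = self.momentum
        self.returns[idx] = m * self.returns[idx] + (1.0 - m) * episode_return
```

In a training loop, one would call `sample()` before each episode, condition the care-receiver on the returned style, and call `update()` with the caregiver's episode return. A lower `temperature` concentrates sampling on the worst-performing styles.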

Paper Structure (14 sections, 12 equations, 5 figures, 2 tables, 1 algorithm)

Figures (5)

  • Figure 1: Diverse care-receiver responses for the feeding task in Assistive Gym (Erickson et al., 2020). Our framework autonomously learns diverse care-receiver responses and robustifies the caregiver's policy through adversarial training.
  • Figure 2: Tasks in Assistive Gym used in the evaluation.
  • Figure 3: Learning curves of the proposed and baseline methods.
  • Figure 4: Diverse behaviors of the care-receiver obtained for the FeedingPR2Human-v1 task. The orientation of the care-receiver's head changes according to the value of the latent variable. (a)-(d) correspond to ${\boldsymbol{z}}_r=[0.9,0.9], [-0.9,0.9], [0.9,-0.9], [-0.9,-0.9]$, respectively. Human size and color are randomly set.
  • Figure 5: Procedure for evaluating the robustness of the caregiver's policy. Each caregiver's policy was evaluated against a care-receiver's policy that was trained separately.
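
The evaluation procedure in Figure 5 pairs a caregiver with care-receivers it was not trained alongside. A minimal sketch of such a cross-evaluation follows. The `rollout` callable, which runs one episode and returns its total reward, is a hypothetical placeholder supplied by the caller, and reporting the worst case over care-receivers is one possible robustness summary rather than the paper's stated metric.

```python
from statistics import mean


def cross_evaluate(caregivers, care_receivers, rollout, episodes=10):
    """Mean return for every (caregiver, care-receiver) pairing.

    caregivers, care_receivers: dicts mapping a name to a policy object.
    rollout: hypothetical callable (caregiver, care_receiver) -> episode return.
    """
    scores = {}
    for cg_name, caregiver in caregivers.items():
        for cr_name, care_receiver in care_receivers.items():
            episode_returns = [rollout(caregiver, care_receiver) for _ in range(episodes)]
            scores[(cg_name, cr_name)] = mean(episode_returns)
    return scores


def worst_case(scores, cg_name):
    """Lowest mean return of one caregiver across all care-receivers."""
    return min(value for (cg, _), value in scores.items() if cg == cg_name)
```

Comparing `worst_case` between a co-optimized caregiver and a robustly trained one gives a simple view of how much each degrades when the partner policy changes.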