Open Human-Robot Collaboration using Decentralized Inverse Reinforcement Learning

Prasanth Sengadu Suresh, Siddarth Jain, Prashant Doshi, Diego Romeres

TL;DR

The paper introduces oDec-MDP, a novel multiagent framework designed to model open HRC scenarios in which agents can join or leave tasks flexibly during execution, and generalizes a recent multiagent inverse reinforcement learning method, Dec-AIRL, to learn from open systems modeled using the oDec-MDP.

Abstract

Human-robot collaboration (HRC), in which humans and robots cooperate toward shared goals, has attracted growing interest and seen significant advances over the past decade. While previous research has addressed various challenges, several key issues remain unresolved. Many HRC domains involve activities that do not necessarily require human presence throughout the entire task. Existing literature typically models HRC as a closed system, in which all agents are present for the entire duration of the task. In contrast, an open model offers flexibility by allowing agents to enter and exit the collaboration as needed, so that they can concurrently manage other tasks. In this paper, we introduce a novel multiagent framework called oDec-MDP, designed specifically to model open HRC scenarios where agents can join or leave tasks flexibly during execution. We generalize a recent multiagent inverse reinforcement learning method, Dec-AIRL, to learn from open systems modeled using the oDec-MDP. We validate our method through experiments in both a simplified toy firefighting domain and a realistic dyadic human-robot collaborative assembly task. Results show that our framework and learning method improve upon their closed-system counterpart.
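The abstract says the method generalizes Dec-AIRL. Assuming, as its name suggests, that Dec-AIRL builds on adversarial inverse reinforcement learning (AIRL; Fu, Luo, and Levine, 2018), the Python sketch below shows only the standard single-agent AIRL discriminator and the reward it passes to the policy. It is background, not the paper's method: how Dec-AIRL or its open generalization adapts these quantities to decentralized, open teams is not covered here, and the function names are hypothetical.

```python
import math


def airl_f(g_sa: float, h_s: float, h_next: float, gamma: float) -> float:
    """Shaped AIRL logit f(s, a, s') = g(s, a) + gamma * h(s') - h(s).

    g is the learned reward term and h a learned shaping (potential) term.
    """
    return g_sa + gamma * h_next - h_s


def airl_discriminator(f: float, log_pi: float) -> float:
    """D = exp(f) / (exp(f) + pi(a|s)), computed stably as sigmoid(f - log pi)."""
    x = f - log_pi
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for very negative x
    return z / (1.0 + z)


def airl_policy_reward(f: float, log_pi: float) -> float:
    """Policy reward log D - log(1 - D), which simplifies exactly to f - log pi."""
    return f - log_pi


if __name__ == "__main__":
    log_pi = math.log(0.5)
    # When f equals log pi(a|s), the discriminator cannot tell the two apart.
    print(airl_discriminator(log_pi, log_pi))
```

With $f = \log \pi(a \mid s)$ the discriminator outputs 0.5, meaning it cannot distinguish expert transitions from those of the current policy.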

Paper Structure (11 sections, 12 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: The oDec-MDP graphical model for two timesteps $t$ and $t+1$. Given the collaboration team ID $c^t$ at timestep $t$, the joint state ${\bm s}^{t}_{c^t}$ is formed by combining the local states of all agents in $c^t$. The local actions of all agents in $c^t$ together form the joint action ${\bm a}^{t}_{c^t}$, which, given $c^t$, leads to the next team ID $c^{t+1}$. Together, $c^{t+1}$, ${\bm a}^{t}_{c^t}$, and ${\bm s}^{t}_{c^t}$ lead to the next state ${\bm s}^{t+1}_{c^{t+1}}$ at time $t+1$.
  • Figure 2: A complete episode of the Urban Firefighting domain with learned oDec-AIRL policies. Green and blue bubbles represent the CallAgent and Extinguish actions, respectively. The panels show, in order: Agent 0 heads toward the medium fire and then calls Agent 1 to join; Agent 1 moves to the large fire while Agent 0 calls Agent 2; all agents extinguish the fires at their locations; after extinguishing the medium fire, Agent 0 moves to the small fire while Agent 2 assists Agent 1 with the large fire; finally, the agents perform the Extinguish action at their final locations. This visualization was built using PyGame (mcgugan2007beginning).
  • Figure 3: A collaborative table assembly task that involves placing and screwing together various wooden components. Left: The assembled table. Right: The individual components required for assembly, including the table base, two supports, two legs, and the necessary screws.
  • Figure 4: Snapshots of key moments from a typical trial in the human-robot pilot study. At each step, the robot executes actions according to its learned oDec-AIRL policy, while the human participant is encouraged to emulate the learned human policy. Each action is illustrated at its first occurrence. After completing the 'Place' action for Support1, the robot performs the 'Call Agent' action, prompting the human for assistance through a pop-up notification.
  • Figure 5: Left: Results for the subjective measures on a 5-point scale. Right: The average total task duration and the average time allocated to the human agent from the Call Agent action onward.
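The one-step dependency structure described for Figure 1 can be sketched in Python as a minimal sampler. It assumes discrete team IDs that map to sets of agent IDs, a team-transition distribution over $c^{t+1}$ given $c^t$ and the joint action, and a state-transition function that returns local states for the agents of $c^{t+1}$. All names (`joint_state`, `step`, `team_transition`, `state_transition`) and the toy two-agent example are hypothetical illustrations of the figure, not the authors' implementation; rewards and the inverse-RL learning step are omitted.

```python
import random
from typing import Callable, Dict, FrozenSet, Hashable, Tuple

# Hypothetical type aliases, used only for this illustration.
AgentId = int
TeamId = int
JointState = Tuple[Hashable, ...]
JointAction = Tuple[Hashable, ...]


def joint_state(team: FrozenSet[AgentId], local: Dict[AgentId, Hashable]) -> JointState:
    """s^t_{c^t}: the local states of the agents in the team, ordered by agent ID."""
    return tuple(local[i] for i in sorted(team))


def step(
    teams: Dict[TeamId, FrozenSet[AgentId]],
    c_t: TeamId,
    local: Dict[AgentId, Hashable],
    policies: Dict[AgentId, Callable[[JointState], Hashable]],
    team_transition: Callable[[TeamId, JointAction], Dict[TeamId, float]],
    state_transition: Callable[[JointState, JointAction, TeamId], Dict[AgentId, Hashable]],
    rng: random.Random,
) -> Tuple[TeamId, JointState]:
    """Sample (c^{t+1}, s^{t+1}_{c^{t+1}}) following the dependencies in Figure 1."""
    s_t = joint_state(teams[c_t], local)
    # a^t_{c^t}: local actions of the agents in c^t (policies assumed to see s^t).
    a_t = tuple(policies[i](s_t) for i in sorted(teams[c_t]))
    # c^{t+1} depends on c^t and a^t.
    dist = team_transition(c_t, a_t)
    c_next = rng.choices(list(dist), weights=list(dist.values()))[0]
    # s^{t+1} depends on s^t, a^t, and c^{t+1}; must cover every agent in c^{t+1}.
    next_local = state_transition(s_t, a_t, c_next)
    return c_next, joint_state(teams[c_next], next_local)


if __name__ == "__main__":
    # Toy example: agent 0 works alone (team 0) until it calls agent 1 (team 1).
    teams = {0: frozenset({0}), 1: frozenset({0, 1})}
    local = {0: "at_table", 1: "elsewhere"}
    policies = {0: lambda s: "call_agent", 1: lambda s: "noop"}

    def team_transition(c, a):
        # Hypothetical rule: CallAgent by agent 0 deterministically adds agent 1.
        return {1: 1.0} if c == 0 and a[0] == "call_agent" else {c: 1.0}

    def state_transition(s, a, c_next):
        return {0: "at_table", 1: "arrived" if c_next == 1 else "elsewhere"}

    print(step(teams, 0, local, policies, team_transition, state_transition, random.Random(0)))
    # (1, ('at_table', 'arrived'))
```

Keeping team membership as an explicit variable $c^t$, rather than a fixed agent set, is what lets the joint state and joint action change size as agents enter or leave, which a closed model cannot express.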