ARMCHAIR: integrated inverse reinforcement learning and model predictive control for human-robot collaboration

Angelo Caregnato-Neto; Luciano Cavalcante Siebert; Arkady Zgonnikov; Marcos Ricardo Omena de Albuquerque Maximo; Rubens Junqueira Magalhães Afonso

TL;DR

The paper tackles the challenge of integrating predictive models of human behavior with multi-robot motion planning for collaborative exploration. It introduces ARMCHAIR, which combines adversarial inverse reinforcement learning (AIRL), used to learn a human prediction model, with a receding-horizon MPC-MIP planner that jointly optimizes trajectories and task allocation while enforcing network connectivity. The approach is robustified with safety regions that account for prediction uncertainty, and it is validated through extensive Monte Carlo simulations in two environments, showing improved safety (fewer collisions and disconnections) and better task performance than the baselines. The authors note limitations, such as the lack of a formal proof of recursive feasibility and the limited scalability of the centralized formulation, and propose future work on distributed planning and experiments with real humans to further validate the framework.
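The receding-horizon idea can be illustrated with a small toy sketch. It is not the authors' MPC-MIP: it assumes a single robot on a grid, a hypothetical constant-velocity function `predict_human` standing in for the learned policy $\hat{\pi}$, brute-force enumeration of move sequences in place of a mixed-integer solver, a Manhattan-distance stage cost, and a square safety region of assumed half-width `radius` around each predicted human position. All function names are illustrative. At every step the robot re-predicts the human, plans over the horizon, applies only the first move, and repeats.

```python
from itertools import product

# Unit grid moves: stay, up, down, left, right.
MOVES = [(0, 0), (0, 1), (0, -1), (-1, 0), (1, 0)]


def predict_human(pos, vel, horizon):
    """Hypothetical stand-in for the learned policy: constant-velocity rollout."""
    return [(pos[0] + k * vel[0], pos[1] + k * vel[1]) for k in range(1, horizon + 1)]


def in_safety_region(robot, human, radius):
    """Square (Chebyshev) safety region centred on a predicted human position."""
    return max(abs(robot[0] - human[0]), abs(robot[1] - human[1])) <= radius


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def plan(robot, target, human_pred, radius):
    """Enumerate every move sequence over the horizon; keep the cheapest safe one."""
    best_seq, best_cost = None, float("inf")
    for seq in product(MOVES, repeat=len(human_pred)):
        pos, cost, safe = robot, 0, True
        for (dx, dy), h in zip(seq, human_pred):
            pos = (pos[0] + dx, pos[1] + dy)
            if in_safety_region(pos, h, radius):
                safe = False
                break
            cost += manhattan(pos, target)  # stage cost: distance to the target
        if safe and cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq


def receding_horizon(robot, target, human, human_vel, horizon=3, radius=1, max_steps=30):
    robots, humans = [robot], [human]
    for _ in range(max_steps):
        if robot == target:
            break
        pred = predict_human(human, human_vel, horizon)  # re-predict at every step
        seq = plan(robot, target, pred, radius)
        if seq is None:
            raise RuntimeError("no safe plan over the horizon")
        dx, dy = seq[0]  # apply only the first move of the optimal sequence
        robot = (robot[0] + dx, robot[1] + dy)
        human = pred[0]  # toy assumption: the human follows the prediction exactly
        robots.append(robot)
        humans.append(human)
    return robots, humans


# A human starting at (2, 3) walks downward across the robot's straight path.
robots, humans = receding_horizon((0, 0), (4, 0), human=(2, 3), human_vel=(0, -1))
print(robots)
```

In this crossing scenario the robot backs off and waits until the human has passed before heading to the target. The enumeration grows as 5^horizon, which is one reason a formulation like the paper's relies on a mixed-integer solver instead.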

Abstract

One of the key issues in human-robot collaboration is the development of computational models that allow robots to predict and adapt to human behavior. Much progress has been achieved in developing such models, as well as control techniques that address the autonomy problems of motion planning and decision-making in robotics. However, the integration of computational models of human behavior with such control techniques still poses a major challenge, resulting in a bottleneck for efficient collaborative human-robot teams. In this context, we present a novel architecture for human-robot collaboration: Adaptive Robot Motion for Collaboration with Humans using Adversarial Inverse Reinforcement learning (ARMCHAIR). Our solution leverages adversarial inverse reinforcement learning and model predictive control to compute optimal trajectories and decisions for a mobile multi-robot system that collaborates with a human in an exploration task. During the mission, ARMCHAIR operates without human intervention, autonomously identifying when the human needs support and acting accordingly. Our approach also explicitly addresses the network connectivity requirement of the human-robot team. Extensive simulation-based evaluations demonstrate that ARMCHAIR allows a group of robots to safely support a simulated human in an exploration scenario, preventing collisions and network disconnections, and improving the overall performance of the task.
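The connectivity requirement can be illustrated with a simple check. This sketch assumes a disc communication model, consistent with the circular connectivity regions of Fig. 2, in which two agents share a link when their distance is at most a hypothetical range `comm_range`. The team (human plus robots) is connected when all agents form a single graph component, possibly through multi-hop relays, which a breadth-first search verifies. ARMCHAIR enforces connectivity as constraints inside its optimization rather than checking it afterwards, so the hypothetical `is_connected` helper only illustrates the property, not the formulation.

```python
from collections import deque
from math import dist


def is_connected(positions, comm_range):
    """True if every agent reaches every other agent via links of length <= comm_range."""
    if not positions:
        return True
    seen = {0}
    queue = deque([0])
    while queue:  # breadth-first search from agent 0 (e.g. the human)
        i = queue.popleft()
        for j, p in enumerate(positions):
            if j not in seen and dist(positions[i], p) <= comm_range:
                seen.add(j)
                queue.append(j)
    return len(seen) == len(positions)


# Human at the origin, two robots acting as a relay chain along the x-axis.
print(is_connected([(0, 0), (3, 0), (6, 0)], comm_range=3.5))   # chain holds
print(is_connected([(0, 0), (3, 0), (10, 0)], comm_range=3.5))  # last robot cut off
```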

Paper Structure

This paper contains 17 sections, 12 equations, 8 figures, 5 tables, and 1 algorithm.

Introduction
Background
Related work
Contributions
Notation and definitions
Problem Description
Adaptive Robot Motion for Collaboration with Humans using Adversarial Inverse Reinforcement learning (ARMCHAIR)
Adversarial Inverse Reinforcement Learning
Human Prediction Model
MPC-MIP Formulation
Robust formulation
Results
Environment 1: Sparse target distribution
Environment 2: Grouped target distribution
Discussion and Conclusion

Figures (8)

Figure 1: ARMCHAIR control architecture. The offline AIRL layer provides a human prediction model that the MPC-MIP algorithm uses to compute trajectories and decisions for the multi-robot system (MRS) in closed loop.
Figure 2: Illustration of an environment with a target of type B in $\mathcal{T}_1$ and two targets of type A, represented by the polytopes $\mathcal{T}_2$ and $\mathcal{T}_3$; three obstacles $\mathcal{O}_1$, $\mathcal{O}_2$, and $\mathcal{O}_3$; and a terminal region $\mathcal{F}$. The polytopes $\mathcal{R}_1$ and $\mathcal{R}_2$ are outer approximations of the robots' bodies. Connectivity regions are depicted as circles.
Figure 3: The human position, a target of type A, and the terminal region T in a grid, encoded as the feature maps $\mathbf{F}_{\text{pos}}$, $\mathbf{F}_{\text{A}}$, and $\mathbf{F}_{\text{ter}}$, respectively.
Figure 4: Safety region around the human to be avoided by the MRS.
Figure 5: Example simulation of the open-loop MIP baseline in Environment 1 (Sparse target distribution): (a) initial and (b) final time steps of simulation 2 out of 1000. Solid and dashed lines represent the actual motion and the predictions of each agent, respectively. The human prediction is computed in open loop, using only the initial conditions and $\hat{\pi}$. The prediction suggests that the human will visit only targets of type B, so the robots are dispatched to the remaining ones. In (b), targets 3 and 4 (type A) are visited redundantly because the human deviates from the prediction.
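The grid encoding of Fig. 3 can be sketched as a stack of binary maps, one per feature. This is a minimal illustration under assumptions not stated in the caption: cells are addressed as (x, y) with rows indexed by y, the grid size is arbitrary, plain Python lists replace any tensor library, and only the three maps named in Fig. 3 are built. The helpers `one_hot_map` and `encode_state` are hypothetical.

```python
def one_hot_map(cells, width, height):
    """Binary grid with a 1 in every listed (x, y) cell; rows are indexed by y."""
    grid = [[0] * width for _ in range(height)]
    for x, y in cells:
        grid[y][x] = 1
    return grid


def encode_state(human, targets_a, terminal, width, height):
    """Stack F_pos, F_A and F_ter as channels, in the order used in Fig. 3."""
    return [
        one_hot_map([human], width, height),    # F_pos: human position
        one_hot_map(targets_a, width, height),  # F_A: cells of type-A targets
        one_hot_map(terminal, width, height),   # F_ter: cells of the terminal region
    ]


# 5x5 grid: human at (1, 2), a type-A target covering two cells, terminal cell at (4, 4).
state = encode_state((1, 2), [(3, 0), (3, 1)], [(4, 4)], width=5, height=5)
for name, channel in zip(["F_pos", "F_A", "F_ter"], state):
    print(name, channel)
```

Stacking the maps as channels gives a fixed-size, image-like observation, which is a common input format for the neural networks trained in AIRL-style methods.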
