Robots that Learn to Safely Influence via Prediction-Informed Reach-Avoid Dynamic Games

Ravi Pandya, Changliu Liu, Andrea Bajcsy

TL;DR

This work poses and solves a novel robust reach-avoid dynamic game that enables robots to be maximally influential, but only when a safety backup control exists. It finds that SLIDE consistently enables the robot to leverage its influence on the human when it is safe to do so, allowing the robot to be less conservative while still ensuring a high safety rate during task execution.

Abstract

Robots can influence people to accomplish their tasks more efficiently: autonomous cars can inch forward at an intersection to pass through, and tabletop manipulators can go for an object on the table first. However, a robot's ability to influence can also compromise the safety of nearby people if naively executed. In this work, we pose and solve a novel robust reach-avoid dynamic game which enables robots to be maximally influential, but only when a safety backup control exists. On the human side, we model the human's behavior as goal-driven but conditioned on the robot's plan, enabling us to capture influence. On the robot side, we solve the dynamic game in the joint physical and belief space, enabling the robot to reason about how its uncertainty in human behavior will evolve over time. We instantiate our method, called SLIDE (Safely Leveraging Influence in Dynamic Environments), in a high-dimensional (39-D) simulated human-robot collaborative manipulation task solved via offline game-theoretic reinforcement learning. We compare our approach to a robust baseline that treats the human as a worst-case adversary, a safety controller that does not explicitly reason about influence, and an energy-function-based safety shield. We find that SLIDE consistently enables the robot to leverage the influence it has on the human when it is safe to do so, ultimately allowing the robot to be less conservative while still ensuring a high safety rate during task execution.
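To make the reach-avoid idea concrete, the following is a minimal toy sketch, not the paper's 39-D formulation or its reinforcement-learning solver. It runs a tabular finite-horizon reach-avoid backup on a 1-D line where the robot picks `u` to maximize and the human picks `d` to minimize. Everything here is an illustrative assumption: the function names, the grid, the dynamics `x + u + d`, the margin functions, and the sign convention (positive value means the robot can reach the target without entering the failure set). The two human control sets only mimic the contrast between a worst-case adversary and prediction-informed control bounds.

```python
def reach_avoid_values(n_states, robot_controls, human_controls, horizon):
    """Tabular reach-avoid backup on a toy 1-D line (hypothetical example).

    Assumed convention: value > 0 means the robot can reach the target
    while avoiding the failure set, for every admissible human control.
    """
    target = n_states - 1   # assumed goal location
    failure = 0             # assumed collision location

    def target_margin(x):
        return 1.0 if x == target else -1.0

    def safety_margin(x):
        return -1.0 if x == failure else 1.0

    def step(x, u, d):
        # Toy dynamics: robot and human controls add up, clipped to the grid.
        return min(max(x + u + d, 0), n_states - 1)

    # Horizon 0: succeed only if already in the target and not in failure.
    values = [min(safety_margin(x), target_margin(x)) for x in range(n_states)]

    for _ in range(horizon):
        # The right-hand side is built entirely from the previous `values`.
        values = [
            min(
                safety_margin(x),
                max(
                    target_margin(x),
                    max(
                        min(values[step(x, u, d)] for d in human_controls)
                        for u in robot_controls
                    ),
                ),
            )
            for x in range(n_states)
        ]
    return values


def winning_states(values):
    """States from which the robot wins under the assumed sign convention."""
    return [x for x, v in enumerate(values) if v > 0]


robot_controls = (-1, 0, 1)
# Worst-case human: may push the robot back toward the failure state.
robust = winning_states(reach_avoid_values(7, robot_controls, (-1, 0, 1), 10))
# Narrower human control set, standing in for prediction-informed bounds.
informed = winning_states(reach_avoid_values(7, robot_controls, (0, 1), 10))
```

In this toy setting the worst-case human can always cancel the robot's progress, so `robust` contains only the target state, while the narrower human control set lets `informed` cover every state except the failure state. This mirrors, in a very simplified form, why bounding the adversary with a behavior prediction can make the robot less conservative.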

Paper Structure (10 sections, 7 equations, 6 figures, 4 tables)


Figures (6)

  • Figure 1: Both human and robot arms want to reach their desired objects on the table, but they don't know who is going for which object. Top Row: The human's desired object can be influenced by the robot. Using an influence-unaware safety shield, the robot can stay safe but fails to reach its own object (not live). With our method (SLIDE), the robot influences the human's goal and safely reaches its object. Bottom Row: The human never changes their desired object. Naive influence-aware robot controllers are over-confident and collide. SLIDE recognizes that this can be unsafe and chooses a different goal for the robot, staying safe and live.
  • Figure 2: SLIDE Framework. (left) Before solving the reach-avoid game, we specify the target set (goal locations), failure set (collisions), and a conditional behavior prediction (CBP) model that can predict the human's future trajectory conditioned on the robot's future plan. (center) During simulated gameplay, the SLIDE policy, $\pi^{*}_\mathcal{R}(x_e)$, is trained against a simulated human adversary, $\pi_\mathcal{H}^\dagger(x_e)$, whose control bounds are informed by the CBP model. (right) Online, the robot uses its robust SLIDE policy to safely influence any human it interacts with.
  • Figure 3: Closed-Loop Simulations. SLIDE, Marginal-RA, and Robust-RA policies starting from the same initial condition. SLIDE confidently understands that the human will be influenced to move out of its way as it chooses the blue bottle, and it reaches its goal the fastest (the human changes its mind from the blue bottle to the yellow mug at $t=1.2s$). Marginal-RA waits until the human is out of its way and chooses the yellow mug. Robust-RA stays cautious even as the human is moving towards a different goal and finishes last.
  • Figure 4: Closed-loop Completion Times. Histogram of completion times for all methods interacting with the influenceable human model. SLIDE has the highest frequency of short trajectories, while SSA and Robust-RA have the highest incidence of timing out.
  • Figure 5: Conditional Behavior Predictions. Most-likely mode of SLIDE CBP model given different future robot plans. Each robot plan has a corresponding human prediction in the same color. The prediction is highly dependent on the robot's plan and captures the idea that the human will change goals to a different semantic class.
  • ...and 1 more figure