Safe Interaction via Monte Carlo Linear-Quadratic Games

Benjamin A. Christie, Dylan P. Losey

TL;DR

This work tackles safe interaction in human-robot collaboration by modeling the human as an adversary in a zero-sum game and seeking a Nash Equilibrium policy for the robot. It introduces Monte Carlo Linear-Quadratic Games (MCLQ), which seeds an initial NE-inspired trajectory from an LQ approximation and then refines it via nested Metropolis-Hastings searches to handle nonlinearities and unpredictability in real time. The approach provides a theoretical bridge between exact Hamilton-Jacobi methods and practical LQ baselines, with a tunable safety margin to balance conservatism and performance. Empirically, MCLQ achieves near-NE performance with real-time computation in simulations and demonstrates reduced collisions and enhanced perceived safety in a 24-person user study, outperforming state-of-the-art baselines in safety and responsiveness.

Abstract

Safety is critical during human-robot interaction. But -- because people are inherently unpredictable -- it is often difficult for robots to plan safe behaviors. Instead of relying on our ability to anticipate humans, here we identify robot policies that are robust to unexpected human decisions. We achieve this by formulating human-robot interaction as a zero-sum game, where (in the worst case) the human's actions directly conflict with the robot's objective. Solving for the Nash Equilibrium of this game provides robot policies that maximize safety and performance across a wide range of human actions. Existing approaches attempt to find these optimal policies by leveraging Hamilton-Jacobi analysis (which is intractable) or linear-quadratic approximations (which are inexact). By contrast, in this work we propose a computationally efficient and theoretically justified method that converges towards the Nash Equilibrium policy. Our approach (which we call MCLQ) leverages linear-quadratic games to obtain an initial guess at safe robot behavior, and then iteratively refines that guess with a Monte Carlo search. Not only does MCLQ provide real-time safety adjustments, but it also enables the designer to tune how conservative the robot is -- preventing the system from focusing on unrealistic human behaviors. Our simulations and user study suggest that this approach advances safety in terms of both computation time and expected performance. See videos of our experiments here: https://youtu.be/KJuHeiWVuWY.
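The seed-then-refine idea can be illustrated with a toy sketch. It is not the authors' implementation: the one-step scalar `cost`, the bound `lam` (standing in for the safety margin), the temperature `temp`, the step sizes, and all function names are assumptions made for illustration. The robot starts from the minimizer of the quadratic term alone (a stand-in for an LQ solution), then a nested Metropolis-style search lowers the estimated worst-case cost: the inner loop lets the human maximize the cost within `lam` of a nominal action, and the outer loop lets the robot minimize that worst case.

```python
import math
import random


def cost(u_r, u_h):
    """Hypothetical robot cost: quadratic goal term plus a non-quadratic proximity penalty."""
    gap = u_r - u_h  # separation between robot and human after both act
    return 0.1 * (u_r - 1.0) ** 2 + math.exp(-4.0 * gap ** 2)


def metropolis_search(score, x0, step, n_iters, rng, lo=-math.inf, hi=math.inf, temp=0.05):
    """Metropolis-style random search that maximizes `score`; returns the best sample seen."""
    x, fx = x0, score(x0)
    best_x, best_f = x, fx
    for _ in range(n_iters):
        cand = min(hi, max(lo, x + rng.gauss(0.0, step)))  # clipped Gaussian proposal
        fc = score(cand)
        # Accept improvements outright; accept worse samples with Boltzmann probability.
        if fc >= fx or rng.random() < math.exp((fc - fx) / temp):
            x, fx = cand, fc
            if fx > best_f:
                best_x, best_f = x, fx
    return best_x, best_f


def worst_case_cost(u_r, nominal_h, lam, n_inner=200, seed=0):
    """Inner search: the human maximizes the robot's cost within +/- lam of the nominal action."""
    rng = random.Random(seed)  # common random numbers: deterministic in u_r
    _, worst = metropolis_search(
        lambda u_h: cost(u_r, u_h), nominal_h, step=0.5 * lam, n_iters=n_inner,
        rng=rng, lo=nominal_h - lam, hi=nominal_h + lam,
    )
    return worst


def refine(u_seed, nominal_h, lam, n_outer=200, seed=1):
    """Outer search: the robot minimizes the estimated worst-case cost, starting from the seed."""
    rng = random.Random(seed)
    u_best, neg_cost = metropolis_search(
        lambda u_r: -worst_case_cost(u_r, nominal_h, lam), u_seed, step=0.2,
        n_iters=n_outer, rng=rng,
    )
    return u_best, -neg_cost


u_seed = 1.0  # minimizer of the quadratic term alone, standing in for an LQ solution
seed_cost = worst_case_cost(u_seed, nominal_h=0.0, lam=0.8)
u_refined, refined_cost = refine(u_seed, nominal_h=0.0, lam=0.8)
print(f"seed:    u={u_seed:.2f}, worst-case cost={seed_cost:.3f}")
print(f"refined: u={u_refined:.2f}, worst-case cost={refined_cost:.3f}")
```

Because the outer search keeps the best sample it has seen and starts at the seed, the refined worst-case estimate can never be higher than the seed's. In this toy setup, shrinking `lam` narrows the set of human actions the robot guards against, which mirrors the tunable conservatism described in the abstract.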

Paper Structure

This paper contains 11 sections, 18 equations, 4 figures, 1 algorithm.

Figures (4)

  • Figure 1: Human and drone moving in a shared workspace. Under our proposed MCLQ safety filter, the drone reasons about worst case human actions within designer-specified bounds, and then selects robust behaviors. For instance, here the drone moves across the table to better prevent a collision.
  • Figure 2: Simulation results across point-mass, driving, and manipulator environments. (Left) We plot the cost and (Right) computation time averaged over $100$ simulations. Computation time is the number of milliseconds per robot action (normalized by the number of timesteps per trajectory). In non-LQ settings the computation time for NE is prohibitively high; e.g., in driving the NE computation time exceeded one hour. We could not calculate NE in the $26$-dimensional manipulator environment. Error bars show standard deviation and an $*$ denotes statistical significance.
  • Figure 3: Simulation results for a modified point-mass environment where we adjust the safety margin $\lambda$ in MCLQ. Increasing $\lambda$ causes the MCLQ robot to consider a wider range of worst case human actions, resulting in more conservative behavior. Conversely, decreasing $\lambda$ causes the MCLQ robot to increasingly rely on its nominal human model. Unlike LQ approximations, our proposed method gives designers the flexibility to tune $\lambda$ and adjust the safety margin.
  • Figure 4: Results from our user study in Section \ref{sec:us}. Participants walked around a room to assemble a tower; a drone completed revolutions around the same workspace to monitor the human's progress (also see Figure \ref{fig:front}). (Left) The average number of revolutions the drone completed and the average number of collisions. Here "collisions" occurred when the drone was within $0.5$ meters of the human. The proposed MCLQ algorithm adjusts the robot's behavior to increase safety (fewer collisions) while also enhancing performance (more revolutions). (Right) After interacting with each algorithm participants answered survey questions about how safe, predictable, and attentive the robot was. Ratings suggest that participants perceived MCLQ to be a safer system. Error bars show standard deviation and an $*$ denotes statistical significance ($p < .05$).