Structured Imitation Learning of Interactive Policies through Inverse Games

Max M. Sun, Todd Murphey

TL;DR

The paper addresses imitation learning of interactive multi-agent policies in shared spaces with a two-stage framework: it first fits non-interactive policies using standard generative imitation learning, then recovers inter-agent dependencies through an inverse-game formulation. Interactive policies are defined as the Nash equilibrium of a game with a learnable joint loss $l_{\gamma}$, which allows gradients to be backpropagated through the equilibrium solver. In a synthetic 5-agent social navigation benchmark, the approach performs comparably to the ground-truth interactive policy using only $50$ demonstrations and significantly improves on the non-interactive baseline. The method is modular and compatible with a range of single-agent imitation models, which makes it a practical candidate for interactive human-robot coordination in shared environments.

Abstract

Generative model-based imitation learning methods have recently achieved strong results in learning high-complexity motor skills from human demonstrations. However, imitation learning of interactive policies that coordinate with humans in shared spaces without explicit communication remains challenging, due to the significantly higher behavioral complexity in multi-agent interactions compared to non-interactive tasks. In this work, we introduce a structured imitation learning framework for interactive policies by combining generative single-agent policy learning with a flexible yet expressive game-theoretic structure. Our method explicitly separates learning into two steps: first, we learn individual behavioral patterns from multi-agent demonstrations using standard imitation learning; then, we structurally learn inter-agent dependencies by solving an inverse game problem. Preliminary results in a synthetic 5-agent social navigation task show that our method significantly improves non-interactive policies and performs comparably to the ground truth interactive policy using only 50 demonstrations. These results highlight the potential of structured imitation learning in interactive settings.

Paper Structure

This paper contains 10 sections, 7 equations, 4 figures.

Figures (4)

  • Figure 1: For example, social navigation requires the robot to not only plan for itself but also anticipate the actions of surrounding humans to effectively coordinate with them.
  • Figure 2: Overview of the structured imitation learning framework. Given a multi-agent demonstration dataset (dark lines indicate demonstrated actions), we first learn the non-interactive policies using standard single-agent imitation learning methods based on generative models. The interactive policies are the Nash equilibrium of a game-theoretic optimization problem based on the non-interactive policies. The cost function of the game-theoretic problem is modeled as a neural network and optimized using the paper's maximum likelihood estimation (MLE) objective.
  • Figure 3: Qualitative results from the social navigation benchmark, where the letter "R" indicates the robot and the cross indicates the navigation goal of an agent. Learning from only 50 demonstrations, the proposed interactive policy significantly improves the safety performance of the non-interactive policy without compromising the efficiency, while performing comparably to the ground-truth policy.
  • Figure 4: Quantitative results of the social navigation benchmark (median, quartiles, and distribution of the metrics). The proposed interactive policy has comparable performance with the ground-truth policy and outperforms the non-interactive policy.

Theorems & Definitions (4)

  • Definition 1: Interactive policy
  • Definition 2: Non-interactive policy
  • Definition 3: Interaction game
  • Definition 4: Nash equilibrium
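
To illustrate how the four definitions above relate, here is a minimal sketch of an interaction game whose equilibrium adjusts non-interactive actions. It is not the paper's implementation: the scalar actions, the hand-written Gaussian proximity penalty (standing in for the learned loss $l_{\gamma}$), the weight `w`, and the simultaneous gradient-descent solver are all assumptions made for illustration. The paper's learning step, which fits the loss by backpropagating through the solver, is not shown.

```python
import math


def agent_grads(u, nominal, w):
    """Gradient of each agent's own cost with respect to its own action.

    Hypothetical cost for agent i (an assumption, not the paper's loss):
        (u_i - nominal_i)^2 + w * exp(-(u_0 - u_1)^2)
    The first term keeps the agent near its non-interactive action;
    the second penalizes proximity to the other agent.
    """
    d = u[0] - u[1]
    bump = w * math.exp(-d * d)
    g0 = 2.0 * (u[0] - nominal[0]) - 2.0 * d * bump
    g1 = 2.0 * (u[1] - nominal[1]) + 2.0 * d * bump
    return [g0, g1]


def solve_equilibrium(nominal, w=1.0, lr=0.1, iters=500):
    """Simultaneous gradient play, started from the non-interactive actions.

    On convergence, neither agent can lower its own cost by a small
    unilateral change, i.e. the result is a local Nash equilibrium.
    """
    u = list(nominal)
    for _ in range(iters):
        g = agent_grads(u, nominal, w)
        u = [u[i] - lr * g[i] for i in range(2)]
    return u


if __name__ == "__main__":
    nominal = [0.0, 0.2]  # non-interactive actions, too close together
    u_star = solve_equilibrium(nominal)
    print("non-interactive:", nominal)
    print("interactive (equilibrium):", u_star)
```

In this toy setting the equilibrium actions end up farther apart than the non-interactive ones, which is the qualitative effect the interaction game is meant to capture; the midpoint between the two agents stays fixed because the penalty is symmetric.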