Structured Imitation Learning of Interactive Policies through Inverse Games
Max M. Sun, Todd Murphey
TL;DR
The paper tackles imitation learning of interactive, multi-agent policies in shared spaces with a two-stage framework: it first fits non-interactive policies via standard generative imitation learning, then recovers inter-agent dependencies by solving an inverse game. Interactive policies are defined as the Nash equilibrium of a game with a learnable joint loss $l_{\gamma}$, which allows gradients to be backpropagated through the equilibrium solver. In a synthetic 5-agent social navigation task, the approach performs comparably to the ground-truth interactive policy using only 50 demonstrations and significantly improves on the non-interactive baseline, suggesting that the structured formulation is data-efficient for interactive coordination. The method is modular and compatible with a range of single-agent imitation models, offering a practical path toward scalable human-robot collaboration in shared environments.
Abstract
Generative model-based imitation learning methods have recently achieved strong results in learning high-complexity motor skills from human demonstrations. However, imitation learning of interactive policies that coordinate with humans in shared spaces without explicit communication remains challenging, due to the significantly higher behavioral complexity in multi-agent interactions compared to non-interactive tasks. In this work, we introduce a structured imitation learning framework for interactive policies by combining generative single-agent policy learning with a flexible yet expressive game-theoretic structure. Our method explicitly separates learning into two steps: first, we learn individual behavioral patterns from multi-agent demonstrations using standard imitation learning; then, we structurally learn inter-agent dependencies by solving an inverse game problem. Preliminary results in a synthetic 5-agent social navigation task show that our method significantly improves non-interactive policies and performs comparably to the ground truth interactive policy using only 50 demonstrations. These results highlight the potential of structured imitation learning in interactive settings.
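To make the two-step structure concrete, the following is a minimal sketch of the second step on a toy problem, not the method's actual implementation. It assumes the first step has already produced nominal non-interactive actions `a` (here simply sampled at random), and it uses a hypothetical quadratic loss for each agent, $J_i = \tfrac{1}{2}(u_i - a_i)^2 + \gamma\, u_i \sum_{j \neq i} u_j$, with a single scalar coupling parameter $\gamma$. Because each $J_i$ is strictly convex in $u_i$, the Nash equilibrium is the solution of a linear system, and the gradient of the imitation error with respect to $\gamma$ follows from implicit differentiation of that system. All names (`game_matrix`, `nash_equilibrium`, `fit_gamma`) and the loss form are illustrative assumptions.

```python
import numpy as np

N_AGENTS = 5   # echoes the 5-agent task mentioned above
N_DEMOS = 50   # echoes the 50 demonstrations mentioned above


def game_matrix(gamma, n):
    """Stationarity matrix M(gamma) of the toy game.

    Hypothetical loss of agent i: 0.5*(u_i - a_i)^2 + gamma*u_i*sum_{j != i} u_j.
    Setting dJ_i/du_i = 0 for every agent gives M(gamma) @ u = a.
    """
    off_diag = np.ones((n, n)) - np.eye(n)
    return np.eye(n) + gamma * off_diag


def nash_equilibrium(gamma, a):
    """Equilibrium actions for a batch of nominal actions a, shape (batch, n)."""
    M = game_matrix(gamma, a.shape[1])
    return np.linalg.solve(M, a.T).T


def loss_and_grad(gamma, a, u_demo):
    """Mean squared imitation error and its derivative with respect to gamma."""
    n = a.shape[1]
    M = game_matrix(gamma, n)
    u = np.linalg.solve(M, a.T).T
    resid = u - u_demo
    loss = np.mean(np.sum(resid ** 2, axis=1))
    # Implicit differentiation of M(gamma) u = a:
    #   du/dgamma = -M^{-1} (dM/dgamma) u
    dM = np.ones((n, n)) - np.eye(n)
    du = -np.linalg.solve(M, dM @ u.T).T
    grad = np.mean(np.sum(2.0 * resid * du, axis=1))
    return loss, grad


def fit_gamma(a, u_demo, gamma0=0.0, lr=0.02, steps=300):
    """Plain gradient descent on gamma, clipped so M stays invertible for n = 5."""
    gamma = gamma0
    for _ in range(steps):
        _, grad = loss_and_grad(gamma, a, u_demo)
        gamma = float(np.clip(gamma - lr * grad, -0.2, 0.9))
    return gamma


# Synthetic stand-ins: nominal actions play the role of the step-one policy output,
# and demonstrations are generated from a known coupling (an assumption of this toy).
rng = np.random.default_rng(0)
gamma_true = 0.3
a = rng.normal(size=(N_DEMOS, N_AGENTS))
u_demo = nash_equilibrium(gamma_true, a)

gamma_hat = fit_gamma(a, u_demo)
baseline_loss, _ = loss_and_grad(0.0, a, u_demo)   # gamma = 0: non-interactive
fitted_loss, _ = loss_and_grad(gamma_hat, a, u_demo)
print(f"gamma_hat={gamma_hat:.4f}, baseline={baseline_loss:.4f}, fitted={fitted_loss:.6f}")
```

Setting `gamma = 0` recovers the non-interactive actions, so the comparison between `baseline_loss` and `fitted_loss` mirrors, in miniature, the comparison between non-interactive and interactive policies. A learned, higher-dimensional loss would replace the closed-form solve with an iterative equilibrium solver, but the gradient would still be obtained by differentiating through the equilibrium conditions.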
