Multi-agent imitation learning with function approximation: Linear Markov games and beyond

Luca Viano; Till Freihaut; Emanuele Nevali; Volkan Cevher; Matthieu Geist; Giorgia Ramponi

TL;DR

This work provides the first computationally efficient interactive multi-agent imitation learning (MAIL) algorithm for linear Markov games and shows that its sample complexity depends only on the dimension $d$ of the feature map.

Abstract

In this work, we present the first theoretical analysis of multi-agent imitation learning (MAIL) in linear Markov games, where both the transition dynamics and each agent's reward function are linear in some given features. We demonstrate that, by leveraging this structure, it is possible to replace the state-action level "all policy deviation concentrability coefficient" (Freihaut et al., arXiv:2510.09325) with a concentrability coefficient defined at the feature level, which can be much smaller than its state-action analog when the features are informative about the similarity between states. Furthermore, to circumvent the need for any concentrability coefficient, we turn to the interactive setting. We provide the first computationally efficient interactive MAIL algorithm for linear Markov games and show that its sample complexity depends only on the dimension $d$ of the feature map. Building on these theoretical findings, we propose a deep interactive MAIL algorithm that clearly outperforms behavioral cloning (BC) on games such as Tic-Tac-Toe and Connect4.
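
For readers unfamiliar with the setting, the linearity assumption mentioned in the abstract can be written out roughly as follows. This is a sketch of the standard linear Markov game formulation for the two-player zero-sum, finite-horizon case; the symbols $\mu_h$, $\theta_h$, $w_h^{\pi,\nu}$, the action names $a, b$, and the horizon $H$ are assumed notation and may differ from the paper's own definitions.

```latex
% Assumed notation: state s, actions (a, b) of the two players, step h = 1, ..., H,
% and a known feature map \varphi(s, a, b) \in \mathbb{R}^d.
\begin{align*}
P_h(s' \mid s, a, b) &= \langle \varphi(s, a, b), \mu_h(s') \rangle, \\
r_h(s, a, b) &= \langle \varphi(s, a, b), \theta_h \rangle, \\
% Consequence of the two identities above: value functions are linear in the features.
Q_h^{\pi,\nu}(s, a, b) &= \langle \varphi(s, a, b), w_h^{\pi,\nu} \rangle
  \quad \text{for some } w_h^{\pi,\nu} \in \mathbb{R}^d .
\end{align*}
```

Under this structure, coverage of the expert data can be measured in the $d$-dimensional feature space rather than over individual state-action pairs, which is what makes a feature-level concentrability coefficient possible.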

Paper Structure (63 sections, 19 theorems, 160 equations, 2 figures, 2 tables, 3 algorithms)

Introduction
Preliminaries
Finite Horizon two player zero-sum Markov games.
Policies, occupancy measures and value functions.
Nash Equilibria.
Linear Markov games
Linearity of state action value functions.
Imitation Learning in MGs.
Main result for non-interactive setting
Behavioral Cloning.
Rate optimality and necessity of $\mathcal{C}_{\varphi, \max}$.
Which are good features for BC?
Interactive Linear MAIL
The algorithm: LSVI-UCB-ZERO-BC
Extension to the infinite horizon setting.
...and 48 more sections

Key Result

Theorem 3.3

Main result: non-interactive case. Let the feature concentrability coefficient and the expert feature covariances be as defined in Definition def:feature_concentrability, with $\lambda = \tau_{\textup{E}}^{-1}$, where $\tau_{\textup{E}}$ is the total number of trajectories. Furthermore, ass

Figures (2)

Figure 1: Experimental results in the linear Gridworld ((a), (b)) and deep Tic-Tac-Toe case ((c)).
Figure 2: Comparison of exploration algorithms

Theorems & Definitions (37)

Example 2.2
Definition 3.2
Theorem 3.3
Theorem 4.1
Remark 4.2
Lemma 4.2
Lemma 4.2
Lemma 4.2
Definition D.3
Lemma D.4
...and 27 more
