DiffFP: Learning Behaviors from Scratch via Diffusion-based Fictitious Play

Akash Karthikeyan; Yash Vardhan Pant

TL;DR

The paper tackles the problem of learning robust strategies in dynamic multi-agent settings with continuous actions, where non-stationarity and vulnerability to unseen opponents make training difficult. It introduces DiffFP, a diffusion-based fictitious-play framework in which a diffusion policy models the best response to the opponents' evolving average strategy, so that multimodal, robust behavior is learned from scratch. Empirically, the approach converges toward an $ε$-Nash equilibrium in continuous zero-sum games and improves both convergence speed (up to $3\times$) and success rates (up to $30\times$) over RL baselines in racing and multi-particle environments. These results suggest that diffusion-based best responses yield robust, generalizable policies for competitive multi-agent tasks with continuous actions.
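The fictitious-play loop summarized above can be illustrated with a minimal sketch of classical fictitious play on a two-player zero-sum matrix game. Assumptions, not taken from the paper: the game is rock-paper-scissors, both players start from rock, and each round uses an exact best response over pure actions where DiffFP would instead train a diffusion policy against the opponent's average strategy. All helper names (`best_response_row`, `exploitability`, `fictitious_play`) are hypothetical. The duality gap returned by `exploitability` bounds what either player can gain by deviating, so a gap of at most $ε$ certifies an $ε$-Nash equilibrium.

```python
# Minimal fictitious-play (FP) sketch on a two-player zero-sum matrix game.
# Assumptions (not from the paper): rock-paper-scissors payoffs, and an exact
# best response over pure actions in place of DiffFP's learned diffusion policy.

RPS = [
    [0, -1, 1],   # row plays rock     vs (rock, paper, scissors)
    [1, 0, -1],   # row plays paper
    [-1, 1, 0],   # row plays scissors
]  # payoff to the row player; the column player receives the negative


def best_response_row(A, col_counts):
    """Row action maximising payoff against the column player's empirical play.

    Dividing counts by their total would not change the argmax, so integer
    counts give the exact best response to the opponent's average strategy."""
    values = [sum(a * c for a, c in zip(row, col_counts)) for row in A]
    return max(range(len(A)), key=lambda i: values[i])


def best_response_col(A, row_counts):
    """Column action minimising the row player's payoff against its empirical play."""
    values = [sum(A[i][j] * row_counts[i] for i in range(len(A))) for j in range(len(A[0]))]
    return min(range(len(A[0])), key=lambda j: values[j])


def exploitability(A, row_mix, col_mix):
    """Duality gap: 0 exactly at a Nash equilibrium; a gap of at most eps
    means neither player can gain more than eps by deviating (eps-Nash)."""
    row_best = max(sum(a * q for a, q in zip(row, col_mix)) for row in A)
    col_best = min(sum(A[i][j] * row_mix[i] for i in range(len(A))) for j in range(len(A[0])))
    return row_best - col_best


def fictitious_play(A, iterations):
    """Each round, both players best-respond to the opponent's average strategy."""
    row_counts = [0] * len(A)
    col_counts = [0] * len(A[0])
    row_counts[0] = col_counts[0] = 1  # arbitrary initial pure actions
    for _ in range(iterations):
        i = best_response_row(A, col_counts)
        j = best_response_col(A, row_counts)
        row_counts[i] += 1
        col_counts[j] += 1
    n = iterations + 1
    return [c / n for c in row_counts], [c / n for c in col_counts]


row_avg, col_avg = fictitious_play(RPS, 10_000)
print("row average strategy:", [round(p, 3) for p in row_avg])
print("exploitability:", exploitability(RPS, row_avg, col_avg))
```

In this game the average strategies drift toward the uniform equilibrium $(1/3, 1/3, 1/3)$ and the gap shrinks as iterations accumulate; this reflects the classical convergence of fictitious play in finite two-player zero-sum games, not the continuous-action setting the paper targets.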

Abstract

Self-play reinforcement learning has demonstrated significant success in learning complex strategic and interactive behaviors in competitive multi-agent games. However, achieving such behaviors in continuous decision spaces remains challenging. Ensuring adaptability and generalization in self-play settings is critical for achieving competitive performance in dynamic multi-agent environments. These challenges often cause methods to converge slowly to a Nash equilibrium, or not at all, leaving agents vulnerable to strategic exploitation by unseen opponents. To address these challenges, we propose DiffFP, a fictitious play (FP) framework that estimates the best response to unseen opponents while learning a robust and multimodal behavioral policy. Specifically, we approximate the best response using a diffusion policy that leverages generative modeling to learn adaptive and diverse strategies. Through empirical evaluation, we demonstrate that the proposed FP framework converges towards $ε$-Nash equilibria in continuous-space zero-sum games. We validate our method on complex multi-agent environments, including racing and multi-particle zero-sum games. Simulation results show that the learned policies are robust against diverse opponents and outperform baseline reinforcement learning policies. Our approach achieves up to 3$\times$ faster convergence and 30$\times$ higher success rates on average against RL-based baselines, demonstrating its robustness to opponent strategies and stability across training iterations.
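The abstract's point that a generative best response can represent several distinct strategies can be illustrated with a minimal sketch of DDPM-style ancestral sampling for a one-dimensional action. Assumptions, not from the paper: the target action distribution is a hypothetical two-mode Gaussian mixture (for example, overtaking on the left or on the right), the noise schedule is an assumed linear 100-step schedule, and the learned, observation-conditioned noise-prediction network of a diffusion policy is replaced by the exact noise prediction for that mixture, so no training is shown. The names `eps_hat` and `sample_action` are hypothetical.

```python
import math
import random

# Hypothetical 1-D action target (not from the paper): two equally good
# manoeuvres at -1 and +1, e.g. overtaking on the left or on the right.
MODES = (-1.0, 1.0)
MODE_STD = 0.1

# Assumed linear noise schedule with T steps.
T = 100
BETAS = [1e-4 + (0.2 - 1e-4) * t / (T - 1) for t in range(T)]
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = []
running = 1.0
for a in ALPHAS:
    running *= a
    ALPHA_BARS.append(running)  # cumulative product of alphas


def eps_hat(x, alpha_bar):
    """Exact E[noise | x_t] for the mixture target, via the score of p_t.

    Stands in for the trained, observation-conditioned noise-prediction
    network of a real diffusion policy."""
    var = alpha_bar * MODE_STD ** 2 + (1.0 - alpha_bar)  # per-component variance of p_t
    means = [math.sqrt(alpha_bar) * m for m in MODES]
    logits = [-((x - mu) ** 2) / (2.0 * var) for mu in means]  # equal mixture weights
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    score = sum(w / total * (mu - x) / var for w, mu in zip(weights, means))
    return -math.sqrt(1.0 - alpha_bar) * score


def sample_action(rng):
    """DDPM ancestral sampling: start from N(0, 1) and denoise for T steps."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(T)):
        eps = eps_hat(x, ALPHA_BARS[t])
        mean = (x - BETAS[t] / math.sqrt(1.0 - ALPHA_BARS[t]) * eps) / math.sqrt(ALPHAS[t])
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise on the final step
        x = mean + math.sqrt(BETAS[t]) * noise
    return x


rng = random.Random(0)
actions = [sample_action(rng) for _ in range(1000)]
left = sum(a < 0 for a in actions) / len(actions)
print(f"fraction of left-mode actions: {left:.2f}")
print(f"mean action: {sum(actions) / len(actions):+.2f}")
```

Roughly half of the sampled actions land near each mode while their mean sits near 0; a single Gaussian policy fitted to the same actions would centre on that in-between action, which matches neither manoeuvre.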
