PADiff: Predictive and Adaptive Diffusion Policies for Ad Hoc Teamwork
Hohei Chan, Xinzhi Zhang, Antao Xiang, Weinan Zhang, Mengchen Zhao
TL;DR
PADiff addresses ad hoc teamwork by enabling an ego agent to anticipate and adapt to unseen teammates using a diffusion-based policy with $K$ denoising steps. It couples a state-conditioned AFM-Net for real-time adaptation with a Predictive Guidance Block (PGB) that injects teammate-aware objectives during training, yielding multimodal and robust cooperation. Empirical results across Predator-Prey, Level-Based Foraging, and Overcooked show PADiff consistently outperforming strong baselines and exhibiting clear multimodal policy distributions, validated by cross-play analyses and ablations. The work advances open-world multi-agent collaboration by integrating predictive signals with adaptive diffusion policies to handle non-stationary teammate behavior.
Abstract
Ad hoc teamwork (AHT) requires agents to collaborate with previously unseen teammates, which is crucial for many real-world applications. The core challenge of AHT is to develop an ego agent that can predict and adapt to unknown teammates on the fly. Conventional RL-based approaches optimize a single expected return, which often causes policies to collapse into a single dominant behavior and thus fail to capture the multimodal cooperation patterns inherent in AHT. In this work, we introduce PADiff, a diffusion-based approach that captures an agent's multimodal behaviors, unlocking its diverse modes of cooperation with teammates. However, standard diffusion models lack the ability to predict and adapt in highly non-stationary AHT scenarios. To address this limitation, we propose a novel diffusion-based policy that integrates critical predictive information about teammates into the denoising process. Extensive experiments across three cooperation environments demonstrate that PADiff significantly outperforms existing AHT methods.
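To make the notion of a diffusion-based policy concrete, the following is a minimal sketch of generic DDPM-style ancestral sampling of an action conditioned on a state. It is not PADiff's implementation: the linear variance schedule, the function names (`make_schedule`, `toy_noise_predictor`, `sample_action`), and the placeholder noise predictor are all assumptions made for illustration, and the toy predictor assumes the action has the same dimension as the state. In a real system the predictor would be a trained network, and teammate-aware signals would enter through its conditioning and training objective.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (an assumed choice); returns (betas, alphas, alpha_bars)."""
    if num_steps == 1:
        betas = [beta_start]
    else:
        step = (beta_end - beta_start) / (num_steps - 1)
        betas = [beta_start + i * step for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a  # cumulative product of alphas
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def toy_noise_predictor(action, state, k):
    """Hypothetical stand-in for a learned noise model eps(a_k, s, k).

    It simply points from a state-dependent target towards the current
    noisy action; a trained network would replace it.
    """
    return [a - math.tanh(s) for a, s in zip(action, state)]


def sample_action(state, num_steps=10, eps_model=toy_noise_predictor, seed=None):
    """Run num_steps reverse (denoising) steps, starting from Gaussian noise."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    action = [rng.gauss(0.0, 1.0) for _ in state]  # a_K ~ N(0, I)
    for k in reversed(range(num_steps)):
        eps = eps_model(action, state, k)
        coef = betas[k] / math.sqrt(1.0 - alpha_bars[k])
        # Posterior mean of the previous, less noisy action
        mean = [(a - coef * e) / math.sqrt(alphas[k]) for a, e in zip(action, eps)]
        if k > 0:
            # Add fresh noise with variance beta_k on every step but the last
            sigma = math.sqrt(betas[k])
            action = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            action = mean
    return action
```

Calling `sample_action([0.5, -1.0, 2.0], num_steps=10, seed=0)` returns one sampled action vector. Because each call starts from fresh noise, repeated sampling with different seeds can land in different modes once the predictor is a trained multimodal model, which is the property the abstract contrasts with return-maximizing RL policies.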
