PADiff: Predictive and Adaptive Diffusion Policies for Ad Hoc Teamwork

Hohei Chan, Xinzhi Zhang, Antao Xiang, Weinan Zhang, Mengchen Zhao

TL;DR

PADiff addresses ad hoc teamwork by enabling an ego agent to anticipate and adapt to unseen teammates using a diffusion-based policy with $K$ denoising steps. It couples a state-conditioned AFM-Net for real-time adaptation with a Predictive Guidance Block (PGB) that injects teammate-aware objectives during training, yielding multimodal and robust cooperation. Empirical results across Predator-Prey, Level-Based Foraging, and Overcooked show PADiff consistently outperforming strong baselines and exhibiting clear multimodal policy distributions, validated by cross-play analyses and ablations. The work advances open-world multi-agent collaboration by integrating predictive signals with adaptive diffusion policies to handle non-stationary teammate behavior.

Abstract

Ad hoc teamwork (AHT) requires agents to collaborate with previously unseen teammates, which is crucial for many real-world applications. The core challenge of AHT is to develop an ego agent that can predict and adapt to unknown teammates on the fly. Conventional RL-based approaches optimize a single expected return, which often causes policies to collapse into a single dominant behavior and thus fail to capture the multimodal cooperation patterns inherent in AHT. In this work, we introduce PADiff, a diffusion-based approach that captures the agent's multimodal behaviors, unlocking its diverse cooperation modes with teammates. However, standard diffusion models lack the ability to predict and adapt in highly non-stationary AHT scenarios. To address this limitation, we propose a novel diffusion-based policy that integrates critical predictive information about teammates into the denoising process. Extensive experiments across three cooperation environments demonstrate that PADiff significantly outperforms existing AHT methods.

Paper Structure

This paper contains 21 sections, 14 equations, 12 figures, 1 table, 2 algorithms.

Figures (12)

  • Figure 1: An example illustrating multiple cooperation patterns in AHT. Situation 1: The ego agent passes the ball to teammate A; Situation 2: The ego agent attempts a shot directly; Situation 3: The ego agent passes the ball to teammate B.
  • Figure 2: Overview of the AHT training pipeline. The ego agent learns to cooperate with diverse teammates by: (a) sampling from a heterogeneous teammate policy pool to simulate varied collaboration scenarios, (b) interacting within the environment to collect trajectories, (c) storing interaction data, and (d) continuously optimizing its policy based on the collected data.
  • Figure 3: The overall architecture of PADiff. Subgraph (a) shows that PADiff represents the policy as a diffusion-based model, with the State Encoder $f_\xi(z_t | s_{t-m:t})$ transforming states into a latent representation that captures team dynamics. $c_t$ is the vector obtained by element-wise addition of the embedding of the latent representation $z_t$ and the embedding of the diffusion step $k$; it serves as the condition for diffusion. Subgraph (b) shows that AFM-Net uses $c_t$, which represents the teamwork context, as the denoising condition to dynamically modulate the intermediate feature vectors and thereby influence action generation. Subgraph (c) illustrates that PGB integrates team-aware information into the action denoising process by predicting teammates' intentions during training; through gradient propagation, this ensures that the ego agent can make decisions that align with long-term team objectives at test time.
  • Figure 4: Forward noising process and reverse denoising process in PADiff.
  • Figure 5: Average evaluation returns with 95% confidence interval during training.
  • ...and 7 more figures
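
The captions for Figures 3 and 4 describe three mechanical pieces: a condition vector $c_t$ built by adding an embedding of $z_t$ to an embedding of the diffusion step $k$, a network that uses $c_t$ to modulate intermediate features, and a forward noising process over actions. The following is a minimal NumPy sketch of those pieces, not the authors' implementation. Several details are assumptions not stated in this excerpt: the addition is read as element-wise, the step embedding is sinusoidal, the modulation is a FiLM-style scale and shift, the forward process is the standard DDPM closed form with a linear noise schedule, and every name, weight matrix, and dimension below is hypothetical.

```python
import numpy as np


def step_embedding(k: int, dim: int) -> np.ndarray:
    """Sinusoidal embedding of diffusion step k (assumed form; dim must be even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = k * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


def condition_vector(z_t: np.ndarray, k: int, w_z: np.ndarray) -> np.ndarray:
    """c_t = embed(z_t) + embed(k), added element-wise.

    w_z is a hypothetical linear embedding of the latent z_t.
    """
    z_emb = z_t @ w_z
    return z_emb + step_embedding(k, z_emb.shape[-1])


def modulate(h: np.ndarray, c_t: np.ndarray,
             w_gamma: np.ndarray, w_beta: np.ndarray) -> np.ndarray:
    """FiLM-style modulation (an assumption): scale and shift features h using c_t."""
    gamma = c_t @ w_gamma  # per-feature scale offset
    beta = c_t @ w_beta    # per-feature shift
    return h * (1.0 + gamma) + beta


def forward_noise(a0: np.ndarray, k: int, alpha_bar: np.ndarray,
                  rng: np.random.Generator):
    """Standard DDPM forward step: a_k = sqrt(abar_k) * a0 + sqrt(1 - abar_k) * eps."""
    eps = rng.standard_normal(a0.shape)
    a_k = np.sqrt(alpha_bar[k]) * a0 + np.sqrt(1.0 - alpha_bar[k]) * eps
    return a_k, eps


# Hypothetical sizes, chosen only for illustration.
LATENT_DIM, COND_DIM, FEAT_DIM, ACTION_DIM, K = 8, 16, 32, 6, 50

rng = np.random.default_rng(0)
w_z = rng.standard_normal((LATENT_DIM, COND_DIM)) * 0.1
w_gamma = rng.standard_normal((COND_DIM, FEAT_DIM)) * 0.1
w_beta = rng.standard_normal((COND_DIM, FEAT_DIM)) * 0.1

# Linear beta schedule (assumed) and its cumulative product alpha_bar.
betas = np.linspace(1e-4, 0.02, K)
alpha_bar = np.cumprod(1.0 - betas)

z_t = rng.standard_normal(LATENT_DIM)    # stand-in for the State Encoder output
c_t = condition_vector(z_t, k=10, w_z=w_z)
h = rng.standard_normal(FEAT_DIM)        # stand-in for an intermediate feature vector
h_mod = modulate(h, c_t, w_gamma, w_beta)

a0 = rng.standard_normal(ACTION_DIM)     # stand-in for a clean action vector
a_k, eps = forward_noise(a0, k=10, alpha_bar=alpha_bar, rng=rng)
```

Because the scale is parameterized as `1 + gamma`, zero modulation weights leave the features unchanged, so the conditioning acts as a learned perturbation of an unconditioned network. That parameterization is a design choice of this sketch rather than something the captions specify.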