
Adaptively Coordinating with Novel Partners via Learned Latent Strategies

Benjamin Li, Shuyang Shi, Lucia Romero, Huao Li, Yaqi Xie, Woojun Kim, Stefanos Nikolaidis, Michael Lewis, Katia Sycara, Simon Stepputtis

TL;DR

A strategy-conditioned cooperator framework that learns to represent, categorize, and adapt in real time to a broad range of potential partner strategies in a complex collaborative cooking environment, where two players must coordinate effectively within a diverse strategy space.

Abstract

Adaptation is the cornerstone of effective collaboration among heterogeneous team members. In human-agent teams, artificial agents need to adapt to their human partners in real time, as individuals often have unique preferences and policies that may change dynamically throughout an interaction. This becomes particularly challenging in tasks with time pressure and complex strategy spaces, where identifying partner behaviors and selecting suitable responses is difficult. In this work, we introduce a strategy-conditioned cooperator framework that learns to represent, categorize, and adapt in real time to a broad range of potential partner strategies. Our approach encodes strategies with a variational autoencoder to learn a latent strategy space from agent trajectory data, identifies distinct strategy types through clustering, and trains a cooperator agent conditioned on these clusters by generating partners of each strategy type. For online adaptation to novel partners, we leverage a fixed-share regret minimization algorithm that dynamically infers and updates its estimate of the partner's strategy during interaction. We evaluate our method in a modified version of the Overcooked domain, a complex collaborative cooking environment that requires effective coordination between two players with a diverse potential strategy space. Through these experiments and an online user study, we demonstrate that our proposed agent achieves state-of-the-art performance compared to existing baselines when paired with novel human and agent teammates.
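The abstract states that distinct strategy types are identified by clustering the VAE's latent strategy codes, but does not name the clustering algorithm. The sketch below assumes plain k-means (Lloyd's algorithm) over latent vectors purely for illustration; the helper `cluster_latents`, the farthest-point seeding, and the toy 2-D latents are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def cluster_latents(
    latents: List[Vector], k: int, iters: int = 50
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper: plain k-means over latent strategy codes.

    Returns (centroids, labels), where labels[i] is the strategy type of latents[i].
    """
    # Farthest-point seeding keeps the example deterministic: start from the
    # first latent, then repeatedly add the point farthest from chosen centroids.
    centroids: List[Vector] = [latents[0]]
    while len(centroids) < k:
        centroids.append(max(latents, key=lambda p: min(_dist2(p, c) for c in centroids)))

    labels = [-1] * len(latents)
    for _ in range(iters):
        # Assignment step: each latent joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in latents]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move either
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(latents, labels) if lab == j]
            if members:
                centroids[j] = _mean(members)
    return centroids, labels


# Toy 2-D "latent strategies": two well-separated groups of trajectories.
toy_latents = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
toy_centroids, toy_labels = cluster_latents(toy_latents, k=2)
print(toy_labels)  # [0, 0, 1, 1]
```

Each resulting centroid could then stand in for one strategy type when generating partners for the cluster-conditioned cooperator; the paper's actual clustering method, latent dimensionality, and number of clusters may differ.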

Paper Structure

This paper contains 29 sections, 5 equations, 12 figures, 6 tables, 2 algorithms.

Figures (12)

  • Figure 1: Overview of TALENTS. Given an observation of a teammate, the VAE's latent strategy clusters are queried to generate action predictions. At subsequent timesteps, the teammate's actual actions are compared to these predictions, and the belief over the teammate's latent strategy is progressively updated.
  • Figure 2: The four Overcooked layouts used in experiments.
  • Figure 3: Evaluation performance during training for each layout, averaged across agents trained with FCP, MEP, and BP populations.
  • Figure 4: Accumulated reward with a partner policy swap midway through the episode ($t=1200$). Error bars are one standard error from the mean.
  • Figure 5: Human-agent teamwork evaluation comprises team scores and participants' subjective ratings of agent teammates. For perceived workload (shaded), lower is better. Statistically significant differences between agents are marked by asterisks. Error bars are one standard error from the mean.
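Figure 1 and the abstract describe the online loop: each latent strategy cluster predicts the teammate's action, the action actually taken is scored against those predictions, and a fixed-share regret minimization algorithm updates the belief over clusters. The sketch below shows one such update step under stated assumptions: the loss is the negative log-probability each cluster assigned to the observed action, and it uses the common mix-with-uniform form of fixed-share. The function `fixed_share_update`, the parameters `eta` and `alpha`, and the toy probabilities are illustrative, not the paper's implementation.

```python
import math
from typing import List, Sequence


def fixed_share_update(
    weights: Sequence[float],
    action_probs: Sequence[float],
    eta: float = 1.0,
    alpha: float = 0.05,
) -> List[float]:
    """One fixed-share step over K latent strategy clusters (hypothetical helper).

    weights[i]:      current belief that the partner follows cluster i
    action_probs[i]: probability cluster i's predicted policy gave to the
                     action the partner actually took (assumed loss input)
    eta:             learning rate of the exponential-weights step
    alpha:           share rate, the fraction of mass spread uniformly
    """
    k = len(weights)
    eps = 1e-12  # guards log(0) when a cluster gave the action zero probability
    updated = []
    for w, p in zip(weights, action_probs):
        loss = -math.log(max(p, eps))  # assumed loss: -log p(action | cluster)
        updated.append(w * math.exp(-eta * loss))
    total = sum(updated)
    updated = [u / total for u in updated]
    # Share step: mix with the uniform distribution so no cluster's weight
    # reaches zero, which lets the belief recover if the partner switches.
    return [(1.0 - alpha) * u + alpha / k for u in updated]


belief = [1.0 / 3] * 3
for _ in range(20):  # partner behaves like cluster 0
    belief = fixed_share_update(belief, [0.8, 0.1, 0.1])
first_guess = max(range(3), key=lambda i: belief[i])
for _ in range(20):  # partner switches to behaving like cluster 2
    belief = fixed_share_update(belief, [0.1, 0.1, 0.8])
second_guess = max(range(3), key=lambda i: belief[i])
print(first_guess, second_guess)  # 0 2
```

With `eta = 1` and log loss, the first step reduces to a Bayesian posterior update over clusters; the share step is what distinguishes fixed-share from plain Bayesian inference, since it is designed to compete with the best sequence of experts that switches a small number of times, a setting similar to the mid-episode partner swap in Figure 4. Conditioning the cooperator on the highest-weight cluster is one plausible way to use the belief, assumed here for illustration.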