Collaborative AI Teaming in Unknown Environments via Active Goal Deduction

Zuyuan Zhang, Hanhan Zhou, Mahdi Imani, Taeyoung Lee, Tian Lan

TL;DR

This work proposes a framework for teaming with unknown agents. It uses a kernel density Bayesian inverse learning method for active goal deduction and pre-trained, goal-conditioned policies for zero-shot policy adaptation.

Abstract

With advances in artificial intelligence (AI), more scenarios require AI to work closely with other agents whose goals and strategies may not be known beforehand. However, existing approaches for training collaborative agents often require defined and known reward signals, and cannot address the problem of teaming with unknown agents that often have latent objectives/rewards. In response to this challenge, we propose a framework for teaming with unknown agents, which leverages a kernel density Bayesian inverse learning method for active goal deduction and utilizes pre-trained, goal-conditioned policies to enable zero-shot policy adaptation. We prove that unbiased reward estimates in our framework are sufficient for optimal teaming with unknown agents. We further evaluate the framework on redesigned multi-agent particle and StarCraft II micromanagement environments with diverse unknown agents of different behaviors/rewards. Empirical results demonstrate that our framework significantly improves the teaming performance of AI and unknown agents in a wide range of collaborative scenarios.

Paper Structure

This paper contains 24 sections, 6 theorems, 21 equations, 10 figures, 6 tables, 1 algorithm.

Key Result

Lemma 3.1

The estimated posterior of the unknown agent reward is given by

Figures (10)

  • Figure 1: We consider the problem of enabling synergistic teaming of AI agents with other unknown agents (e.g., human or autonomous agents that could have latent rewards/objectives) in collaborative task environments.
  • Figure 2: An illustration of our proposed framework. STUN agents $\pi_i(\cdot|R)$ are pre-trained using surrogate agent models with sampled latent rewards $R$. To collaborate with unknown agents, they use inverse learning (step b) on the observed trajectories $\{\tau^u_i\}$ of the unknown agents (step a) and perform a zero-shot policy adaptation based on an unbiased estimate $\hat{R}$ (step c).
  • Figure 3: (a) Illustration of the underlying reward tradeoff when STUN agents team up with unknown agents ranging from playing safe to being greedy. (b) Ability of STUN agents to quickly infer the time-varying reward of the unknown agents (changing every 20 epochs) and then perform zero-shot policy adaptation on the fly. (c) Ablation studies showing the impact of different design modules, as well as the robust performance of STUN under unknown reward functions of increasing complexity (e.g., increasing from 2 to 6 dimensions of reward components and using nonlinear mixing functions).
  • Figure 4: Performance comparison of our proposed STUN agents and selected baselines on redesigned SMAC tasks.
  • Figure 5: An illustration of the correlation between the posterior estimate of reward parameters (shown in each row) using KD-BIL and the ground-truth reward parameters. Our proposed active goal inference can accurately infer the latent reward from observed unknown agent trajectories.
  • ...and 5 more figures
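
The three steps in the Figure 2 caption (observe, infer, adapt) can be illustrated with a small sketch. The Python code below is a toy illustration under stated assumptions, not the authors' implementation: it uses a generic Gaussian kernel density estimate of p(o | R) built from surrogate-agent data, a flat prior over a grid of candidate rewards, a one-dimensional reward parameter, and a hypothetical `goal_conditioned_policy` standing in for the pre-trained $\pi_i(\cdot|R)$. All function names, bandwidths, and data are made up for the example.

```python
import numpy as np

def gaussian_kernel(dist_sq, bandwidth):
    # Unnormalized Gaussian kernel; the missing constants either cancel in the
    # ratio below or are identical for every candidate reward.
    return np.exp(-dist_sq / (2.0 * bandwidth ** 2))

def kde_log_posterior(observed, train_obs, train_rewards, candidates,
                      h_obs=0.1, h_rew=0.1):
    """Log-posterior (up to a constant) over candidate rewards, flat prior.

    Uses the conditional kernel density estimate
    p(o | R) ~ sum_j K(o - o_j) K(R - R_j) / sum_j K(R - R_j).
    """
    d_obs = ((observed[:, None, :] - train_obs[None, :, :]) ** 2).sum(-1)
    d_rew = ((candidates[:, None, :] - train_rewards[None, :, :]) ** 2).sum(-1)
    k_obs = gaussian_kernel(d_obs, h_obs)    # (n_observed, n_train)
    k_rew = gaussian_kernel(d_rew, h_rew)    # (n_candidates, n_train)
    numer = k_obs @ k_rew.T                  # (n_observed, n_candidates)
    denom = k_rew.sum(axis=1)                # (n_candidates,)
    lik = numer / (denom[None, :] + 1e-12)
    # Observations are treated as independent, so log-likelihoods add up.
    return np.log(lik + 1e-300).sum(axis=0)

def posterior_mean(log_post, candidates):
    # Normalize in a numerically stable way, then average the candidates.
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()
    return weights @ candidates

def goal_conditioned_policy(state, reward_param):
    # Hypothetical stand-in for a pre-trained pi_i(. | R): scales a "greedy"
    # move toward the origin by the inferred reward parameter.
    return reward_param[0] * (-state)

rng = np.random.default_rng(0)

# Surrogate agents with sampled latent rewards R in [0, 1] ("safe" to "greedy")
# and a noisy one-dimensional behavior feature that depends on R.
train_rewards = rng.uniform(0.0, 1.0, size=(500, 1))
train_obs = train_rewards + 0.05 * rng.normal(size=(500, 1))

# Step a: observe behavior of an unknown agent whose latent reward is 0.8.
observed = 0.8 + 0.05 * rng.normal(size=(30, 1))

# Step b: infer a posterior over candidate rewards and take a point estimate.
candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
log_post = kde_log_posterior(observed, train_obs, train_rewards, candidates)
r_hat = posterior_mean(log_post, candidates)

# Step c: zero-shot adaptation, i.e. condition the fixed policy on r_hat.
action = goal_conditioned_policy(np.array([1.0, -2.0]), r_hat)
print("estimated reward parameter:", r_hat, "action:", action)
```

In this toy setting the policy itself is never retrained; only its conditioning input changes as the estimate is updated, which is the sense of "zero-shot" used in the caption.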

Theorems & Definitions (12)

  • Lemma 3.1
  • Theorem 3.2
  • Lemma 3.3
  • Lemma 3.4
  • Lemma 3.5
  • proof
  • Lemma B.1
  • proof
  • proof
  • proof
  • ...and 2 more