A Model for Multi-Agent Heterogeneous Interaction Problems

Christopher D. Hsu, Mulugeta A. Haile, Pratik Chaudhari

TL;DR

We introduce a model for multi-agent interaction problems to understand how a heterogeneous team of agents should organize its resources to tackle a heterogeneous team of attackers, and show how the defender team can optimally counteract a heterogeneous attacker team using very few types of defender agents, thereby minimizing its resources.

Abstract

We introduce a model for multi-agent interaction problems to understand how a heterogeneous team of agents should organize its resources to tackle a heterogeneous team of attackers. This model is inspired by how the human immune system tackles a diverse set of pathogens. The key property of this model is a "cross-reactivity" kernel, which enables a particular defender type to respond strongly to some attacker types but weakly to a few different types of attackers. We show how, due to such cross-reactivity, the defender team can optimally counteract a heterogeneous attacker team using very few types of defender agents, and thereby minimize its resources. We study this model in different settings to characterize a set of guiding principles for control problems with heterogeneous teams of agents, e.g., the sensitivity of the harm to sub-optimal defender distributions, and how competition between defenders gives near-optimal behavior using decentralized computation of the control. We also compare this model with existing approaches including reinforcement-learned policies, perimeter defense, and coverage control.
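To make the cross-reactivity idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian kernel $f_{d,a} = \exp(-(x_d - x_a)^2 / 2\sigma^2)$ over a one-dimensional shape space $[0,1]$; the Gaussian form, the function name `cross_reactivity`, the type positions `x`, and the two-type defender distribution `P_d` are all illustrative assumptions. It computes, for every attacker type, the probability that a randomly drawn defender recognizes it in a single interaction.

```python
import numpy as np

def cross_reactivity(x, sigma):
    """Assumed Gaussian kernel: f[d, a] = exp(-(x_d - x_a)^2 / (2 sigma^2))."""
    diff = x[:, None] - x[None, :]
    return np.exp(-diff**2 / (2.0 * sigma**2))

# Hypothetical discretization: N types evenly spaced in the shape space [0, 1].
N = 200
x = (np.arange(N) + 0.5) / N
f = cross_reactivity(x, sigma=0.05)

# Illustrative defender distribution that uses only two defender types.
P_d = np.zeros(N)
P_d[[50, 150]] = 0.5

# Per-interaction recognition probability for each attacker type a:
# sum over defender types d of P_d[d] * f[d, a].
recognition = P_d @ f
```

Under these assumptions, attacker types close to a populated defender type are recognized with high probability even though no defender of exactly that type exists, which is the sense in which a few defender types can cover many attacker types.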

Paper Structure (21 sections, 8 equations, 7 figures, 2 tables)

This paper contains 21 sections, 8 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Orange defenders from distribution $P_d$ successfully interact with blue attackers from distribution $Q_a$ with probability $f_{d,a}$ which depends on the defender type $d$ and the attacker type $a$.
  • Figure 2: A simulation in which $\sum_a N_a=100$ attackers sampled from a Gaussian $Q_a$ with $\sigma_Q=0.1$ interact with $\sum_d N_d=100$ defenders sampled from $P_d^*$ for different values of $\sigma_P$ in a shape space $x \in [0,1]$ with $N=50$ types ($\Delta x = 0.02$). Left: For $\alpha=1$, when $\sigma_P \geq \sigma_Q\sqrt{2}$, the optimal $P_d^*$, which is a Gaussian, tends towards a Dirac delta distribution at the origin. Right: In the simulation, as we increase $\sigma_P$ beyond $\sigma_Q \sqrt{2}$, the empirical harm, i.e., the average number of unsuccessful interactions until all attackers are recognized, decreases (blue). The number of distinct defender types also decreases (orange).
  • Figure 3: Left: Simulation of the interaction of defenders (orange) with attackers from distribution $Q_a$ (blue). The optimal defender distribution $P_d^*$ (orange) is found by optimizing \ref{eq:harm}. Cross-reactivity $f_{d,a}$ with bandwidth $\sigma = 0.05$ in a state $x$ with $N=200$ types ($\Delta x = 0.005$) leads to a discrete distribution. The harm $Q_a \bar{F}_a$ caused by attackers of different types (green) is uniform across the domain. Right: The harm incurred using a non-optimal $P_d$ increases as the difference measured by the Wasserstein distance between the probability $P_d$ and probability $P_d^*$ increases. To obtain this plot, we sampled 1000 different $P_d$s (by perturbing the optimal $P_d^*$ using log-normal noise) and computed the empirical and analytical harm against a fixed $Q_a$. This also indicates that the analytical harm \ref{eq:Fbar*} is close to the mean of the empirical harm over 100 episodes of our experiments using \ref{eq:Fbar} for a broad regime.
  • Figure 4: Convergence to near-optimal harm with competition dynamics. We run the population dynamics in \ref{eq:competition} to calculate the defender distribution $P_d(t)$ starting from a uniform $P_d(0)$. On the left, we compare the optimal defender distribution (blue) calculated using \ref{eq:harm} for a known $Q_a$ ($Q_a$ is the same log-normal distribution sampled in \ref{fig:case0}) with the defender distribution calculated using the competition dynamics (blue) and an estimated $\hat{Q}_a$ from attacker-defender interactions. On the right, we show how the empirical harm (orange batched boxplot) incurred by the competition dynamics distribution $P_d(\hat{Q}_a)$ converges towards the analytical harm (blue) and how its standard deviation shrinks as time progresses. For this experiment, the dynamics were run for $10^4$ iterations per episode, with time between interactions $\Delta t = 0.2c^{-1}$, decommission rate $c=0.001$, cross-reactivity bandwidth $\sigma = 0.05$, and $N=200$ types in the shape space.
  • Figure 5: Defender distribution $P_d$ learned by SAC at the end of an episode after competing for interactions with attackers sampled from $Q_a$ (the optimal $P_d^*$ is in blue and $Q_a$ is the same as in \ref{fig:case0}). We sampled $\sum_d N_d=100$ agents from a uniform distribution that shift states to perform recognition. On the right, we compare the test harm of the defender distribution $P_d$ learned by SAC over training epochs to the optimal harm ($10^4$ iterations per epoch for a total of $10^6$ interactions). The cross-reactivity bandwidth is $\sigma = 0.05$ and there are $N=200$ types.
  • ...and 2 more figures
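
The empirical harm described in the captions above, i.e., the average number of unsuccessful interactions until attackers are recognized, can be sketched with a small Monte Carlo simulation. This is a hedged illustration, not the authors' experimental code: the interaction protocol (each attacker repeatedly meets a defender drawn from $P_d$ and is recognized with probability $f_{d,a}$), the Gaussian kernel, the center of $Q_a$ at 0.5, and every name below (`cross_reactivity`, `empirical_harm`, `max_tries`) are assumptions made for the example.

```python
import numpy as np

def cross_reactivity(x, sigma):
    """Assumed Gaussian kernel: f[d, a] = exp(-(x_d - x_a)^2 / (2 sigma^2))."""
    diff = x[:, None] - x[None, :]
    return np.exp(-diff**2 / (2.0 * sigma**2))

def empirical_harm(P_d, Q_a, f, n_attackers=100, max_tries=10_000, seed=0):
    """Average number of failed interactions before each attacker is recognized.

    Assumed protocol: an attacker of type a repeatedly meets a defender of
    type d drawn from P_d; each meeting succeeds with probability f[d, a].
    """
    rng = np.random.default_rng(seed)
    n_types = len(Q_a)
    attackers = rng.choice(n_types, size=n_attackers, p=Q_a)
    failures = np.zeros(n_attackers)
    for i, a in enumerate(attackers):
        for _ in range(max_tries):  # cap guards against near-zero recognition
            d = rng.choice(n_types, p=P_d)
            if rng.random() < f[d, a]:
                break
            failures[i] += 1
    return failures.mean()

# Hypothetical setup loosely following the Figure 2 caption: N=50 types, sigma_Q=0.1.
N = 50
x = (np.arange(N) + 0.5) / N
f = cross_reactivity(x, sigma=0.1)

Q_a = np.exp(-(x - 0.5) ** 2 / (2.0 * 0.1**2))  # assumed center at 0.5
Q_a /= Q_a.sum()

P_uniform = np.full(N, 1.0 / N)  # baseline, not the optimal P_d^*
harm_uniform = empirical_harm(P_uniform, Q_a, f)
harm_matched = empirical_harm(Q_a, Q_a, f)  # defenders mirror attackers
```

Neither `P_uniform` nor the mirrored distribution is the optimal $P_d^*$ of the paper; they are only baselines against which an optimized defender distribution could be compared.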