Efficiently Scanning and Resampling Spatio-Temporal Tasks with Irregular Observations

Bryce Ferenczi, Michael Burke, Tom Drummond

TL;DR

A novel algorithm is proposed that alternates between two operations: cross-attention between a 2D latent state and the observation, and a discounted cumulative sum over the sequence dimension. Together, these efficiently accumulate historical information from an observation space of varying size.

Abstract

Various works have aimed at combining the inference efficiency of recurrent models with the training parallelism of multi-head attention for sequence modeling. However, most of these works focus on tasks with fixed-dimension observation spaces, such as individual tokens in language modeling or pixels in image completion. To handle an observation space of varying size, we propose a novel algorithm that alternates between two operations: cross-attention between a 2D latent state and the observation, and a discounted cumulative sum over the sequence dimension that efficiently accumulates historical information. We find this resampling cycle is critical for performance. To evaluate efficient sequence modeling in this domain, we introduce two multi-agent intention tasks: simulated agents chasing bouncing particles and micromanagement analysis in professional StarCraft II games. Our algorithm achieves comparable accuracy to existing methods with a lower parameter count and faster training and inference.
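
To illustrate why a discounted cumulative sum suits both parallel training and recurrent inference, the following is a minimal sketch and not the authors' implementation. It assumes the common parameterization $y_t = \lambda\, y_{t-1} + x_t$ with a scalar discount $\lambda \in (0, 1]$; the function names and the symbol $\lambda$ are hypothetical, and the paper's exact form of the discount is not given in this summary. The recurrent form needs only the previous state at each step, while the closed form $y_t = \sum_{k \le t} \lambda^{t-k} x_k$ exposes every output as an independent sum over the sequence.

```python
def discounted_cumsum_recurrent(xs, discount):
    """Inclusive scan via the recurrence y_t = discount * y_{t-1} + x_t.

    Only the running state is kept, which mirrors step-by-step inference.
    """
    out, running = [], 0.0
    for x in xs:
        running = discount * running + x
        out.append(running)
    return out


def discounted_cumsum_direct(xs, discount):
    """Same quantity from the closed form y_t = sum_{k<=t} discount^(t-k) * x_k.

    Each output depends only on the inputs, so all steps could be
    computed independently (the property that parallel training relies on).
    """
    return [
        sum(discount ** (t - k) * xs[k] for k in range(t + 1))
        for t in range(len(xs))
    ]


# Hypothetical scalar sequence standing in for one latent feature over time.
xs = [1.0, 2.0, 3.0, 4.0]
recurrent = discounted_cumsum_recurrent(xs, 0.5)
direct = discounted_cumsum_direct(xs, 0.5)
print(recurrent)
print(direct)
```

Both functions return the same sequence; in practice the scan would be applied element-wise to a latent tensor rather than to scalars.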

Paper Structure

This paper contains 24 sections, 2 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Visualisation of multi-agent environments used for benchmarking.
  • Figure 2: Encoders summarize an irregular set of tokens from the observation ${\mathbb{O}}$ to a fixed size ${\mathbb{L}}$ for the spatio-temporal encoder.
  • Figure 3: Several methods of encoding player (${\mathbb{O}}_p$) and enemy (${\mathbb{O}}_e$) observations to a fixed dimension ${\mathbb{L}}\in\mathbb{R}^{n\times d}$. The observations can be processed together with an embedding to distinguish ${\mathbb{O}}_p$ from ${\mathbb{O}}_e$ (fused), processed separately (piecewise), or processed sequentially (sequential).
  • Figure 4: The inclusive-scan encoder alternates between sampling the observation based on some latent variable ${\mathbb{L}}_x$ and accumulating a weighted sum ${\mathbb{L}}'$. A learned set of parameters ${\mathbb{L}}_0$ is used as the query in the first block. $\gamma\geq1$
  • Figure 5: Average Top 1 assignment accuracy over the simulation sequence with training cost. The top left corner is ideal in each scenario. The Scan encoder linearly scales in performance and cost from $^1$ to $^2$ by introducing an additional self-attention layer in each recursion block.
  • ...and 5 more figures
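
The Figure 2 and Figure 4 captions describe summarizing an irregular set of observation tokens into a fixed-size latent. The sketch below shows one way cross-attention can do this, under simplifying assumptions: a single head, no learned query, key, or value projections, and plain Python lists instead of tensors. The function names and example values are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latent, observation):
    """Each latent row queries every observation token.

    The output has one row per latent row, regardless of how many
    observation tokens are supplied.
    """
    d = len(latent[0])
    out = []
    for q in latent:
        # Scaled dot-product score between this query and each token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, tok)) / math.sqrt(d)
            for tok in observation
        ]
        weights = softmax(scores)
        # Attention-weighted average of the observation tokens.
        out.append(
            [sum(w * tok[j] for w, tok in zip(weights, observation)) for j in range(d)]
        )
    return out


# A fixed latent with n = 2 rows and d = 2 features (hypothetical values).
latent = [[1.0, 0.0], [0.0, 1.0]]
# Two observations with different numbers of tokens.
obs_small = [[0.5, 0.1], [0.2, 0.9]]
obs_large = obs_small + [[0.3, 0.3], [0.8, 0.4], [0.1, 0.7]]

resampled_small = cross_attend(latent, obs_small)
resampled_large = cross_attend(latent, obs_large)
print(len(resampled_small), len(resampled_large))
```

Because the queries come from the latent, the output shape follows the latent and not the observation, which is what lets a downstream sequence model consume a constant-size input at every time step.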