Multi-agent transformer-accelerated RL for satisfaction of STL specifications

Albin Larsson Forsberg; Alexandros Nikou; Aneta Vulgarakis Feljan; Jana Tumova

TL;DR

This paper proposes time-dependent multi-agent transformers (TD-MAT), a centralized approach that uses transformers to handle the large joint input and thereby solve temporally dependent multi-agent problems efficiently.

Abstract

One of the main challenges in multi-agent reinforcement learning is scalability as the number of agents increases. The issue is exacerbated when the problem is temporally dependent. State-of-the-art solutions mainly follow the centralized training with decentralized execution paradigm to address these scalability concerns. In this paper, we propose time-dependent multi-agent transformers, which solve the temporally dependent multi-agent problem efficiently with a centralized approach, using transformers to handle the large input. We demonstrate the efficacy of the method on two problems and use tools from statistics to verify the probability that trajectories generated under the policy satisfy the task. In both cases, the experiments show that our approach outperforms the baseline algorithms from the literature.
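The abstract does not name the statistical tools used for verification. As an illustration only, the sketch below assumes one common choice: treat each trajectory sampled under the policy as an independent Bernoulli trial (the specification is satisfied or not) and apply a one-sided Hoeffding bound to obtain a lower bound on the satisfaction probability. The function name `hoeffding_lower_bound` and the default `delta` are hypothetical and not taken from the paper.

```python
import math


def hoeffding_lower_bound(outcomes, delta=0.05):
    """Estimate a satisfaction probability from sampled trajectories.

    outcomes: sequence of booleans, one per independently sampled
              trajectory (True if the trajectory satisfied the task).
    delta:    allowed probability that the returned bound is wrong
              (assumed value, not from the paper).

    Returns (p_hat, lower), where p_hat is the empirical satisfaction
    rate and lower is a bound that holds with probability >= 1 - delta.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one trajectory")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")

    p_hat = sum(1 for ok in outcomes if ok) / n
    # One-sided Hoeffding inequality: P(p_hat - p >= eps) <= exp(-2 n eps^2).
    # Setting the right-hand side equal to delta and solving for eps:
    eps = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return p_hat, max(0.0, p_hat - eps)


if __name__ == "__main__":
    # Hypothetical data: 90 of 100 sampled trajectories satisfied the task.
    sampled = [True] * 90 + [False] * 10
    estimate, bound = hoeffding_lower_bound(sampled, delta=0.05)
    print(f"empirical rate {estimate:.3f}, lower bound {bound:.3f}")
```

The bound tightens at a rate of $1/\sqrt{n}$, so halving its width requires roughly four times as many sampled trajectories. An exact binomial interval would be tighter for small samples; Hoeffding is used here only because it needs no external library.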

Paper Structure (12 sections, 7 equations, 3 figures, 1 table, 1 algorithm)

Introduction
Preliminaries
Problem Setting
Time-dependent Multi-agent Transformer (TD-MAT)
Transformers
Method Overview
Network Structure
Training
Using TD-MAT
Experiments
Discussion
Conclusion

Figures (3)

Figure 1: An overview of how the observations, $o$, are encoded. $i$ is the agent index and $t$ the timestep. Color coding distinguishes the time encoding along the vertical axis (lighter grey) from the agent encoding along the horizontal axis (darker grey).
Figure 2: Architectural overview of the network structure. It consists of three components (encoder, value function approximator, decoder) corresponding to the three different colors.
Figure 3: Average satisfaction of the specifications over time in the experiments. The left graph shows results from problem 1 and the right graph shows results from problem 2. Shaded areas correspond to the variance. A rolling average with window $n = 5$ was used.
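
The Figure 3 caption mentions a rolling average of $n = 5$ but not how the window is aligned. A minimal sketch of one possibility follows, assuming a trailing window that averages over fewer points at the start of the series; the function name `rolling_average` and the sample data are hypothetical.

```python
def rolling_average(values, window=5):
    """Trailing rolling mean of a sequence of numbers.

    Each output point averages the current value and up to window - 1
    preceding values. The first window - 1 outputs average only the
    values seen so far (an assumption; the caption does not specify
    how the edges are handled).
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


if __name__ == "__main__":
    # Hypothetical per-evaluation satisfaction rates.
    raw = [0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
    print(rolling_average(raw, window=5))
```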
