Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning

Claude Formanek, Louise Beyers, Callum Rhys Tilbury, Jonathan P. Shock, Arnu Pretorius

TL;DR

This paper makes three contributions to offline MARL: a clear guideline for generating novel datasets; a standardisation of over 80 existing datasets, hosted in a publicly available repository with a consistent storage format and easy-to-use API; and a suite of analysis tools for understanding these datasets better, aiding further development.

Abstract

Offline multi-agent reinforcement learning (MARL) is an exciting direction of research that uses static datasets to find optimal control policies for multi-agent systems. Though the field is by definition data-driven, efforts have thus far neglected data in their drive to achieve state-of-the-art results. We first substantiate this claim by surveying the literature, showing how the majority of works generate their own datasets without consistent methodology and provide sparse information about the characteristics of these datasets. We then show why neglecting the nature of the data is problematic, through salient examples of how tightly algorithmic performance is coupled to the dataset used, necessitating a common foundation for experiments in the field. In response, we take a big step towards improving data usage and data awareness in offline MARL, with three key contributions: (1) a clear guideline for generating novel datasets; (2) a standardisation of over 80 existing datasets, hosted in a publicly available repository, using a consistent storage format and easy-to-use API; and (3) a suite of analysis tools that allow us to understand these datasets better, aiding further development.

Paper Structure (15 sections, 6 figures, 3 tables)

Figures (6)

  • Figure 1: To demonstrate the effect that the mean episode return of a dataset has on the final performance of an offline MARL algorithm, we generate four datasets with the mean episode returns given in \ref{fig:mean-matters-means}. We then train an offline MARL algorithm for 50k training steps on each dataset and compare its final performance across the datasets. We repeat the experiment across three different SMAC scenarios, two different algorithms (IQL+CQL [formanek2024dispelling] and MAICQ [maicq]) and 10 random seeds. The aggregated results are given in \ref{fig:mean-matters-results}.
  • Figure 2: To demonstrate the surprising effect that the standard deviation (std) of episode returns can have on the outcome of an offline MARL experiment, we generate 5 datasets that each have the same mean but a different std. We then train two offline MARL algorithms, IQL+CQL and MAICQ, on the data and report the final performance. We repeat the experiment across three different SMAC scenarios and 10 random seeds, with an evaluation batch size of 32. We then aggregate the results as per [gorsane2022emarl].
  • Figure 3: We generate two datasets on 2-Agent HalfCheetah, each with very similar episode return means and standard deviations but distinct data distributions. We then train MADDPG+CQL on each dataset and report its performance over 1 million training steps. We repeat the experiment over 10 random seeds.
  • Figure 4: We use two subsampled datasets of the 5m_vs_6m scenario from SMACv1, with almost identical distributions but from two different sources [formanek2023ogmarl, cfcql] (the Medium quality in both cases). We then train IQL+CQL on each dataset and report its final performance. We repeat the experiment over 10 random seeds.
  • Figure 5: The results returned when calling descriptive_summary on the 2s3z Vault from OG-MARL [formanek2023ogmarl].
  • ...and 1 more figure
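
The captions for Figures 1 to 3 rest on a few summary statistics of a dataset's episode returns. As a rough illustration only, the sketch below computes those statistics from flat lists of per-step rewards and terminal flags. The function names `episode_returns` and `summarise`, the flat data layout, and the toy numbers are all assumptions made for this example; this is not the paper's `descriptive_summary` tool or the repository's API.

```python
from statistics import fmean, pstdev


def episode_returns(rewards, terminals):
    """Sum per-step team rewards into one return per episode.

    Hypothetical layout: `rewards` and `terminals` are flat, equal-length
    sequences, and a True entry in `terminals` marks an episode's last step.
    """
    returns, running = [], 0.0
    for reward, done in zip(rewards, terminals):
        running += reward
        if done:
            returns.append(running)
            running = 0.0
    return returns  # a trailing unfinished episode is dropped


def summarise(returns):
    """Mean, population std, min and max of a list of episode returns."""
    return {
        "mean": fmean(returns),
        "std": pstdev(returns),
        "min": min(returns),
        "max": max(returns),
    }


# Toy trajectory data (made-up numbers): two episodes of lengths 2 and 3.
toy_returns = episode_returns(
    rewards=[1.0, 2.0, 3.0, 4.0, 5.0],
    terminals=[False, True, False, False, True],
)

# Two toy sets of episode returns (made-up numbers) that share a mean of 5
# and a std of 5 but are distributed differently.
bimodal = [0.0, 0.0, 10.0, 10.0]
spread = [5.0 - 50 ** 0.5, 5.0, 5.0, 5.0 + 50 ** 0.5]

bimodal_stats = summarise(bimodal)
spread_stats = summarise(spread)
```

The `bimodal` and `spread` lists mirror the setting of Figure 3 in miniature: the mean and standard deviation agree, yet the minimum, maximum and overall shape do not, so reporting the mean and std alone would not distinguish the two.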