Communicating Unexpectedness for Out-of-Distribution Multi-Agent Reinforcement Learning

Min Whoo Lee, Kibeom Kim, Soo Wung Shin, Minsu Lee, Byoung-Tak Zhang

TL;DR

The paper tackles robust out-of-distribution adaptation in decentralized MARL by introducing the Unexpectedness Encoding Scheme with Reward (UES+R). UES measures prediction errors in forward dynamics to encode environmental surprises, while a reward-driven communication channel supplies task-relevant signals; their combination yields performance approaching that of centralized training, along with greater robustness to distribution shifts, in a multi-robot warehouse setting. The key contribution is a practical, decentralized communication framework that leverages both unexpectedness and extrinsic reward to cope with dynamic environments. This approach has implications for real-world multi-agent systems where centralized training is infeasible and deployment environments are non-stationary.

Abstract

Applying multi-agent reinforcement learning methods to realistic settings is challenging, as it may require the agents to quickly adapt to unexpected situations that are rarely or never encountered in training. Recent methods for generalization to such out-of-distribution settings are limited to more specific, restricted instances of distribution shifts. To tackle adaptation to distribution shifts, we propose the Unexpectedness Encoding Scheme, a novel decentralized multi-agent reinforcement learning algorithm in which agents communicate "unexpectedness," the aspects of the environment that are surprising. In addition to a message yielded by the original reward-driven communication, each agent predicts the next observation based on previous experience, measures the discrepancy between the prediction and the observation actually encountered, and encodes this discrepancy as a message. Experiments in a multi-robot warehouse environment indicate that our proposed method adapts robustly to dynamically changing training environments as well as to out-of-distribution environments.
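
The predict, measure, and encode loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear forward model, the random-projection encoder, the toy dynamics, and every name used here (`LinearForwardModel`, `unexpectedness_message`, `sample_step`, `mean_message_norm`) are assumptions made for illustration, and the reward-driven message channel is not modeled at all.

```python
import numpy as np


class LinearForwardModel:
    """Hypothetical stand-in for a learned forward-dynamics model."""

    def __init__(self, obs_dim, act_dim, lr=0.05):
        self.W = np.zeros((obs_dim, obs_dim + act_dim))
        self.lr = lr

    def predict(self, obs, act):
        # Predict the next observation from the current observation and action.
        return self.W @ np.concatenate([obs, act])

    def update(self, obs, act, next_obs):
        # One gradient step on 0.5 * ||next_obs - prediction||^2.
        x = np.concatenate([obs, act])
        err = next_obs - self.W @ x
        self.W += self.lr * np.outer(err, x)


def unexpectedness_message(model, obs, act, next_obs, proj):
    # Measure the prediction discrepancy and compress it into a bounded,
    # fixed-size message (assumed encoder: random projection + tanh).
    err = next_obs - model.predict(obs, act)
    return np.tanh(proj @ err)


def sample_step(rng, A, B, obs_dim, act_dim):
    # Toy linear dynamics: next_obs = A @ obs + B @ one_hot(action).
    obs = rng.uniform(-1.0, 1.0, size=obs_dim)
    act = np.eye(act_dim)[rng.integers(act_dim)]
    return obs, act, A @ obs + B @ act


def mean_message_norm(rng, model, A, B, proj, n=200):
    # Average message magnitude under the dynamics (A, B).
    obs_dim, act_dim = A.shape[0], B.shape[1]
    norms = []
    for _ in range(n):
        obs, act, nxt = sample_step(rng, A, B, obs_dim, act_dim)
        msg = unexpectedness_message(model, obs, act, nxt, proj)
        norms.append(np.linalg.norm(msg))
    return float(np.mean(norms))


rng = np.random.default_rng(0)
obs_dim, act_dim, msg_dim = 4, 2, 3
A_train = 0.5 * np.eye(obs_dim)    # dynamics seen during training
A_shift = -0.5 * np.eye(obs_dim)   # shifted (out-of-distribution) dynamics
B = rng.normal(size=(obs_dim, act_dim))
proj = rng.normal(size=(msg_dim, obs_dim))

# Fit the forward model on the training dynamics only.
model = LinearForwardModel(obs_dim, act_dim)
for _ in range(2000):
    model.update(*sample_step(rng, A_train, B, obs_dim, act_dim))

in_dist = mean_message_norm(rng, model, A_train, B, proj)
shifted = mean_message_norm(rng, model, A_shift, B, proj)
print(f"mean message norm, training dynamics: {in_dist:.4f}")
print(f"mean message norm, shifted dynamics:  {shifted:.4f}")
```

In this toy setup the message stays near zero while the environment follows the dynamics the model was fitted on, and grows once the dynamics change. That is the intuition behind communicating unexpectedness: teammates receive a signal that carries information mainly when something surprising happens.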

Paper Structure (20 sections, 5 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Conceptual diagram of the problem description. The green dotted box indicates the Goal-Shift setting, and the red dotted box indicates the Shelf-Shift setting.
  • Figure 2: Overview of the Unexpectedness Encoding Scheme with Reward (UES+R).
  • Figure 3: Learning curves on training distribution. Mean and standard deviation across 5 runs are plotted. Note that M(UES+R) is our main method, and MAPPO is intended to indicate the upper bound of performance.