Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network
Shun Kotoku, Takatomo Mihana, André Röhm, Ryoichi Horisaki
TL;DR
This work addresses decentralized multi-agent reinforcement learning (MARL) for the competitive multi-armed bandit (CMAB) problem using a cluster-synchronized photonic network of six lasers. A decentralized coupling adjustment (DCA) algorithm updates the inter-laser couplings from locally observed rewards, which lets two players avoid collisions and converge on the optimal pair of slots without sharing information. Numerical simulations show that the system achieves three-cluster synchronization, balances exploration and exploitation, and adapts robustly across diverse reward distributions; its performance depends on hyperparameters that govern exploitation strength and coupling bounds. The results highlight the potential of photonics-based decision-making for edge devices and outline paths toward scalability and time-varying environments.
Abstract
Multi-agent reinforcement learning (MARL) studies principles that apply to a variety of fields, including wireless networking and autonomous driving. We propose a photonics-based decision-making algorithm for one of the most fundamental problems in MARL, the competitive multi-armed bandit (CMAB) problem. Our numerical simulations demonstrate that chaotic oscillations and cluster synchronization of optically coupled lasers, combined with our proposed decentralized coupling adjustment, efficiently balance exploration and exploitation while facilitating cooperative decision-making without explicit information sharing among agents. Our study demonstrates how decentralized reinforcement learning can be achieved by exploiting complex physical processes controlled by simple algorithms.
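
To make the CMAB setting concrete, the following minimal Python sketch simulates rounds in which several players each pick a slot. It is an illustration only and does not model the laser network or the decentralized coupling adjustment. The names `play_round` and `mean_team_reward`, the Bernoulli reward model, the example probabilities, and the rule that colliding players split a slot's reward equally are all assumptions introduced here, not details taken from the abstract.

```python
import random


def play_round(choices, reward_probs, rng):
    """Return one reward per player for a single CMAB round.

    Assumptions (hypothetical, for illustration): each slot pays a
    Bernoulli reward of 1 with its own probability, and players who
    pick the same slot split that slot's reward equally.
    """
    # Draw at most one payout per selected slot this round.
    payouts = {}
    for slot in sorted(set(choices)):
        payouts[slot] = 1.0 if rng.random() < reward_probs[slot] else 0.0
    # Each player receives an equal share of the chosen slot's payout.
    return [payouts[slot] / choices.count(slot) for slot in choices]


def mean_team_reward(choices, reward_probs, rounds=10000, seed=0):
    """Average summed reward per round for a fixed choice pattern."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        total += sum(play_round(choices, reward_probs, rng))
    return total / rounds


if __name__ == "__main__":
    probs = [0.9, 0.7, 0.2]  # hypothetical reward probabilities
    print("both on best slot:", mean_team_reward([0, 0], probs))
    print("spread over top two:", mean_team_reward([0, 1], probs))
```

Under these assumptions, two players who both select the best slot share a single payout, whereas players who spread over the two best slots each collect a full one. This is why avoiding collisions matters for the team reward.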
