Communication-Efficient Decentralized Multi-Agent Reinforcement Learning for Cooperative Adaptive Cruise Control
Dong Chen, Kaixiang Zhang, Yongqiang Wang, Xunyuan Yin, Zhaojian Li, Dimitar Filev
TL;DR
This work addresses scalable cooperative adaptive cruise control (CACC) by formulating it as a fully decentralized multi-agent reinforcement learning problem, eliminating the need for a centralized trainer or controller. It introduces MACACC, a decentralized policy-gradient framework where agents share a neighborhood critic via a consensus-like update, augmented by a quantization-based communication protocol (QMACACC) to dramatically reduce bandwidth. Empirical results in two CACC scenarios show that MACACC outperforms state-of-the-art MARL baselines in control efficacy and safety, with QMACACC offering substantial bandwidth savings at modest performance cost. The appendix demonstrates the framework's potential applicability to broader intelligent transportation systems, such as adaptive traffic signal control, highlighting the practical impact of decentralized learning with communication-efficient exchange.
Abstract
Connected and autonomous vehicles (CAVs) promise next-generation transportation systems with enhanced safety, energy efficiency, and sustainability. One typical control strategy for CAVs is cooperative adaptive cruise control (CACC), in which vehicles drive in platoons and cooperate to achieve safe and efficient transportation. In this study, we formulate CACC as a multi-agent reinforcement learning (MARL) problem. Existing MARL methods that use centralized training and decentralized execution require not only a centralized communication mechanism but also dense inter-agent communication during training and online adaptation. In contrast, we propose a fully decentralized MARL framework for enhanced efficiency and scalability. In addition, a quantization-based communication scheme is proposed to reduce the communication overhead without significantly degrading the control performance. This is achieved by quantizing each piece of communicated information with randomized rounding and communicating only the non-zero components after quantization. Extensive experiments in two distinct CACC settings show that the proposed MARL framework consistently outperforms several contemporary benchmarks in both communication efficiency and control efficacy. In the appendix, we show that the proposed framework extends beyond CACC and holds promise for broader intelligent transportation systems with intricate action and state spaces.
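
The quantization idea described in the abstract (randomized rounding followed by transmission of only the non-zero components) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `quantize`, `encode_sparse`, and `decode_sparse`, the uniform grid with spacing `step`, and the (index, level) message format are all assumptions made for illustration.

```python
import math
import random


def quantize(values, step, rng):
    """Randomized rounding of each value to an integer multiple of `step`.

    Assumed scheme: a value is rounded up with probability equal to its
    fractional position between the two neighbouring grid points, so the
    quantized value equals the original in expectation.
    """
    levels = []
    for v in values:
        scaled = v / step
        low = math.floor(scaled)
        # Round up with probability (scaled - low), otherwise round down.
        levels.append(low + (1 if rng.random() < scaled - low else 0))
    return levels


def encode_sparse(levels):
    """Keep only the non-zero components as (index, level) pairs."""
    return [(i, q) for i, q in enumerate(levels) if q != 0]


def decode_sparse(pairs, length, step):
    """Rebuild a dense vector on the receiving side."""
    out = [0.0] * length
    for i, q in pairs:
        out[i] = q * step
    return out


# Hypothetical message; the values and the step size are illustrative only.
rng = random.Random(0)
message = [0.03, -0.41, 0.0, 0.87, 0.002]
pairs = encode_sparse(quantize(message, step=0.5, rng=rng))
restored = decode_sparse(pairs, len(message), step=0.5)
```

Under these assumptions, a coarser `step` sends more components to zero, so fewer pairs are transmitted, at the cost of a larger per-component rounding error.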
