Q-adaptive: A Multi-Agent Reinforcement Learning Based Routing on Dragonfly Network
Yao Kang, Xin Wang, Zhiling Lan
TL;DR
This work tackles suboptimal Dragonfly routing arising from reliance on local congestion signals by introducing Q-adaptive routing, a fully distributed multi-agent reinforcement learning scheme. It uses a novel two-level Q-table to capture source-destination and intermediate-path information, enabling efficient learning with half the memory of traditional Q-routing and guaranteeing delivery within five hops. Implemented in SST/Merlin, Q-adaptive outperforms existing adaptive routing (UGAL and PAR) and, in some cases, even VALn under adversarial traffic, achieving up to 10.5% higher throughput and up to 5× lower average latency on 1k–2.5k node Dragonfly systems, with convergence within 500 μs. The results indicate strong scalability and practical potential for MARL-based routing on high-radix interconnects, motivating future investigations into application-driven behavior and inter-job interference mitigation.
Abstract
High-radix interconnects such as Dragonfly and its variants rely on adaptive routing to balance network traffic for optimal performance. Ideally, adaptive routing forwards each packet along whichever minimal or non-minimal path is least congested. In practice, current adaptive routing algorithms estimate path congestion from local information such as output queue occupancy. Using local information to estimate global path congestion is inevitably inaccurate because a router has no precise knowledge of link states a few hops away, and this inaccuracy can lead to interconnect congestion. In this study, we present Q-adaptive routing, a multi-agent reinforcement learning routing scheme for Dragonfly systems. Q-adaptive routing enables routers to learn to route autonomously by leveraging reinforcement learning. It is highly scalable because it is fully distributed and uses no information shared between routers. Furthermore, we design a new two-level Q-table that makes Q-adaptive computationally lightweight and saves 50% of router memory compared with previous Q-routing. We implement Q-adaptive routing in the SST/Merlin simulator. Our evaluation results show that Q-adaptive routing achieves up to 10.5% system throughput improvement and 5.2x average packet latency reduction compared with existing adaptive routing algorithms. Remarkably, Q-adaptive can even outperform the optimal VALn non-minimal routing under the ADV+1 adversarial traffic pattern, with up to 3% system throughput improvement and 75% average packet latency reduction.
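For readers unfamiliar with the "previous Q-routing" baseline mentioned above, the following is a minimal sketch of the classic Q-routing update, in which each router keeps a table of estimated delivery times per (destination, neighbor) pair and refines it from feedback returned by the chosen neighbor. It is not the two-level Q-table proposed in this paper. The class name QRouter, its method names, and the learning rate value are illustrative assumptions.

```python
from collections import defaultdict


class QRouter:
    """One router agent in classic Q-routing (illustrative sketch only)."""

    def __init__(self, neighbors, alpha=0.5):
        self.neighbors = list(neighbors)
        self.alpha = alpha  # learning rate (assumed value)
        # q[dest][neighbor]: estimated time to deliver to dest via neighbor
        self.q = defaultdict(lambda: {n: 0.0 for n in self.neighbors})

    def best_estimate(self, dest):
        """Smallest estimated delivery time to dest; sent back as feedback."""
        return min(self.q[dest].values())

    def choose_next_hop(self, dest):
        """Greedy choice: the neighbor with the lowest estimated delivery time."""
        row = self.q[dest]
        return min(row, key=row.get)

    def update(self, dest, neighbor, queue_delay, link_delay, neighbor_estimate):
        """Move the estimate toward the observed local delay plus the
        neighbor's own best estimate for reaching dest."""
        old = self.q[dest][neighbor]
        target = queue_delay + link_delay + neighbor_estimate
        self.q[dest][neighbor] = old + self.alpha * (target - old)
        return self.q[dest][neighbor]


# Usage: a router with two neighbors learns that "b" is currently slower.
router = QRouter(["a", "b"], alpha=0.5)
router.update("d", "a", queue_delay=2.0, link_delay=1.0, neighbor_estimate=3.0)
router.update("d", "b", queue_delay=6.0, link_delay=1.0, neighbor_estimate=3.0)
next_hop = router.choose_next_hop("d")
```

In this flat layout the table grows with the number of destinations times the number of neighbors per router, which is the memory cost that the two-level Q-table is reported to halve.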
