A Deep Reinforcement Learning Approach for Adaptive Traffic Routing in Next-gen Networks
Akshita Abrol, Purnima Murali Mohan, Tram Truong-Huu
TL;DR
This work addresses adaptive traffic routing in next-generation, SDN-enabled networks by formulating routing as an MDP that combines link and node state information. It introduces a Deep Graph Convolutional Neural Network (DGCNN) integrated into a Deep Reinforcement Learning (DRL) framework to learn routing decisions from a Network State Matrix that encodes topology and traffic features. The approach uses a prudent reward structure and prioritized experience replay with a target network, yielding rapid adaptation and improved performance over OSPF and a non-graph-based DRL baseline. Empirical results on NSFNET and a random topology show throughput gains of up to 7.8% and delay reductions of up to 16.1%, demonstrating the method's effectiveness and practical potential for autonomous, adaptive routing in modern networks.
Abstract
Next-gen networks require a significant evolution of network management to enable automation and to adapt network configuration to traffic dynamics. The advent of software-defined networking (SDN) and programmable switches enables flexibility and programmability. However, traditional techniques that decide traffic policies are usually based on hand-crafted programming optimization and heuristic algorithms. To obtain tractable solutions, these techniques make unrealistic assumptions, e.g., static network load and topology, which makes them inadequate for next-gen networks. In this paper, we design and develop a deep reinforcement learning (DRL) approach for adaptive traffic routing. We design a deep graph convolutional neural network (DGCNN) integrated into the DRL framework to learn the traffic behavior from not only the network topology but also link and node attributes. We adopt the Deep Q-Learning technique to train the DGCNN model in the DRL framework without the need for a labeled training dataset, enabling the framework to quickly adapt to traffic dynamics. The model uses Q-value estimates to select the routing path for every traffic flow request, balancing exploration and exploitation. We perform extensive experiments with various traffic patterns and compare the performance of the proposed approach with the Open Shortest Path First (OSPF) protocol. The experimental results show the effectiveness and adaptiveness of the proposed framework, which increases the network throughput by up to 7.8% and reduces the traffic delay by up to 16.1% compared to OSPF.
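To make the path-selection step concrete, the following is a minimal sketch of how Q-value estimates could drive a routing choice while trading off exploration and exploitation. The abstract does not name the exploration strategy, so epsilon-greedy is assumed here as a common choice. The function name `select_path`, the `epsilon` parameter, and the example paths and Q-values are hypothetical and not taken from the paper.

```python
import random


def select_path(q_values, epsilon, rng=random):
    """Return the index of a candidate path using epsilon-greedy selection.

    q_values: one Q-value estimate per candidate path (hypothetical input;
              in the paper these estimates would come from the DGCNN).
    epsilon:  probability of exploring a uniformly random path (assumed).
    """
    if not q_values:
        raise ValueError("q_values must be non-empty")
    # Explore: with probability epsilon, pick any candidate path at random.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Exploit: otherwise pick the path with the highest Q-value estimate.
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Illustrative usage with made-up candidate paths and Q-values.
candidate_paths = [["A", "B", "D"], ["A", "C", "D"], ["A", "B", "C", "D"]]
q_estimates = [0.1, 0.9, 0.3]
chosen = candidate_paths[select_path(q_estimates, epsilon=0.1)]
```

With `epsilon=0` the choice is purely greedy. A larger `epsilon` early in training, decayed over time, is one common way to shift from exploration to exploitation.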
