Self-Play Ensemble Q-learning enabled Resource Allocation for Network Slicing

Shavbo Salehi; Pedro Enrique Iturria-Rivera; Medhat Elsayed; Majid Bavand; Raimundas Gaigalas; Yigit Ozcan; Melike Erol-Kantarci

TL;DR

This work tackles resource allocation in network slicing by proposing self-play ensemble Q-learning, which maintains multiple Q-tables, selects actions by majority voting, and improves each model against its own past versions to mitigate overestimation and slow learning. The method is evaluated against Q-learning and double Q-learning in a two-slice (eMBB and URLLC) MEC-enabled 5G setting, demonstrating improvements of 21.92% in latency, 24.22% in throughput, and 23.63% in packet drop rate (PDR). The approach also shows robustness to adversarial manipulation by cross-checking across Q-tables and historical policies. Overall, the proposed method offers scalable, adaptive, and adversarially robust resource allocation for dynamic network slicing scenarios, with practical relevance to 5G and beyond.

Abstract

In 5G networks, network slicing has emerged as a pivotal paradigm to address diverse user demands and service requirements. To meet these requirements, reinforcement learning (RL) algorithms have been widely used, but they suffer from overestimation and from the exploration-exploitation trade-off. To tackle these problems, this paper explores the application of self-play ensemble Q-learning, an extended RL-based technique. Self-play ensemble Q-learning uses multiple Q-tables with different exploration-exploitation rates, which yields different observations for choosing the most suitable action in each state. Moreover, through self-play, each model tries to improve on its own previous iterations, which boosts system efficiency and reduces the effect of overestimation. For performance evaluation, we consider three RL-based algorithms (self-play ensemble Q-learning, double Q-learning, and Q-learning) and compare their performance under different network traffic loads. Through simulations, we demonstrate the effectiveness of self-play ensemble Q-learning in meeting these diverse demands, with improvements of 21.92% in latency, 24.22% in throughput, and 23.63% in packet drop rate in comparison with the baseline methods. Furthermore, we evaluate the robustness of self-play ensemble Q-learning and double Q-learning when one of the Q-tables is affected by a malicious user. Our results show that self-play ensemble Q-learning is more robust against adversarial users: it prevents a noticeable drop in system performance and mitigates the impact of users manipulating policies.
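The ensemble idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `EnsembleQ`, the tie-breaking rule, the choice to update every table with the same transition, and the values of `alpha` and `gamma` are all assumptions made for illustration, and the self-play comparison against past versions is not modeled. The sketch only shows several tabular Q-learners with different exploration rates voting on an action.

```python
import random
from collections import Counter


class EnsembleQ:
    """Hypothetical ensemble of tabular Q-learners with majority voting."""

    def __init__(self, n_states, n_actions, epsilons, alpha=0.1, gamma=0.9, seed=0):
        self.n_actions = n_actions
        self.epsilons = list(epsilons)  # one exploration rate per Q-table
        self.alpha = alpha              # learning rate (assumed value)
        self.gamma = gamma              # discount factor (assumed value)
        self.rng = random.Random(seed)
        # One Q-table per exploration rate, initialised to zero.
        self.tables = [
            [[0.0] * n_actions for _ in range(n_states)] for _ in self.epsilons
        ]

    def _greedy(self, table, state):
        row = table[state]
        # max() returns the first maximal index, so ties go to the lowest action.
        return max(range(self.n_actions), key=row.__getitem__)

    def propose(self, state):
        """Each table proposes an action using its own epsilon-greedy rule."""
        votes = []
        for table, eps in zip(self.tables, self.epsilons):
            if self.rng.random() < eps:
                votes.append(self.rng.randrange(self.n_actions))
            else:
                votes.append(self._greedy(table, state))
        return votes

    def act(self, state):
        """Majority vote over the proposals; ties go to the lowest action index."""
        counts = Counter(self.propose(state))
        best = max(counts.values())
        return min(a for a, c in counts.items() if c == best)

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update, applied to every table (an assumption)."""
        for table in self.tables:
            target = reward + self.gamma * max(table[next_state])
            table[state][action] += self.alpha * (target - table[state][action])
```

With three tables, a single corrupted table that proposes a different action is outvoted by the other two, which is the intuition behind the robustness claim: one manipulated Q-table cannot change the selected action on its own.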
