GRLinQ: An Intelligent Spectrum Sharing Mechanism for Device-to-Device Communications with Graph Reinforcement Learning
Zhiwei Shan, Xinping Yi, Le Liang, Chung-Shou Liao, Shi Jin
TL;DR
GRLinQ reframes D2D spectrum sharing as a graph reinforcement learning problem that jointly addresses link scheduling and power control with reduced reliance on CSI. By encoding the network as a $K$-nearest interference graph and using a hybrid graph neural network that injects model-based insights, GRLinQ achieves near-state-of-the-art performance while improving scalability and generalization to unseen networks. The framework uses PPO to optimize an RL policy over iterations, which makes a distributed deployment possible and keeps training data-efficient, since no solved-instance labels are required. Experiments show that GRLinQ and GRLinQ-pc outperform many baselines across varied network sizes and densities, and that CSI input, when available, further improves performance on realistic channels. Overall, GRLinQ offers a promising, scalable approach to practical D2D spectrum sharing with strong transferability across configurations.
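As a rough illustration of the $K$-nearest interference graph mentioned above, the sketch below connects each receiver to its $K$ closest interfering transmitters. It is not the authors' implementation: the function name `k_nearest_interference_graph`, the use of Euclidean transmitter-to-receiver distance as the notion of "nearest", and the edge direction (interferer to victim) are all assumptions made for this example.

```python
import numpy as np

def k_nearest_interference_graph(tx, rx, k):
    """Hypothetical helper: return directed edges (j, i), meaning that
    transmitter j is one of the k nearest interferers of receiver i.

    tx, rx: arrays of shape (n, 2) with transmitter/receiver coordinates,
            where link i consists of tx[i] and rx[i].
    """
    tx = np.asarray(tx, dtype=float)
    rx = np.asarray(rx, dtype=float)
    n = tx.shape[0]
    k = min(k, n - 1)  # a link has at most n - 1 interferers

    # dist[i, j] = Euclidean distance from transmitter j to receiver i
    # (assumed proximity measure; the paper may define it differently).
    dist = np.linalg.norm(rx[:, None, :] - tx[None, :, :], axis=-1)

    # A link's own transmitter is the desired signal, not an interferer.
    np.fill_diagonal(dist, np.inf)

    # For each receiver, keep the indices of the k closest transmitters.
    nearest = np.argsort(dist, axis=1)[:, :k]
    return [(int(j), i) for i in range(n) for j in nearest[i]]

# Four links laid out on a line; each receiver sits 1 unit from its transmitter.
tx = [[0, 0], [10, 0], [20, 0], [100, 0]]
rx = [[1, 0], [11, 0], [21, 0], [101, 0]]
edges = k_nearest_interference_graph(tx, rx, k=1)
print(edges)
```

Limiting each node to $K$ incoming edges keeps the graph sparse as the network grows, which is one plausible reason such an encoding helps with scalability.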
Abstract
Device-to-device (D2D) spectrum sharing in wireless communications is a challenging non-convex combinatorial optimization problem, involving entangled link scheduling and power control in a large-scale network. The state-of-the-art methods, whether model-based or data-driven, exhibit certain limitations, such as a critical need for channel state information (CSI) and/or a large number of (solved) instances (e.g., network layouts) as training samples. To advance this line of research, we propose a novel hybrid model/data-driven spectrum sharing mechanism with graph reinforcement learning for link scheduling (GRLinQ), which injects information-theoretic insights into machine learning models so that link scheduling and power control can be solved in an intelligent yet explainable manner. Through an extensive set of experiments, GRLinQ demonstrates superior performance to the existing model-based and data-driven link scheduling and/or power control methods, with a relaxed requirement for CSI, a substantially reduced number of unsolved instances as training samples, a possible distributed deployment, reduced online/offline computational complexity, and, most notably, excellent scalability and generalizability over different network scenarios and system configurations.
