
GRLinQ: An Intelligent Spectrum Sharing Mechanism for Device-to-Device Communications with Graph Reinforcement Learning

Zhiwei Shan, Xinping Yi, Le Liang, Chung-Shou Liao, Shi Jin

TL;DR

GRLinQ reframes D2D spectrum sharing as a graph reinforcement learning problem to jointly address link scheduling and power control with reduced CSI reliance. By encoding the network with a $K$-nearest interference graph and using a hybrid graph neural network that injects model-based insights, GRLinQ achieves near-state-of-the-art performance while improving scalability and generalization to unseen networks. The framework uses PPO to optimize an RL policy over iterations, enabling distributed-like operation and data-efficient training without requiring solved-instance labels. Experiments show GRLinQ and GRLinQ-pc outperform many baselines across varied network sizes and densities, with CSI input (when available) enhancing performance on realistic channels. Overall, GRLinQ offers a promising, scalable approach for practical D2D spectrum sharing with strong transferability across configurations.

Abstract

Device-to-device (D2D) spectrum sharing in wireless communications is a challenging non-convex combinatorial optimization problem, involving entangled link scheduling and power control in a large-scale network. The state-of-the-art methods, either from a model-based or a data-driven perspective, exhibit certain limitations such as the critical need for channel state information (CSI) and/or a large number of (solved) instances (e.g., network layouts) as training samples. To advance this line of research, we propose a novel hybrid model/data-driven spectrum sharing mechanism with graph reinforcement learning for link scheduling (GRLinQ), injecting information-theoretic insights into machine learning models, in such a way that link scheduling and power control can be solved in an intelligent yet explainable manner. Through an extensive set of experiments, GRLinQ demonstrates superior performance to the existing model-based and data-driven link scheduling and/or power control methods, with a relaxed requirement for CSI, a substantially reduced number of unsolved instances as training samples, a possible distributed deployment, reduced online/offline computational complexity, and, more remarkably, excellent scalability and generalizability over different network scenarios and system configurations.

Paper Structure (28 sections, 5 equations, 8 figures, 16 tables)

This paper contains 28 sections, 5 equations, 8 figures, 16 tables.

Figures (8)

  • Figure 1: A D2D network with 5 links. (a) Topology graph, and (b) the corresponding $K$-nearest interference graph with $K=2$.
  • Figure 2: Illustration of the proposed framework. All D2D pairs are initially set to a pending state. During each iteration, the policy network makes decisions based on the current state. The policy network has considerable flexibility to classify any number of D2D pairs as active, inactive, or to retain them in the pending state. The model terminates once all D2D pairs have exited the pending state.
  • Figure 3: Policy network architectures: The architecture of our proposed MPGNN policy network. Edge features are first updated by an EdgeUpdate block. Following the edge update, node features are combined with edge features and then updated by the NodeUpdate block. These processes repeat for $L$ layers, and the final layer produces action distributions using the updated node features. EdgeUp block: This block is a multi-layer perceptron (MLP) that updates the edge features. This transformation is applied at both the initial stage and during subsequent layers. NodeUp block: This block aggregates messages from neighboring nodes and then applies an MLP to the concatenated node features and aggregated messages, resulting in updated node features.
  • Figure 4: CDF of sum rate ratios for GRLinQ, GELinQ, ITLinQ+, and FlashLinQ. The number of D2D links is set to 50.
  • Figure 5: Average sum rate ratios achieved by different power control approaches: GRLinQ-pc, FPLinQ-pc, WMMSE, PCGNN, and UWMMSE.
  • ...and 3 more figures
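
The iterative decision process described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `State`, `schedule`, and `toy_policy` are hypothetical, the policy is a hand-written placeholder standing in for the trained policy network, and the `max_iters` guard (with undecided links forced inactive) is an assumption added only so the sketch always terminates.

```python
from enum import Enum


class State(Enum):
    PENDING = 0
    ACTIVE = 1
    INACTIVE = 2


def schedule(num_links, policy, max_iters=100):
    """Run the pending/active/inactive loop until no link is pending.

    `policy(states, pending)` returns a dict {link index: State} for the
    pending links it wants to decide; links it omits stay pending.
    Returns the final states and the number of policy calls made.
    """
    states = [State.PENDING] * num_links  # all D2D pairs start as pending
    steps = 0
    for _ in range(max_iters):
        pending = [i for i, s in enumerate(states) if s is State.PENDING]
        if not pending:
            break  # every pair has left the pending state
        decisions = policy(states, pending)
        steps += 1
        for i in pending:
            states[i] = decisions.get(i, State.PENDING)
    # Assumption for this sketch only: links still undecided are switched off.
    states = [State.INACTIVE if s is State.PENDING else s for s in states]
    return states, steps


def toy_policy(states, pending):
    """Placeholder policy: decide one pending link per call."""
    i = pending[0]
    return {i: State.ACTIVE if i % 2 == 0 else State.INACTIVE}


final_states, num_steps = schedule(4, toy_policy)
```

Because the policy may decide any number of pending links per call, the loop length is not fixed in advance; the toy policy above decides exactly one link per call, so it needs one call per link.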

Theorems & Definitions (1)

  • Definition 1: $K$-nearest Interference Graph
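
The definition itself is not reproduced on this page, so the following is only a sketch under a stated assumption: each receiver keeps directed edges from its $K$ nearest interfering transmitters, ranked by Euclidean distance. The function name `k_nearest_interference_graph` and the coordinate layout are hypothetical; the paper's Definition 1 may rank neighbors differently (for example, by channel strength).

```python
import numpy as np


def k_nearest_interference_graph(tx, rx, k):
    """Return directed edges (j, i): transmitter j interferes with receiver i.

    Assumption: receiver i keeps only its k nearest transmitters j != i,
    ranked by Euclidean distance between transmitter j and receiver i.
    """
    tx = np.asarray(tx, dtype=float)
    rx = np.asarray(rx, dtype=float)
    n = len(tx)
    # dist[j, i] = distance from transmitter j to receiver i
    dist = np.linalg.norm(tx[:, None, :] - rx[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a link is not its own interferer
    edges = []
    for i in range(n):
        nearest = np.argsort(dist[:, i], kind="stable")[: min(k, n - 1)]
        edges.extend((int(j), i) for j in nearest)
    return edges


# Five links on a line, 10 m apart; each receiver sits 1 m from its transmitter.
tx = [(10.0 * j, 0.0) for j in range(5)]
rx = [(10.0 * j, 1.0) for j in range(5)]
edges = k_nearest_interference_graph(tx, rx, k=2)
```

With $K=2$ and 5 links, as in Figure 1, every receiver has exactly two incoming interference edges, so the graph has 10 directed edges regardless of network density. Bounding the in-degree this way keeps the graph sparse as the network grows.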