Distributed Learning in Markovian Restless Bandits over Interference Graphs for Stable Spectrum Sharing
Liad Lea Didi, Kobi Cohen
TL;DR
This work tackles distributed spectrum access in interference graphs with multiple cells and unknown restless Markov channels. It introduces SMILE, a three-phase algorithm (Exploration, Allocation, Exploitation) that learns channel statistics and computes a generalized Gale-Shapley stable allocation in a fully distributed fashion. Theoretical analysis proves logarithmic regret relative to an oracle with full knowledge of expected utilities, and simulations confirm rapid convergence and strong performance against baselines. The model and results advance scalable, interference-aware spectrum sharing in cognitive networks where channel dynamics are heterogeneous and unknown.
Abstract
We study distributed learning for spectrum access and sharing among multiple cognitive communication entities, such as cells, subnetworks, or cognitive radio users (collectively referred to as cells), in communication-constrained wireless networks modeled by interference graphs. Our goal is to achieve a globally stable, interference-aware channel allocation. Stability is defined through a generalized Gale-Shapley multi-to-one matching, a well-established solution concept in wireless resource allocation. We consider wireless networks in which L cells share S orthogonal channels and neighboring cells cannot use the same channel simultaneously. Each channel evolves as an unknown restless Markov process with cell-dependent rewards, making this the first work to establish global Gale-Shapley stability for channel allocation in a stochastic, temporally varying restless environment. To address this challenge, we develop SMILE (Stable Multi-matching with Interference-aware LEarning), a communication-efficient distributed learning algorithm that integrates restless bandit learning with graph-constrained coordination. SMILE enables cells to balance, in a distributed manner, exploration of unknown channels with exploitation of learned information. We prove that SMILE converges to the optimal stable allocation and achieves logarithmic regret relative to a genie with full knowledge of expected utilities. Simulations validate the theoretical guarantees and demonstrate SMILE's robustness, scalability, and efficiency across diverse spectrum-sharing scenarios.
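To make the allocation constraint concrete, the following Python sketch builds an interference-free, multi-to-one allocation of cells to channels on a small interference graph. It is not SMILE: it is a centralized greedy procedure that assumes the expected utilities are already known (the genie's view) and that both sides rank a (cell, channel) pair by that same utility. The function names, the utility table, and the three-cell line graph are hypothetical choices made only for illustration.

```python
from itertools import product


def greedy_stable_allocation(utility, neighbors):
    """Assign at most one channel per cell so that no two neighbors share a channel.

    utility[l][s]: expected utility of cell l on channel s (assumed known here).
    neighbors[l]:  set of cells that interfere with cell l.
    """
    num_cells = len(utility)
    num_channels = len(utility[0])
    # Visit (cell, channel) pairs from highest to lowest expected utility.
    pairs = sorted(
        product(range(num_cells), range(num_channels)),
        key=lambda pair: utility[pair[0]][pair[1]],
        reverse=True,
    )
    allocation = {}
    for cell, channel in pairs:
        if cell in allocation:
            continue  # this cell already holds a channel it values more
        if any(allocation.get(n) == channel for n in neighbors[cell]):
            continue  # a neighbor with higher utility already holds this channel
        allocation[cell] = channel
    return allocation


def is_interference_free(allocation, neighbors):
    """Return True if no cell shares its channel with any of its neighbors."""
    return all(
        allocation.get(n) != channel
        for cell, channel in allocation.items()
        for n in neighbors[cell]
    )


# Hypothetical example: L = 3 cells on a line (0 - 1 - 2), S = 2 channels.
utility = [
    [0.9, 0.4],  # cell 0
    [0.8, 0.5],  # cell 1
    [0.7, 0.6],  # cell 2
]
neighbors = {0: {1}, 1: {0, 2}, 2: {1}}

allocation = greedy_stable_allocation(utility, neighbors)
print(sorted(allocation.items()))  # [(0, 0), (1, 1), (2, 0)]
```

In this example, cells 0 and 2 are not neighbors, so both can occupy channel 0, while cell 1 falls back to channel 1. Under the shared-ranking assumption, a cell can be kept off a channel it values more only by a neighbor that holds that channel with at least as high a utility, which is the intuition behind a blocking-pair-free allocation. The setting in the abstract is harder: utilities are unknown, channels are restless, and the coordination is distributed.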
