Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks
Zhaoyang Liu, Xijun Wang, Chenyuan Feng, Xinghua Sun, Wen Zhan, Xiang Chen
TL;DR
This work addresses the challenge of generalizing MAC protocols across heterogeneous wireless networks by introducing Generalizable Multiple Access (GMA), a meta-reinforcement learning approach that employs a Mixture of Experts (MoE) encoder to learn task-aware representations. GMA leverages Soft Actor-Critic (SAC) to train a goal-conditioned policy and uses a MoE-enhanced encoder to generate robust task embeddings $z$, enabling rapid adaptation to unseen network configurations. The reward design balances throughput with fairness among the agent and existing nodes, controlled by a fairness factor $\nu$, and meta-training across diverse tasks enables zero-shot and few-shot generalization. Empirical results show that GMA achieves fast adaptation and high performance in new environments, with MoE providing improved representation and stability, while preserving competitiveness with environment-specific baselines in training environments and enabling fair coexistence in dynamic scenarios.
Abstract
This paper focuses on spectrum sharing in heterogeneous wireless networks, where nodes with different Media Access Control (MAC) protocols transmit data packets to a common access point over a shared wireless channel. While previous studies have proposed Deep Reinforcement Learning (DRL)-based multiple access protocols tailored to specific scenarios, these approaches are limited by their inability to generalize across diverse environments, often requiring time-consuming retraining. To address this issue, we introduce Generalizable Multiple Access (GMA), a novel Meta-Reinforcement Learning (meta-RL)-based MAC protocol designed for rapid adaptation across heterogeneous network environments. GMA leverages a context-based meta-RL approach with Mixture of Experts (MoE) to improve representation learning, enhancing latent information extraction. By learning a meta-policy during training, GMA enables fast adaptation to different and previously unknown environments, without prior knowledge of the specific MAC protocols in use. Simulation results demonstrate that, although the GMA protocol experiences a slight performance drop compared to baseline methods in training environments, it achieves faster convergence and higher performance in new, unseen environments.
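
To make the idea of an MoE-based context encoder more concrete, the following is a minimal illustrative sketch, not the architecture used in the paper. It assumes a dense softmax gate over a small set of single-layer experts and mean-pooling over the context transitions to obtain the task embedding $z$. The class name `MoEContextEncoder`, the dimensions, and the random initialization are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MoEContextEncoder:
    """Hypothetical MoE encoder: maps a set of context vectors to one embedding z.

    Assumptions (not taken from the paper): each expert is a single tanh layer,
    the gate is a dense softmax over all experts, and per-transition outputs are
    mean-pooled into the task embedding.
    """

    def __init__(self, ctx_dim, latent_dim, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix and bias per expert: shapes (K, C, D) and (K, D).
        self.W_experts = rng.normal(0.0, 0.1, size=(num_experts, ctx_dim, latent_dim))
        self.b_experts = np.zeros((num_experts, latent_dim))
        # Gating weights map a context vector to K expert logits.
        self.W_gate = rng.normal(0.0, 0.1, size=(ctx_dim, num_experts))

    def gate(self, context):
        """Return mixing weights of shape (N, K); each row sums to 1."""
        return softmax(context @ self.W_gate, axis=-1)

    def encode(self, context):
        """Encode context of shape (N, C) into a task embedding of shape (D,)."""
        # Expert outputs for every context vector: shape (N, K, D).
        expert_out = np.tanh(
            np.einsum("nc,kcd->nkd", context, self.W_experts) + self.b_experts
        )
        weights = self.gate(context)  # (N, K)
        # Gate-weighted combination of expert outputs: shape (N, D).
        per_transition = np.einsum("nk,nkd->nd", weights, expert_out)
        # Mean-pool over the N context vectors to get a single embedding.
        return per_transition.mean(axis=0)


# Example: 16 context vectors (e.g., flattened transitions) of dimension 6.
encoder = MoEContextEncoder(ctx_dim=6, latent_dim=4, num_experts=3)
context = np.random.default_rng(1).normal(size=(16, 6))
z = encoder.encode(context)
```

In a context-based meta-RL setup of this kind, the embedding `z` would typically be concatenated with the agent's observation and fed to the policy, so that the same policy weights can behave differently under different coexisting MAC protocols. The encoder weights here are random and untrained; in practice they would be learned jointly with the policy during meta-training.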
