Meta-Learning Multi-armed Bandits for Beam Tracking in 5G and 6G Networks
Alexander Mattick, George Yammine, Georgios Kontes, Setareh Maghsudi, Christopher Mutschler
TL;DR
This work tackles beam management in mmWave networks by formulating beam selection as a partially observable restless multi-armed bandit (RMAB) problem. It introduces a meta-learning approach that amortizes posterior inference via stochastic variational inference, decomposing the problem into a bandit-search head and a goal-predictor head to handle user equipment (UE) movement and environmental dynamics. During deployment, a fast online inference step mirrors Thompson sampling, enabling real-time beam decisions using only received signal strength (RSS) feedback. Empirical results show robust generalization across trajectories, environments, and codebook sizes, outperforming state-of-the-art baselines and substantially reducing the number of probes required. The approach provides a scalable framework for online RMAB inference in dynamic wireless settings with practical relevance to 5G/6G beam management.
Abstract
Beamforming-capable antenna arrays with many elements enable higher data rates in next-generation 5G and 6G networks. In current practice, analog beamforming uses a codebook of pre-configured beams, each radiating towards a specific direction, and a beam management function continuously selects optimal beams for moving user equipments (UEs). However, large codebooks and effects caused by reflections or blockages of beams make optimal beam selection challenging. Previous work and standardization efforts opt for supervised learning, training classifiers to predict the next best beam based on previously selected beams. In contrast, we formulate the problem as a partially observable Markov decision process (POMDP) and model the environment as the codebook itself. At each time step, we select a candidate beam conditioned on the belief state of the unobservable optimal beam and previously probed beams. This frames the beam selection problem as an online search procedure that locates the moving optimal beam. Unlike these supervised approaches, our method handles new or unforeseen trajectories and changes in the physical environment, and outperforms previous work by orders of magnitude.
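To make the belief-conditioned search concrete, the following is a minimal sketch of Thompson-style beam probing over a discrete belief. It is not the method of the paper: the learned, amortized inference is replaced here by a hand-written Bayesian filter, and the RSS model, the random-walk dynamics, and all names (`expected_rss`, `predict`, `update`, `track`) and parameter values are assumptions chosen purely for illustration.

```python
import numpy as np


def expected_rss(num_beams, width=2.0):
    """Toy RSS model (assumption): entry [j, k] is the mean RSS of probed
    beam k when beam j is optimal, decaying with circular index distance."""
    idx = np.arange(num_beams)
    diff = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(diff, num_beams - diff)
    return np.exp(-0.5 * (dist / width) ** 2)


def predict(belief, stay_prob=0.6):
    """Restless dynamics (assumption): the optimal beam stays put or moves
    to a neighbouring index, whether or not it is probed."""
    move = (1.0 - stay_prob) / 2.0
    return stay_prob * belief + move * np.roll(belief, 1) + move * np.roll(belief, -1)


def update(belief, probed, rss, mean_rss, noise_std=0.1):
    """Bayes update of the belief from one noisy RSS value on the probed beam."""
    likelihood = np.exp(-0.5 * ((rss - mean_rss[:, probed]) / noise_std) ** 2)
    posterior = belief * likelihood + 1e-12  # guard against an all-zero posterior
    return posterior / posterior.sum()


def track(num_beams=32, steps=200, seed=0):
    """Simulate tracking; returns the fraction of probes that hit the
    optimal beam and the final belief."""
    rng = np.random.default_rng(seed)
    mean_rss = expected_rss(num_beams)
    belief = np.full(num_beams, 1.0 / num_beams)  # uniform prior
    true_beam = int(rng.integers(num_beams))      # hidden optimal beam
    hits = 0
    for _ in range(steps):
        # The hidden optimal beam drifts (simulated UE movement).
        true_beam = (true_beam + int(rng.choice([-1, 0, 1], p=[0.2, 0.6, 0.2]))) % num_beams
        belief = predict(belief)
        # Thompson-style step: sample a hypothesis from the belief and probe it.
        probed = int(rng.choice(num_beams, p=belief))
        rss = mean_rss[true_beam, probed] + rng.normal(0.0, 0.1)
        belief = update(belief, probed, rss, mean_rss)
        hits += int(probed == true_beam)
    return hits / steps, belief


if __name__ == "__main__":
    hit_rate, _ = track()
    print(f"fraction of probes on the optimal beam: {hit_rate:.2f}")
```

Sampling the probe from the belief, rather than always probing its mode, keeps some exploration in the loop, which matters because the optimal beam keeps moving even when it is not observed. The only feedback used is the scalar RSS of the probed beam, matching the partially observable setting described above.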
