Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels
Sina Tavakolian, Nhan Thanh Nguyen, Ahmed Alkhateeb, Markku Juntti
TL;DR
This work tackles the high beam-training overhead of mmWave systems by using knowledge distillation (KD) to transfer knowledge from a large, high-performing teacher network to compact student networks. The authors formulate sub-6 GHz to mmWave beam mapping as a classification task and present a KD framework that uses individual KD (IKD), relational KD (RKD), and self-distillation to produce lightweight models that closely match the teacher's beam-prediction accuracy and spectral efficiency. Empirical results on DeepMIMO datasets show up to a 99% reduction in trainable parameters and FLOPs, with RKD offering a slight performance edge over IKD and both surpassing a non-distilled baseline. The approach enables practical, low-complexity DL solutions for real-time mmWave beamforming in high-mobility scenarios, with potential extensions to dynamic antenna selection and reduced RF chains.
Abstract
Beamforming in millimeter-wave (mmWave) high-mobility environments typically incurs substantial training overhead. While prior studies suggest that sub-6 GHz channels can be exploited to predict optimal mmWave beams, existing methods depend on large deep learning (DL) models with prohibitive computational and memory requirements. In this paper, we propose a computationally efficient framework for mapping sub-6 GHz channels to mmWave beams based on the knowledge distillation (KD) technique. We develop two compact student DL architectures based on individual and relational distillation strategies, which retain only a few hidden layers yet closely mimic the performance of large teacher DL models. Extensive simulations demonstrate that the proposed student models achieve the teacher's beam prediction accuracy and spectral efficiency while reducing trainable parameters and computational complexity by 99%.
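To make the two distillation strategies concrete, the following is a minimal NumPy sketch of how an individual and a relational distillation loss are commonly written for a beam-classification student. It is not taken from the paper: the function names, the temperature and weighting parameters (`temperature`, `alpha`), the use of a KL-divergence term for the individual loss, and the distance-matching term for the relational loss are all assumptions based on standard KD formulations, and the authors' exact losses may differ.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax over beam indices, softened by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def individual_kd_loss(student_logits, teacher_logits, labels,
                       temperature=4.0, alpha=0.5):
    """Assumed individual KD loss: each sample's softened student output is
    pulled toward the teacher's output for the same sample, mixed with the
    usual cross-entropy on the true beam index."""
    eps = 1e-12
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL(teacher || student), averaged over the batch
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1).mean()
    # Hard-label cross-entropy at temperature 1
    p_hard = softmax(student_logits)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + eps).mean()
    # temperature**2 keeps the soft-target term on a comparable scale
    return alpha * temperature**2 * kl + (1.0 - alpha) * ce

def normalized_pairwise_distances(outputs):
    """Euclidean distances between all sample pairs, divided by their mean
    so that the result does not depend on the overall output scale."""
    diff = outputs[:, None, :] - outputs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    positive = dist[dist > 0]
    mean = positive.mean() if positive.size else 1.0
    return dist / mean

def relational_kd_loss(student_outputs, teacher_outputs):
    """Assumed relational KD loss: match the pairwise-distance structure of a
    batch of student outputs to that of the teacher outputs."""
    d_s = normalized_pairwise_distances(student_outputs)
    d_t = normalized_pairwise_distances(teacher_outputs)
    return np.mean((d_s - d_t) ** 2)
```

In this sketch the individual loss compares student and teacher one sample at a time, whereas the relational loss only compares how samples in a batch relate to each other, so it is unchanged if the student's outputs are uniformly rescaled. In training, either term would be added to the student's objective and minimized by gradient descent in a DL framework; NumPy is used here only to keep the arithmetic explicit.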
