Attention-Enhanced Learning for Sensing-Assisted Long-Term Beam Tracking in mmWave Communications
Mengyuan Ma, Nhan Thanh Nguyen, Nir Shlezinger, Yonina C. Eldar, Markku Juntti
TL;DR
This work tackles the high overhead of beam training in mmWave systems by using images from an infrastructure-mounted camera to perform CSI-free, long-term beam tracking. The authors propose an efficient end-to-end architecture that combines a five-layer CNN for spatial feature extraction, a GRU for temporal modeling, and a residual multi-head attention module to capture long-range dependencies, predicting beam indices for the current slot and $J$ future slots. On DeepSense 6G Scenario 9, the method achieves Top-5 accuracy exceeding 90% across the current and six future slots, and it attains 97% of state-of-the-art performance at only about 3% of the computational complexity, reducing sensing and processing overhead. This vision-aided approach advances practical ISAC-enabled mmWave systems by enabling robust long-term beam tracking with low latency and energy demands, though it leaves open challenges in domain shifts and multi-user settings.
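The pipeline summarized above (CNN per frame, GRU over time, residual multi-head attention, then a prediction for the current slot and $J$ future slots) might be sketched in PyTorch roughly as follows. This is an illustrative sketch, not the authors' implementation: the class name `BeamTrackerSketch`, the channel widths, strides, hidden size, number of heads, the 64-beam codebook, the layer normalization, and the choice to read out from the last time step are all assumptions made only so the example runs.

```python
import torch
import torch.nn as nn


class BeamTrackerSketch(nn.Module):
    """Hypothetical CNN + GRU + residual attention model; all sizes are assumed."""

    def __init__(self, num_beams=64, num_future=6, hidden=64, heads=4):
        super().__init__()
        # Five convolutional layers; the channel widths and strides are assumptions.
        chans = [3, 8, 16, 32, 32, 64]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU()]
        self.cnn = nn.Sequential(*layers, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.gru = nn.GRU(chans[-1], hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden)  # assumed; not stated in the summary
        self.num_slots = num_future + 1   # current slot plus J future slots
        self.num_beams = num_beams
        self.head = nn.Linear(hidden, self.num_slots * num_beams)

    def forward(self, frames):
        # frames: (batch, time, 3, height, width) sequence of past camera images
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).reshape(b, t, -1)  # per-frame features
        seq, _ = self.gru(feats)                                  # temporal modeling
        ctx, _ = self.attn(seq, seq, seq)                         # self-attention over time
        seq = self.norm(seq + ctx)                                # residual connection
        logits = self.head(seq[:, -1])                            # read out from last step
        return logits.reshape(b, self.num_slots, self.num_beams)


model = BeamTrackerSketch()
dummy = torch.randn(2, 8, 3, 64, 64)  # 2 sequences of 8 frames (assumed length)
out = model(dummy)                    # one score vector over beams per slot
```

With these assumed defaults, `out` holds one row of beam scores for the current slot and each of the six future slots, so Top-k beam candidates per slot can be read off with `out.topk(k, dim=-1)`.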
Abstract
Beam training and prediction in millimeter-wave communications are highly challenging due to fast time-varying channels and sensitivity to blockages and mobility. In this context, infrastructure-mounted cameras can capture rich environmental information that can facilitate beam tracking. In this work, we develop an efficient attention-enhanced machine learning model for long-term beam tracking built upon convolutional neural networks and gated recurrent units to predict both current and future beams from past observed images. The integrated temporal attention mechanism substantially improves the model's predictive performance. Numerical results demonstrate that the proposed design achieves Top-5 beam prediction accuracies exceeding 90% across both current and six future time slots, significantly reducing overhead arising from sensing and processing for beam training. The model further attains 97% of state-of-the-art performance with only 3% of the computational complexity.
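For readers unfamiliar with the Top-5 metric quoted above, a prediction counts as correct when the true beam index is among the five highest-scoring candidates. The following minimal sketch computes Top-k accuracy in plain Python; the function name `top_k_accuracy` and the toy scores are hypothetical and are not taken from the paper or its dataset.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true beam index is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Beam indices sorted by descending score; ties keep the lower index first.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data (made up): 4 samples, 4 candidate beams each.
scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.40, 0.30, 0.20, 0.10],
    [0.05, 0.05, 0.50, 0.40],
    [0.70, 0.10, 0.10, 0.10],
]
labels = [1, 2, 3, 0]

print(top_k_accuracy(scores, labels, k=2))  # prints 0.75
```

In a long-term tracking setting, the same function would be applied separately to the predictions for the current slot and for each future slot, giving one accuracy value per slot.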
