Deep Learning based Computer-vision for Enhanced Beamforming
Sachira Karunasena, Erfan Khordad, Thomas Drummond, Rajitha Senanayake
TL;DR
This work tackles beam training overhead in mmWave/THz communications by leveraging vision data to predict beam directions. It introduces an end-to-end architecture that fuses RGB imagery with mmWave power profiles for transmitter (TX) identification, tracking, and geometry-aware beam prediction, including a reshaping of the beam codebook based on the vertical vanishing point. Key contributions include a single deep learning (DL)-based TX identification model that takes the mmWave power profile as an additional input channel, a tracking module that maintains TX identity across frames, and a two-branch neural network for Top-$N$ beam prediction that accounts for camera perspective and a restricted beam search space. On the DeepSense 6G benchmark (Scenarios 3 and 4), the method achieves near-perfect Top-$5$ accuracy while substantially reducing beam training overhead, outperforms prior vision-aided approaches by at least 6% in Top-$1$, Top-$3$, and Top-$5$ accuracy, and remains robust in low-light and dynamic environments.
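The Top-$N$ accuracy reported above is conventionally the fraction of samples whose ground-truth optimal beam index appears among the $N$ highest-scoring predicted beams. The sketch below shows that standard definition with NumPy; it is not taken from the authors' code, and the function name `top_n_accuracy` and the toy scores are hypothetical.

```python
import numpy as np


def top_n_accuracy(scores: np.ndarray, true_beams: np.ndarray, n: int) -> float:
    """Fraction of samples whose ground-truth beam is among the n highest scores.

    scores:     (num_samples, num_beams) array of predicted beam scores.
    true_beams: (num_samples,) array of ground-truth optimal beam indices.
    """
    if not 1 <= n <= scores.shape[1]:
        raise ValueError("n must be between 1 and the number of beams")
    # Column indices of the n largest scores in each row (order inside the top-n is irrelevant).
    top_n = np.argsort(scores, axis=1)[:, -n:]
    # A sample is a hit if its true beam index matches any of its top-n indices.
    hits = np.any(top_n == true_beams[:, None], axis=1)
    return float(hits.mean())


# Hypothetical toy example: 3 samples, 3 candidate beams.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
true_beams = np.array([1, 1, 0])
for n in (1, 2, 3):
    print(n, top_n_accuracy(scores, true_beams, n))
```

Because a larger $N$ can only add candidates, Top-$N$ accuracy never decreases as $N$ grows; the trade-off is that each extra candidate beam must still be measured.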
Abstract
Meeting the high data-rate demands of modern applications calls for high-frequency spectrum, including the millimeter-wave and sub-terahertz bands. However, these frequencies require precise alignment of narrow communication beams between transmitters and receivers, which typically incurs significant beam training overhead. This paper introduces a novel end-to-end vision-aided beamforming framework that uses images to predict optimal beams and applies geometric adjustments to reduce this overhead. Our model adapts robustly to dynamic environments without relying on additional training data, and experimental results show a top-5 beam prediction accuracy of 98.96%, significantly surpassing current state-of-the-art vision-aided beamforming solutions.
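One simple way to see why top-5 prediction reduces beam training overhead: if the receiver measures only the $N$ predicted candidates instead of sweeping all $M$ beams in its codebook, the number of measurements falls by a fraction $1 - N/M$. The sketch below computes this under an assumed 64-beam codebook; the codebook size is an illustrative assumption not stated in this section, and the model ignores image acquisition, inference latency, and signaling costs.

```python
def sweep_overhead_reduction(codebook_size: int, n: int) -> float:
    """Fractional reduction in beam measurements when only the top-n predicted
    beams are swept instead of the full codebook (simplified model)."""
    if not 1 <= n <= codebook_size:
        raise ValueError("n must be between 1 and codebook_size")
    return 1.0 - n / codebook_size


# Assumption: a 64-beam receive codebook, chosen purely for illustration.
CODEBOOK_SIZE = 64
for n in (1, 3, 5):
    print(f"top-{n}: {sweep_overhead_reduction(CODEBOOK_SIZE, n):.2%} fewer measurements")
```

Under this assumption, sweeping the top-5 candidates measures 5 of 64 beams, roughly a 92% reduction, while still finding the optimal beam whenever it lies in the predicted top-5.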
