
Deep Learning based Computer-vision for Enhanced Beamforming

Sachira Karunasena, Erfan Khordad, Thomas Drummond, Rajitha Senanayake

TL;DR

This work tackles beam training overhead in mmWave/THz communications by using vision data to predict beam directions. It introduces an end-to-end architecture that fuses RGB imagery with mmWave power profiles for transmitter (TX) identification, tracking, and geometry-aware beam prediction, including a reshaping of the beam codebook based on the vertical vanishing point. Key contributions are a single DL-based TX identification method that feeds the mmWave channel to the network as an additional input channel, a tracking module that maintains the TX identity across frames, and a two-branch neural network for Top-$N$ beam prediction that accounts for the camera perspective and a restricted beam search space. On the DeepSense 6G benchmark (Scenarios 3 and 4), the method achieves near-perfect Top-$5$ accuracy with substantial reductions in beam training overhead, outperforms prior vision-aided approaches by at least 6% in Top-$1$, Top-$3$, and Top-$5$ accuracy, and remains robust in low-light and dynamic environments.
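The summary does not describe how the vertical vanishing point is estimated. A standard approach, shown here only as a sketch and not as the authors' procedure, is to intersect the image lines that support physically vertical edges (poles, building edges): under perspective projection these lines converge at the vertical vanishing point. In homogeneous coordinates the line through two points is their cross product, and the point closest to all lines in the algebraic least-squares sense is the last right singular vector of the stacked line matrix. The function names and the segment format are illustrative assumptions; how the resulting point reshapes the beam codebook is not covered.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous image line through two distinct points p=(x, y), q=(x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(segments):
    """Least-squares intersection of the lines supporting `segments`.

    Each segment is ((x1, y1), (x2, y2)) in pixels, assumed to trace a
    physically vertical edge. Returns (x, y), or None when the image lines
    are parallel (vanishing point at infinity).
    """
    lines = np.array([line_through(p, q) for p, q in segments], dtype=float)
    # Normalise so every line has a unit normal and carries equal weight.
    lines /= np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    # v minimises ||lines @ v|| subject to ||v|| = 1.
    _, _, vt = np.linalg.svd(lines)
    v = vt[-1]
    if abs(v[2]) < 1e-12:
        return None
    return v[0] / v[2], v[1] / v[2]
```

For example, segments from (0, 0) to (1, 10), from (10, 0) to (9, 10), and from (5, 0) to (5, 10) all extend to meet at the pixel (5, 50).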

Abstract

Meeting the high data rate demands of modern applications requires the use of high-frequency spectrum bands, including the millimeter-wave and sub-terahertz bands. However, these frequencies require precise alignment of narrow communication beams between transmitters and receivers, which typically results in significant beam training overhead. This paper introduces a novel end-to-end vision-aided beamforming framework that uses images to predict the optimal beams and applies geometric adjustments to reduce this overhead. Our model adapts robustly to dynamic environments without relying on additional training data, and experimental results show a top-5 beam prediction accuracy of 98.96%, significantly surpassing current state-of-the-art solutions in vision-aided beamforming.
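Top-$N$ accuracy, the metric behind the 98.96% figure, counts a prediction as correct when the ground-truth optimal beam is among the $N$ highest-scoring beams. Predicting a Top-$N$ set also bounds the beam sweep: only those $N$ beams need to be measured instead of the whole codebook. The sketch below computes both quantities; the function names are hypothetical, and the 64-beam codebook used in the example is an illustrative assumption, not a figure taken from this summary.

```python
import numpy as np

def top_n_accuracy(scores, labels, n):
    """Fraction of samples whose true beam index is among the n highest scores.

    scores: (num_samples, num_beams) predicted beam scores.
    labels: (num_samples,) ground-truth optimal beam indices.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Column indices of the n largest scores in each row (order irrelevant).
    top_n = np.argsort(scores, axis=1)[:, -n:]
    hits = np.any(top_n == labels[:, None], axis=1)
    return float(hits.mean())

def sweep_reduction(num_beams, n):
    """Fraction of beam measurements saved by sweeping only the top-n
    predicted beams instead of the full codebook of num_beams beams."""
    return 1.0 - n / num_beams
```

With a hypothetical 64-beam codebook, sweeping only the Top-5 beams saves `sweep_reduction(64, 5)`, i.e. 1 - 5/64 ≈ 92% of the measurements; the remaining risk is the fraction of samples that fall outside the Top-5 set.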

Paper Structure

This paper contains 12 sections, 4 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Proposed end-to-end vision-aided beamforming approach. The first $M$ frames of the video sequence are used for TX identification. The remaining frames, from the $(M+1)$-th to the $J$-th, are used to track the detected TX and predict the top-$N$ beams for each frame.
  • Figure 2: Structuring the input for the TX Identification methods.
  • Figure 3: Beam shape designs for Scenarios 3 & 4 of DeepSense 6G Dataset.
  • Figure 4: Custom Neural Network Architecture for Top-$N$ Beam Prediction with dual processing of Isolated TX Image and Reduced Beam Search Space.
  • Figure 5: Comparison of Top-$N$ beam prediction metrics with current state-of-the-art methods. "J. Nie et al." represents the work done in nocturnal and "S. Imran et al." represents the results generated by applying the method in environment.
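The frame schedule in Figure 1 and the reduced beam search space in Figure 4 can be sketched as a simple control loop. This is a hypothetical outline, not the authors' implementation: `identify_tx`, `track`, and `beam_scores` are placeholder callables standing in for the identification network, the tracking module, and the two-branch predictor, and representing the reduced search space as a list of allowed beam indices is an assumption.

```python
from typing import Callable, List, Sequence

import numpy as np

def run_pipeline(frames: Sequence,
                 m: int,
                 identify_tx: Callable,   # hypothetical: frames[:m] -> initial TX box
                 track: Callable,         # hypothetical: (frame, box) -> updated box
                 beam_scores: Callable,   # hypothetical: (frame, box) -> scores over full codebook
                 allowed_beams: Sequence[int],
                 n: int) -> List[List[int]]:
    """Return the top-n beam indices for each frame after the first m."""
    box = identify_tx(frames[:m])              # frames 1..M: identify the TX
    allowed = np.asarray(allowed_beams)
    predictions = []
    for frame in frames[m:]:                   # frames M+1..J: track, then predict
        box = track(frame, box)
        scores = np.asarray(beam_scores(frame, box), dtype=float)
        # Rank only the beams inside the reduced search space.
        sub = scores[allowed]
        order = np.argsort(sub)[::-1][:n]      # indices of the n best, descending
        predictions.append(allowed[order].tolist())
    return predictions
```

Passing the stages as callables keeps the frame schedule separate from the models, so a real detector, tracker, or network can be swapped in without changing the loop.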