Multi-Modal Sensing and Fusion in mmWave Beamforming for Connected Vehicles: A Transformer Based Framework

Muhammad Baqer Mollah, Honggang Wang, Mohammad Ataul Karim, Hua Fang

TL;DR

A multi-modal sensing and fusion learning framework that uses multi-head cross-modal attention to learn dependencies and correlations between different modalities. It then fuses the multimodal features to predict the top-k beams, so that the best line-of-sight links can be established proactively.

Abstract

Millimeter wave (mmWave) communication, which relies on beamforming to overcome its inherent path loss, is considered one of the key technologies for meeting the ever-increasing high-throughput and low-latency demands of connected vehicles. However, adopting the standard-defined beamforming approach in highly dynamic vehicular environments often incurs high beam training overheads and reduces the airtime available for communication, mainly because of pilot signal exchanges and exhaustive beam measurements. To this end, we present a multi-modal sensing and fusion learning framework as a potential alternative that reduces such overheads. In this framework, we first extract representative features from the sensing modalities with modality-specific encoders, then use multi-head cross-modal attention to learn dependencies and correlations between the modalities, and finally fuse the multimodal features to predict the top-k beams so that the best line-of-sight links can be established proactively. To show the generalizability of the proposed framework, we conduct a comprehensive experiment on four different vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) scenarios using real-world multimodal and 60 GHz mmWave wireless sensing data. The experiment shows that the proposed framework (i) achieves up to 96.72% accuracy in correctly predicting the top-15 beams, (ii) incurs an average power loss of roughly 0.77 dB, and (iii) reduces the overall latency and beam search space overheads by 86.81% and 76.56%, respectively, for top-15 beams compared to the standard-defined approach.
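The pipeline in the abstract runs from modality-specific features through multi-head cross-modal attention and fusion to top-k beam selection. It can be illustrated with the minimal pure-Python sketch below. This is not the authors' implementation. The following are all simplifying assumptions made for illustration:

- the modality encoders are skipped, and the inputs are taken to be already-encoded token features;
- each modality attends to the tokens of all other modalities;
- the learned query/key/value projections are replaced by identity maps;
- mean pooling is used;
- the output layer is a random, untrained linear layer.

The helper names (`cross_modal_attention`, `fuse_and_predict`) are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_modal_attention(query_tokens, context_tokens, num_heads):
    """Multi-head scaled dot-product attention with queries from one modality and
    keys/values from other modalities. Assumption: the learned Q/K/V/output
    projections are replaced by identity maps to keep the sketch short."""
    d = len(query_tokens[0])
    assert d % num_heads == 0, "feature size must split evenly across heads"
    dh = d // num_heads
    heads = []
    for h in range(num_heads):
        cols = slice(h * dh, (h + 1) * dh)       # feature slice owned by head h
        q = [t[cols] for t in query_tokens]
        kv = [t[cols] for t in context_tokens]
        keys_t = [list(c) for c in zip(*kv)]      # transpose: dh x n_context
        # scores[i][j] = <q_i, k_j> / sqrt(dh)
        scores = [[s / math.sqrt(dh) for s in row] for row in matmul(q, keys_t)]
        heads.append(matmul(softmax_rows(scores), kv))  # weighted sum of values
    # Concatenate the per-head outputs for each query token.
    return [sum((head[i] for head in heads), []) for i in range(len(query_tokens))]


def fuse_and_predict(modalities, num_heads, num_beams, k, seed=0):
    """Hypothetical fusion head: each modality attends to the tokens of all the
    others, the attended tokens are mean-pooled and concatenated, and a linear
    layer maps the fused vector to one logit per beam. The weights are random
    here; a real model would learn them from labelled beam data."""
    fused = []
    for i, tokens in enumerate(modalities):
        context = [t for j, m in enumerate(modalities) if j != i for t in m]
        attended = cross_modal_attention(tokens, context, num_heads)
        fused.extend(sum(col) / len(attended) for col in zip(*attended))
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in fused] for _ in range(num_beams)]
    logits = [sum(w * x for w, x in zip(row, fused)) for row in weights]
    top_k = sorted(range(num_beams), key=lambda b: logits[b], reverse=True)[:k]
    return logits, top_k


if __name__ == "__main__":
    rng = random.Random(1)
    # Three modalities, 4 tokens each, 8 features per token (illustrative sizes).
    feats = [[[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)] for _ in range(3)]
    _, beams = fuse_and_predict(feats, num_heads=2, num_beams=64, k=5)
    print("top-5 beam indices:", beams)
```

The split across heads means each head attends over its own slice of the feature vector. In a trained model, this lets different heads capture different cross-modal correlations. The codebook size of 64 in the usage example is an illustrative choice only.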

Paper Structure

This paper contains 31 sections, 25 equations, 6 figures, and 1 table.

Figures (6)

  • Figure 1: Illustration of the considered system model, enabled by mmWave V2I and V2V communications.
  • Figure 2: Overview of the proposed framework, which consists of three distinct modality-specific encoders for feature extraction, each taking a pre-processed modality as input, followed by the multi-modal fusion and beam prediction procedures.
  • Figure 3: Training and validation loss curves for the V2I-Day, V2I-Night, V2V-Day, and V2V-Night scenarios (left to right), measuring the difference between predicted and ground-truth beams over 40 epochs.
  • Figure 4: Performance comparison of average accuracy (%) and average mmWave power loss (dB) on all considered V2I and V2V scenarios.
  • Figure 5: (a) The 5G NR standard-defined frame structure of a synchronization signal (SS) burst (SSB1, SSB2, etc., shown in different colors within an SS burst, are the SS blocks associated with each beam); (b) and (c) timing diagrams of the standard-defined exhaustive beamforming procedure and of the procedure after integrating the proposed multimodal model.
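The three figures of merit reported in the abstract and compared in Figures 4 and 5 can be made concrete with a short sketch. The function names are hypothetical, and the paper's exact definitions may differ.

- **Top-k accuracy** uses the standard definition.
- **Power loss** uses a common definition, which is an assumption here: 10·log10(P_best / P_chosen) per sample, averaged in dB. P_chosen is the strongest measured beam among the top-k candidates.
- **Search-space reduction** is taken as 1 - k/N for an N-beam codebook.

The codebook size is not stated in this summary. However, N = 64 is consistent with the reported 76.56% for top-15 beams, since 1 - 15/64 = 0.765625.

```python
import math


def top_k_indices(scores, k):
    """Indices of the k highest-scoring beams, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def top_k_accuracy(all_scores, ground_truth, k):
    """Fraction of samples whose ground-truth best beam is among the top-k predictions."""
    hits = sum(gt in top_k_indices(s, k) for s, gt in zip(all_scores, ground_truth))
    return hits / len(ground_truth)


def avg_power_loss_db(all_scores, all_powers, k):
    """Assumed definition: for each sample, compare the strongest beam in the
    codebook with the strongest beam among the k predicted candidates (the one a
    short sweep over those candidates would pick), in dB, then average."""
    losses = []
    for scores, powers in zip(all_scores, all_powers):
        chosen = max(powers[i] for i in top_k_indices(scores, k))
        losses.append(10.0 * math.log10(max(powers) / chosen))
    return sum(losses) / len(losses)


def search_space_reduction(k, codebook_size):
    """Fraction of beams no longer swept when only k of codebook_size are measured."""
    return 1.0 - k / codebook_size


if __name__ == "__main__":
    # Toy data: two samples, three beams; powers are linear (e.g. mW), not dB.
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
    powers = [[1.0, 4.0, 2.0], [1.0, 2.0, 0.5]]
    best = [p.index(max(p)) for p in powers]
    print(top_k_accuracy(scores, best, 1), avg_power_loss_db(scores, powers, 1))
    print(f"{search_space_reduction(15, 64):.2%}")
```

On these toy inputs, top-1 accuracy is 0.5 and top-2 accuracy is 1.0. Widening the candidate set from k = 1 to k = 2 also drops the assumed power loss to 0 dB, at the cost of measuring one more beam. This is the same accuracy-versus-overhead trade-off behind reporting top-15 results.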