Occlusion-Aware Multimodal Beam Prediction and Pose Estimation for mmWave V2I

Abidemi Orimogunje, Hyunwoo Park, Kyeong-Ju Cha, Igbafe Orikumhi, Sunwoo Kim, Dejan Vukobratovic

Abstract

We propose an occlusion-aware multimodal learning framework, inspired by simultaneous localization and mapping (SLAM) concepts, for trajectory interpretation and pose prediction. Targeting mmWave vehicle-to-infrastructure (V2I) beam management under dynamic blockage, our Transformer-based fusion network ingests synchronized RGB images, LiDAR point clouds, radar range-angle maps, GNSS positions, and short-term mmWave power history. It jointly predicts the receive-beam index, blockage probability, and 2D position using labels automatically derived from 64-beam sweep power vectors, while an offline LiDAR map enables SLAM-style trajectory visualization. On the 60 GHz DeepSense 6G Scenario 31 dataset, the model achieves 50.92% Top-1 and 86.50% Top-3 beam accuracy with a 0.018 bits/s/Hz spectral-efficiency loss, 63.35% blocked-class F1, and 1.33 m position RMSE. Multimodal fusion outperforms radio-only and strong camera-only baselines, demonstrating the value of coupling perception and communication for future 6G V2I systems.
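
The abstract states that supervision labels are derived automatically from 64-beam sweep power vectors. The NumPy sketch below illustrates one standard way to do this, together with the Top-k accuracy and spectral-efficiency-loss metrics reported above. The blockage threshold and the unit-noise-power normalization are assumptions for illustration, not values from the paper.

```python
import numpy as np

def derive_labels(power_vec, blockage_thresh_db=-60.0):
    """Derive beam and blockage labels from one 64-beam sweep power vector.

    Beam label: index of the strongest beam in the sweep.
    Blockage label: peak power below a calibration threshold (the -60 dB
    default is a hypothetical placeholder, not a value from the paper).
    """
    beam_label = int(np.argmax(power_vec))        # optimal receive-beam index in [0, 63]
    peak_db = 10.0 * np.log10(np.max(power_vec))  # peak receive power in dB
    blocked = bool(peak_db < blockage_thresh_db)  # weak peak -> LOS assumed blocked
    return beam_label, blocked

def se_loss(power_vec, predicted_beam, noise_power=1.0):
    """Spectral-efficiency loss (bits/s/Hz): log2(1 + SNR) rate gap between
    the sweep-optimal beam and the predicted beam."""
    snr_best = np.max(power_vec) / noise_power
    snr_pred = power_vec[predicted_beam] / noise_power
    return np.log2(1.0 + snr_best) - np.log2(1.0 + snr_pred)

def top_k_accuracy(scores, beam_labels, k=3):
    """Top-k beam accuracy: fraction of samples whose true beam index
    appears among the k highest-scoring predicted beams."""
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = [beam_labels[i] in top_k[i] for i in range(len(beam_labels))]
    return float(np.mean(hits))
```

With labels generated this way, no manual annotation is required: every recorded sweep yields its own beam and blockage targets, and position targets can similarly come from the synchronized GNSS stream.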

Paper Structure

This paper contains 18 sections, 16 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: Illustration of the mmWave V2I scenario and the multimodal integrated sensing and communication (ISAC) node used for occlusion-aware receive-beam selection and 2D vehicle localization.
  • Figure 2: Training and validation multi-task loss over epochs for the multimodal model (a sketch of one such objective follows this list).
  • Figure 3: Validation Top-1 beam accuracy versus epoch for the multimodal model and unimodal baselines.
  • Figure 4: Validation Top-3 beam accuracy versus epoch for the multimodal model and baselines.
  • Figure 5: Average SE drop versus epoch for training and validation splits.
  • ...and 3 more figures
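
Figure 2 tracks a single multi-task training loss. Its exact definition is not given in this excerpt, so the PyTorch sketch below shows one plausible weighted-sum objective over the three prediction heads named in the abstract; the weights `w_beam`, `w_blk`, and `w_pos` are hypothetical hyperparameters, not values from the paper.

```python
import torch
import torch.nn.functional as F

def multi_task_loss(beam_logits, blockage_logit, pos_pred,
                    beam_label, blockage_label, pos_label,
                    w_beam=1.0, w_blk=1.0, w_pos=1.0):
    """One plausible multi-task objective: cross-entropy for the 64-way
    beam head, binary cross-entropy for the blockage head, and MSE for
    the 2D position head, combined as a weighted sum.

    blockage_label is expected as a float tensor in [0, 1].
    """
    l_beam = F.cross_entropy(beam_logits, beam_label)                           # beam classification
    l_blk = F.binary_cross_entropy_with_logits(blockage_logit, blockage_label)  # blockage detection
    l_pos = F.mse_loss(pos_pred, pos_label)                                     # 2D localization
    return w_beam * l_beam + w_blk * l_blk + w_pos * l_pos
```

A weighted sum of this form keeps all three tasks on a shared encoder while letting each head's gradient contribution be tuned independently.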