Structure-Aware Multimodal LLM Framework for Trustworthy Near-Field Beam Prediction

Mengyuan Li, Qianfan Lu, Jiachen Tian, Hongjun Hu, Yu Han, Xiao Li, Chao-kai Wen, Shi Jin

Abstract

In near-field extremely large-scale multiple-input multiple-output (XL-MIMO) systems, spherical wavefront propagation expands the traditional beam codebook into the joint angular-distance domain, rendering conventional beam training prohibitively inefficient, especially in complex three-dimensional (3D) low-altitude environments. Furthermore, because near-field beam variations are tightly coupled not only with user positions but also with the physical surroundings, precise beam alignment demands a thorough understanding of the environment. To address this, we propose a large language model (LLM)-driven multimodal framework that fuses historical GPS data, RGB images, LiDAR data, and strategically designed task-specific textual prompts. By leveraging the emergent reasoning and generalization capabilities of the LLM, our approach learns complex spatial dynamics to achieve superior environmental comprehension...

Paper Structure

This paper contains 47 sections, 22 equations, 10 figures, and 1 table.

Figures (10)

  • Figure 1: Illustration of the XL-MIMO system model in LAE scenarios: the BS is equipped with a UPA, an RGB camera, and a LiDAR, while the UAV is equipped with a GPS receiver that feeds its location back to the BS.
  • Figure 2: Overall workflow of the proposed structure-aware LLM-driven multimodal beam prediction framework.
  • Figure 3: Architecture of the designed multimodal feature fusion module.
  • Figure 4: Architecture of the designed textual prompt encoder and examples of designed textual prompts.
  • Figure 5: Architecture of the designed beam prediction head.
  • ...and 5 more figures