Structure-Aware Multimodal LLM Framework for Trustworthy Near-Field Beam Prediction

Mengyuan Li; Qianfan Lu; Jiachen Tian; Hongjun Hu; Yu Han; Xiao Li; Chao-kai Wen; Shi Jin


Abstract

In near-field extremely large-scale multiple-input multiple-output (XL-MIMO) systems, spherical wavefront propagation expands the traditional beam codebook into the joint angular-distance domain, which makes conventional beam training prohibitively inefficient, especially in complex three-dimensional (3D) low-altitude environments. Furthermore, because near-field beam variations are coupled not only with user positions but also with the physical surroundings, precise beam alignment requires a deep understanding of the environment. To address this, we propose a large language model (LLM)-driven multimodal framework that fuses historical GPS data, RGB images, LiDAR data, and specially designed task-specific textual prompts. By leveraging the emergent reasoning and generalization capabilities of the LLM, our approach learns complex spatial dynamics to achieve superior environmental comprehension...
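To make the cost of a joint angular-distance codebook concrete, the sketch below builds a polar-domain codebook and runs an exhaustive beam sweep. It is a minimal illustration under stated assumptions, not the authors' system model: the paper's base station uses a UPA in a 3D scene, whereas this sketch assumes a uniform linear array (ULA), a line-of-sight-only channel, a 28 GHz carrier, a 64-element half-wavelength array, and four hypothetical distance rings. The helpers `near_field_steering`, `polar_codebook`, `gain`, and `exhaustive_training` are hypothetical names introduced only for this example.

```python
import cmath
import math

def near_field_steering(n_ant, spacing, wavelength, theta, r):
    """Unit-norm spherical-wavefront steering vector of a ULA (assumed model)."""
    vec = []
    for n in range(n_ant):
        delta = (2 * n - n_ant + 1) / 2  # element offset, centered on the array
        # Exact distance from element n to a user at range r and angle theta.
        r_n = math.sqrt(r ** 2 + (delta * spacing) ** 2
                        - 2 * r * delta * spacing * math.sin(theta))
        phase = -2 * math.pi * (r_n - r) / wavelength
        vec.append(cmath.exp(1j * phase) / math.sqrt(n_ant))
    return vec

def polar_codebook(n_ant, spacing, wavelength, distances):
    """Joint angle-distance codebook: one codeword per (angle, distance) pair."""
    codebook = []
    for k in range(n_ant):
        sin_theta = (2 * k - n_ant + 1) / n_ant  # uniform grid in sin(theta)
        theta = math.asin(sin_theta)
        for r in distances:
            w = near_field_steering(n_ant, spacing, wavelength, theta, r)
            codebook.append(((theta, r), w))
    return codebook

def gain(w, h):
    """Beamforming gain |w^H h|."""
    return abs(sum(wi.conjugate() * hi for wi, hi in zip(w, h)))

def exhaustive_training(codebook, h):
    """Sweep every codeword; the pilot cost equals the codebook size."""
    best = max(range(len(codebook)), key=lambda i: gain(codebook[i][1], h))
    return best, len(codebook)

# Hypothetical setup: 28 GHz carrier, 64-element half-wavelength ULA.
WAVELENGTH = 3e8 / 28e9
N_ANT = 64
SPACING = WAVELENGTH / 2
DISTANCES = [5.0, 10.0, 20.0, 50.0]  # hypothetical distance rings in meters

codebook = polar_codebook(N_ANT, SPACING, WAVELENGTH, DISTANCES)

# LoS-only channel for a user sitting exactly on grid point (k=40, r=10 m).
theta_user, r_user = codebook[40 * len(DISTANCES) + 1][0]
h = near_field_steering(N_ANT, SPACING, WAVELENGTH, theta_user, r_user)

best, n_pilots = exhaustive_training(codebook, h)
print(best, n_pilots)  # 161 256; a far-field DFT sweep here would need 64 pilots
```

Under these assumptions, every added distance ring multiplies the pilot count by the number of angular grid points, which illustrates why predicting the beam directly from sensing data, as the framework proposes, is attractive compared with sweeping the full codebook.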


Paper Structure (47 sections, 22 equations, 10 figures, 1 table)

Introduction
Prior Works
Beam Training-Based Methods
Wireless-Only Beam Prediction
Multimodal Environment-Aware Beam Prediction
Main Contributions
System Model
Channel Model
Problem Formulation
Structure-Aware Multimodal LLM Framework
Overall Workflow
Multimodal Input Representation
Multimodal Encoders and Feature Fusion
LLM-Driven Reasoning
Cascaded Prediction Heads
...and 32 more sections

Figures (10)

Figure 1: Illustration of the XL-MIMO system model in LAE scenarios: The BS is equipped with a UPA, an RGB camera, and a LiDAR, while the UAV is equipped with a GPS which feeds back locations to the BS.
Figure 2: Overall workflow of the proposed structure-aware LLM-driven multimodal beam prediction framework.
Figure 3: Architecture of the designed multimodal feature fusion module.
Figure 4: Architecture of the designed textual prompt encoder and examples of designed textual prompts.
Figure 5: Architecture of the designed beam prediction head.
...and 5 more figures
