FM4Com: Foundation Model for Scene-Adaptive Communication Strategy Optimization
Zhaoyang Li, Shangzhuo Xie, Qianqian Yang
TL;DR
The paper addresses the difficulty of achieving global end-to-end optimization in 6G networks, where modules in the transmission chain depend on one another. It proposes FM4Com, a multimodal foundation model that unifies channel state information (CSI) with natural-language user intents. The architecture is transformer-based, with a connector that semantically aligns CSI and text. Training uses a chain-of-thought reinforced learning (CoT-RL) framework that combines behavior cloning for a warm start with policy-gradient fine-tuning, optimizing end-to-end link construction under multiple objectives (BER, throughput, complexity) and user preferences. The model outputs physically realizable, personalized strategies for all modules in the transmission chain. It uses GPT-2 as the backbone and is trained and evaluated on time-varying CSI generated with QuaDRiGa, where it outperforms traditional planning-based baselines, especially under challenging channel conditions. The results indicate that cross-modal reasoning can support robust, adaptive, and user-tailored 6G links, and they highlight the potential of foundation models for intelligent radio interface design.
Abstract
The emergence of sixth-generation (6G) networks heralds an intelligent communication ecosystem driven by AI-native air interfaces. However, current physical-layer designs, which typically follow modular and isolated optimization paradigms, fail to achieve global end-to-end optimality because they neglect inter-module dependencies. Although large language models (LLMs) have recently been applied to communication tasks such as beam prediction and resource allocation, existing studies remain limited to single-task or single-modality scenarios and cannot jointly reason over communication states and user intents for personalized strategy adaptation. To address these limitations, this paper proposes a novel multimodal communication decision-making model based on reinforcement learning. The proposed model semantically aligns channel state information (CSI) and textual user instructions, enabling comprehensive understanding of both physical-layer conditions and communication intents. It then generates physically realizable, user-customized link construction strategies that adapt dynamically to changing environments and user preferences. A two-stage reinforcement learning framework is employed: the first stage expands the experience pool via heuristic exploration and behavior cloning to obtain a near-optimal initialization, while the second stage fine-tunes the model through multi-objective reinforcement learning that considers bit error rate, throughput, and complexity. Experimental results demonstrate that the proposed model significantly outperforms conventional planning-based algorithms under challenging channel conditions, achieving robust, efficient, and personalized 6G link construction.
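To make the multi-objective stage concrete, the following is a minimal sketch of one way bit error rate, throughput, and complexity could be folded into a single scalar reward weighted by user preference. The abstract does not state how the objectives are combined, so the weighted-sum form, the log-scale BER normalization, and every name and default value here (`scalarized_reward`, `ber_floor`, `max_throughput`, `max_complexity`) are illustrative assumptions rather than the authors' formulation.

```python
import math


def scalarized_reward(ber, throughput, complexity, weights,
                      ber_floor=1e-6, max_throughput=1.0, max_complexity=1.0):
    """Hypothetical weighted-sum reward in [0, 1]; higher is better.

    weights is a (w_ber, w_throughput, w_complexity) tuple expressing the
    user's preference; it is normalized internally. All normalization
    constants are assumed values, not taken from the paper.
    """
    w_ber, w_tp, w_cx = weights
    total = w_ber + w_tp + w_cx
    if total <= 0:
        raise ValueError("at least one preference weight must be positive")

    # BER on a log scale: ber_floor maps to 1.0, a BER of 1.0 maps to 0.0.
    ber_clipped = min(max(ber, ber_floor), 1.0)
    ber_score = math.log10(ber_clipped) / math.log10(ber_floor)

    # Throughput: fraction of the assumed maximum, clipped to [0, 1].
    tp_score = min(max(throughput / max_throughput, 0.0), 1.0)

    # Complexity: lower is better, so the normalized cost is inverted.
    cx_score = 1.0 - min(max(complexity / max_complexity, 0.0), 1.0)

    return (w_ber * ber_score + w_tp * tp_score + w_cx * cx_score) / total


# Two hypothetical strategies: a robust one and a fast one.
robust = dict(ber=1e-5, throughput=0.4, complexity=0.5)
fast = dict(ber=1e-2, throughput=0.9, complexity=0.5)

# A reliability-oriented preference versus a throughput-oriented one.
prefers_reliability = (0.8, 0.1, 0.1)
prefers_speed = (0.1, 0.8, 0.1)

r_robust_rel = scalarized_reward(weights=prefers_reliability, **robust)
r_fast_rel = scalarized_reward(weights=prefers_reliability, **fast)
r_robust_spd = scalarized_reward(weights=prefers_speed, **robust)
r_fast_spd = scalarized_reward(weights=prefers_speed, **fast)
```

Under these assumed weights, the same pair of strategies is ranked differently depending on the stated preference, which is the behavior a preference-conditioned policy would be rewarded for. A weighted sum is only one possible scalarization; constrained or Pareto-based formulations are equally plausible readings of the abstract.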
