6G WavesFM: A Foundation Model for Sensing, Communication, and Localization

Ahmed Aboulfotouh, Elsayed Mohammed, Hatem Abou-Zeid

TL;DR

The paper addresses the need for a unified AI model in 6G networks that can perform sensing, communication, and localization with limited task-specific labels. It introduces WavesFM, a Vision Transformer (ViT)-based Wireless Foundation Model pretrained with Masked Wireless Modeling on real-world RF spectrograms, CSI, and OFDM IQ data, with a shared backbone and LoRA-enabled task adapters. The model shares approximately 80% of its parameters across four downstream tasks and converges up to 5x faster when the pretraining data aligns with the downstream task. With LoRA, about 1.5 million task-specific parameters are enough to match or exceed supervised baselines. The results show strong cross-task generalization and efficiency, supporting the vision of AI-native, adaptable 6G networks that scale across sensing, communication, and localization tasks.

Abstract

This paper introduces WavesFM, a novel Wireless Foundation Model (WFM) framework, capable of supporting a wide array of communication, sensing, and localization tasks. Our proposed architecture combines a shared Vision Transformer (ViT) backbone with task-specific multi-layer perceptron (MLP) heads and incorporates Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. This design promotes full parameter sharing across tasks, significantly reducing the computational and memory footprint without sacrificing performance. The model processes both image-like wireless modalities, such as spectrograms and channel state information (CSI), and in-phase and quadrature (IQ) signals arranged as orthogonal frequency-division multiplexing (OFDM) resource grids. We demonstrate the strong generalization capabilities of WavesFM through extensive experiments on four downstream tasks: Fifth Generation New Radio (5G NR) positioning; multiple-input multiple-output OFDM (MIMO-OFDM) channel estimation; human activity sensing; and radio-frequency (RF) signal classification. Compared to supervised baselines trained individually, our approach achieves superior performance while sharing 80% of its parameters across tasks. Furthermore, we show that pretraining on domain-relevant data not only boosts performance but also accelerates convergence, reducing training time by up to 5x. These results demonstrate that our unified WFM can support diverse tasks and deliver significant gains in both performance and efficiency, highlighting the transformative potential of foundation models to drive AI-native paradigms in future sixth-generation (6G) networks.
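The abstract's combination of a frozen shared backbone with LoRA adapters can be illustrated with a minimal sketch of one LoRA-augmented linear layer. It uses the standard LoRA form, in which a frozen weight W is augmented by a trainable low-rank update scaled by alpha / rank. The class name `LoRALinear`, the layer dimensions, the rank, and the initialization values are all assumptions chosen for illustration; this is not the authors' implementation.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    Hypothetical illustration: sizes and init scales are assumptions.
    """

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Frozen weight: stands in for a pretrained, shared backbone layer.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable low-rank factors: A is random, B starts at zero so the
        # layer initially reproduces the frozen backbone output.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # shared path
        delta = matvec(self.B, matvec(self.A, x))   # task-specific path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return self.rank * (self.d_in + self.d_out)

    def frozen_params(self):
        return self.d_in * self.d_out


layer = LoRALinear(d_in=64, d_out=64, rank=4)
print(layer.trainable_params(), layer.frozen_params())
```

In this sketch only `A` and `B` would be updated per task, so the number of task-specific parameters grows with `rank * (d_in + d_out)` rather than `d_in * d_out`, which is the property that makes sharing most of the model across tasks possible.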

Paper Structure

This paper contains 16 sections, 19 equations, 9 figures, 3 tables, 2 algorithms.

Figures (9)

  • Figure 1: Overview of WavesFM, its multi-task capabilities and fine-tuning techniques.
  • Figure 2: Samples from the pre-training datasets.
  • Figure 3: Samples from the fine-tuning datasets.
  • Figure 4: Overview of the proposed methodology, covering masked wireless modeling pre-training, the conventional fine-tuning process, and the internal structure of the ViT block.
  • Figure 5: Reconstruction examples for ViT-All at 70%, 80%, and 90% masking ratios.
  • ...and 4 more figures
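
The masking ratios named in the Figure 5 caption can be made concrete with a small sketch of random patch masking, the general idea behind masked-modeling pre-training: a fraction of input patches is hidden and the model is trained to reconstruct them. The function name `random_patch_mask`, the 196-patch grid (14 x 14), and the rounding rule are assumptions for illustration and are not taken from the paper.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Return a list of booleans, True where a patch is masked (hidden).

    Hypothetical helper: uniform random masking without replacement.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


# Assumed 14 x 14 patch grid; the real patch layout may differ.
for ratio in (0.7, 0.8, 0.9):
    mask = random_patch_mask(196, ratio, seed=0)
    visible = mask.count(False)
    print(f"ratio={ratio}: {sum(mask)} masked, {visible} visible")
```

At higher ratios the encoder sees fewer visible patches, which makes reconstruction harder and is why reconstruction quality is usually compared across several ratios.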