Towards a Foundation Model for Communication Systems

Davide Buffelli, Sowmen Das, Yu-Wei Lin, Sattar Vakili, Chien-Yi Wang, Masoud Attarifar, Pritthijit Nath, Da-shan Shiu

TL;DR

This work proposes a transformer-based foundation model for communication systems that processes raw, heterogeneous communication data. It addresses domain-specific challenges (multi-feature inputs, mixed data types, and variable-size representations) through tailored tokenization, per-slot feature embeddings, and robust preprocessing. A simulation-based data generation pipeline built on Sionna supports self-supervised pre-training with masked feature prediction, targeting five estimable features: transmission rank, selected precoder, Doppler width, delay-profile center, and delay-profile length. Experiments demonstrate reliable forecasting and interpolation across these features and reveal scaling behavior: larger models trained on more data achieve better estimation accuracy, suggesting a practical path toward scalable foundation models for 6G-type systems. The work lays the groundwork for broader datasets, additional features, and community-accessible pipelines to accelerate development of foundation models for communication networks.

Abstract

Artificial Intelligence (AI) has demonstrated unprecedented performance across various domains, and its application to communication systems is an active area of research. While current methods focus on task-specific solutions, the broader trend in AI is shifting toward large general models capable of supporting multiple applications. In this work, we take a step toward a foundation model for communication data: a transformer-based, multi-modal model designed to operate directly on communication data. We propose methodologies to address key challenges, including tokenization, positional embedding, multimodality, variable feature sizes, and normalization. Furthermore, we empirically demonstrate that such a model can successfully estimate multiple features, including transmission rank, selected precoder, Doppler spread, and delay profile.

Paper Structure

This paper contains 23 sections, 19 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Scheme of our foundation model for communications data and the pre-training procedure. Tokens for each feature are first added to a positional embedding; features at the same input slot receive the same positional embedding (indicated by the color of the arrows). A feature embedding is then concatenated (&); the tokens for the same feature at different slots receive the same feature embedding. The transformer processes these tokens and outputs a representation for each input. During pre-training, a subset of input features is masked (denoted as "M"). The output representations corresponding to the masked inputs are decoded into the original feature space, and a reconstruction loss is computed. The model is trained to minimize this loss.
  • Figure 2: Scaling behaviour vs. data generation compute. Each curve shows the test loss for a model of a given size (5M, 30M, or 100M parameters) as a function of compute used for data generation.
  • Figure 3: Scaling behaviour vs. training compute. Each curve shows the test loss for a model of a given size (5M, 30M, or 100M parameters) as a function of compute used for training.
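
The input assembly and masked reconstruction described in the Figure 1 caption can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the embedding widths, the all-zero mask vector, the two feature names, and the use of mean squared error as the reconstruction loss are all hypothetical choices, and the transformer and decoder themselves are omitted.

```python
import random

D_TOKEN = 4  # hypothetical width of tokens and positional embeddings
D_FEAT = 2   # hypothetical width of feature embeddings


def rand_vec(rng, dim):
    """Random vector standing in for a learned embedding or token."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def build_inputs(tokens, pos_emb, feat_emb, mask_vec, masked):
    """Turn tokens[slot][feature] into a list of ((slot, feature), vector).

    `masked` is a set of (slot, feature) pairs whose token is replaced by
    `mask_vec` before the embeddings are applied (an assumed masking scheme).
    """
    inputs = []
    for slot, per_feature in enumerate(tokens):
        for feat, tok in per_feature.items():
            base = mask_vec if (slot, feat) in masked else tok
            # Features at the same slot share one positional embedding (added).
            summed = [t + p for t, p in zip(base, pos_emb[slot])]
            # Tokens of the same feature share one feature embedding (concatenated).
            inputs.append(((slot, feat), summed + feat_emb[feat]))
    return inputs


def reconstruction_loss(decoded, targets, masked):
    """Mean squared error over masked positions only (MSE is an assumption)."""
    total, count = 0.0, 0
    for key in masked:
        for d, t in zip(decoded[key], targets[key]):
            total += (d - t) ** 2
            count += 1
    return total / count


# Toy setup: 3 slots, 2 features per slot, one masked entry.
rng = random.Random(0)
features = ["rank", "doppler"]
n_slots = 3
tokens = [{f: rand_vec(rng, D_TOKEN) for f in features} for _ in range(n_slots)]
pos_emb = [rand_vec(rng, D_TOKEN) for _ in range(n_slots)]
feat_emb = {f: rand_vec(rng, D_FEAT) for f in features}
mask_vec = [0.0] * D_TOKEN
masked = {(1, "rank")}

inputs = build_inputs(tokens, pos_emb, feat_emb, mask_vec, masked)
targets = {(s, f): tokens[s][f] for s in range(n_slots) for f in features}
```

In a full model, `inputs` would be passed through the transformer, the outputs at the masked positions would be decoded back into the feature space, and `reconstruction_loss` would compare those decoded values with `targets`.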