Radio Foundation Models: Pre-training Transformers for 5G-based Indoor Localization
Jonathan Ott, Jonas Pirkl, Maximilian Stahlke, Tobias Feigl, Christopher Mutschler
TL;DR
The paper addresses the data bottleneck in indoor 5G radio fingerprinting for localization by introducing a self-supervised pre-training framework that trains a Transformer on unlabeled channel impulse responses (CIRs). It proposes a novel pretext task that masks CIR components and trains the model to reconstruct them, so the model learns environment-specific representations without reference data. After pre-training, light fine-tuning on a small set of labeled CIRs yields state-of-the-art localization accuracy with an order-of-magnitude reduction in labeled data, demonstrated on two real-world 5G datasets and a synthetic line-of-sight (LoS) dataset. The results suggest that this approach can serve as a foundation model for radio fingerprinting, offering cost-effective, robust indoor localization and laying the groundwork for future extension to dynamic environments and other radio systems.
Abstract
Artificial Intelligence (AI)-based radio fingerprinting (FP) outperforms classic localization methods in propagation environments with strong multipath effects. However, the model and data orchestration of FP is time-consuming and costly, as it requires many reference positions and extensive measurement campaigns for each environment. In contrast, modern unsupervised and self-supervised learning schemes require less reference data for localization, but either their accuracy is low or they require additional sensor information, rendering them impractical. In this paper we propose a self-supervised learning framework that pre-trains a general transformer (TF) neural network on 5G channel measurements that we collect on-the-fly without expensive equipment. Our novel pretext task randomly masks and drops input information and trains the network to reconstruct it. In doing so, the network implicitly learns the spatiotemporal patterns and information of the propagation environment that enable FP-based localization. Most interestingly, when we optimize this pre-trained model for localization in a given environment, it achieves the accuracy of state-of-the-art methods but requires ten times less reference data and significantly reduces the time from training to operation.
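
To make the masking-and-reconstruction pretext task more concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names mask_cir and reconstruction_loss, the 25% mask ratio, the zero-fill strategy, and the squared-error loss restricted to hidden taps are all assumptions chosen for illustration, and the sketch covers only masking, not the dropping of inputs mentioned above. A synthetic complex vector stands in for a measured CIR, and no Transformer is trained here.

```python
import numpy as np


def mask_cir(cir, mask_ratio=0.25, rng=None):
    """Hide a random fraction of CIR taps by setting them to zero.

    cir: 1-D complex array of channel taps (hypothetical input layout).
    Returns the masked copy and a boolean mask (True = hidden tap).
    """
    rng = np.random.default_rng() if rng is None else rng
    num_taps = cir.shape[0]
    num_masked = max(1, int(round(mask_ratio * num_taps)))
    hidden_idx = rng.choice(num_taps, size=num_masked, replace=False)
    mask = np.zeros(num_taps, dtype=bool)
    mask[hidden_idx] = True
    masked = cir.copy()
    masked[mask] = 0  # assumption: hidden taps are zero-filled
    return masked, mask


def reconstruction_loss(pred, target, mask):
    """Mean squared error evaluated only on the hidden taps."""
    err = np.abs(pred[mask] - target[mask]) ** 2
    return float(err.mean())


# Synthetic stand-in for one measured CIR with 64 taps.
rng = np.random.default_rng(0)
cir = rng.standard_normal(64) + 1j * rng.standard_normal(64)

masked_cir, mask = mask_cir(cir, mask_ratio=0.25, rng=rng)

# A model would map masked_cir to a reconstruction. Two reference points:
# predicting zeros for hidden taps (no learning) and a perfect reconstruction.
baseline_loss = reconstruction_loss(masked_cir, cir, mask)
perfect_loss = reconstruction_loss(cir, cir, mask)
```

In an actual pre-training loop, the masked CIR would be fed to the network and the loss on the hidden taps would drive the gradient updates; the two reference values above only bracket what a trained model could achieve on this sample.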
