Radiance-Field Reinforced Pretraining: Scaling Localization Models with Unlabeled Wireless Signals
Guosheng Wang, Shen Wang, Lei Yang
TL;DR
The paper tackles cross-scene generalization in RF-based indoor localization by introducing Radiance-Field Reinforced Pretraining (RFRP), a self-supervised framework that uses unlabeled RF data to pretrain a large localization encoder (LocGPT+) jointly with a scene-specific RF-NeRF decoder. LocGPT+ employs a Transformer with Mixture-of-Experts to learn scene-agnostic features, while RF-NeRF enforces physics-based spectral reconstruction through voxel radiosity and ray tracing. The approach is augmented with a masked autoencoder strategy and a joint training objective, enabling effective fine-tuning with limited labeled data. Using RF data from 100 scenes (75 for training, 25 held out for evaluation), the RFRP-pretrained model reduces localization error by over 40% relative to non-pretrained models and by 21% relative to supervised-pretrained baselines, demonstrating scalable, label-efficient indoor localization that generalizes across diverse environments.
Abstract
Radio frequency (RF)-based indoor localization offers significant promise for applications such as indoor navigation, augmented reality, and pervasive computing. While deep learning has greatly enhanced localization accuracy and robustness, existing localization models still face major challenges in cross-scene generalization due to their reliance on scene-specific labeled data. To address this, we introduce Radiance-Field Reinforced Pretraining (RFRP). This novel self-supervised pretraining framework couples a large localization model (LM) with a neural radio-frequency radiance field (RF-NeRF) in an asymmetrical autoencoder architecture. In this design, the LM encodes received RF spectra into latent, position-relevant representations, while the RF-NeRF decodes them to reconstruct the original spectra. This alignment between input and output enables effective representation learning using large-scale, unlabeled RF data, which can be collected continuously with minimal effort. To this end, we collected RF samples at 7,327,321 positions across 100 diverse scenes using four common wireless technologies: RFID, BLE, WiFi, and IIoT. Data from 75 scenes were used for training, and the remaining 25 for evaluation. Experimental results show that the RFRP-pretrained LM reduces localization error by over 40% compared to non-pretrained models and by 21% compared to those pretrained using supervised learning.
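The encode-then-reconstruct objective described above can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a linear encoder shared across scenes in place of the Transformer-based LM, one linear decoder per scene in place of RF-NeRF, synthetic spectra in place of measured ones, and random bin masking as a stand-in for the masked autoencoder strategy. All function names and parameters are hypothetical.

```python
import numpy as np


def make_toy_scene(n_samples, n_bins, rng):
    """Hypothetical stand-in for one scene: 2-D positions mixed linearly into spectra."""
    positions = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    mixing = rng.normal(size=(2, n_bins))  # toy scene-specific propagation
    noise = 0.01 * rng.normal(size=(n_samples, n_bins))
    return positions @ mixing + noise


def mask_spectrum(x, mask_ratio, rng):
    """Zero out a random fraction of spectrum bins; return the masked copy and the mask."""
    mask = rng.random(x.shape) < mask_ratio
    return np.where(mask, 0.0, x), mask


def pretrain(spectra_by_scene, latent_dim=4, mask_ratio=0.3, lr=0.1, steps=300, seed=0):
    """Self-supervised pretraining: no position labels are used anywhere."""
    rng = np.random.default_rng(seed)
    n_bins = next(iter(spectra_by_scene.values())).shape[1]
    # Shared encoder (assumption: linear map standing in for the LM).
    W_enc = rng.normal(scale=0.1, size=(n_bins, latent_dim))
    # One decoder per scene (assumption: linear map standing in for RF-NeRF).
    W_dec = {s: rng.normal(scale=0.1, size=(latent_dim, n_bins)) for s in spectra_by_scene}
    history = []
    for _ in range(steps):
        total = 0.0
        for scene, x in spectra_by_scene.items():
            x_in, _ = mask_spectrum(x, mask_ratio, rng)
            z = x_in @ W_enc             # latent representation
            x_hat = z @ W_dec[scene]     # reconstructed spectrum
            err = x_hat - x              # target is the full, unmasked spectrum
            total += float(np.mean(err ** 2))
            # Gradients of the mean squared error, computed before any update.
            g = 2.0 * err / err.size
            g_dec = z.T @ g
            g_enc = x_in.T @ (g @ W_dec[scene].T)
            W_dec[scene] -= lr * g_dec
            W_enc -= lr * g_enc
        history.append(total / len(spectra_by_scene))
    return W_enc, W_dec, history
```

In this sketch, `history` records the mean reconstruction loss per step, and only `W_enc` would be carried over to a new scene and fine-tuned with a small labeled set. The per-scene decoders are discarded, which mirrors the asymmetry between a scene-agnostic encoder and a scene-specific decoder.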
