Wireless Channel Aware Data Augmentation Methods for Deep Learning-Based Indoor Localization

Omer Gokalp Serbetci, Daoud Burghal, Andreas F. Molisch

TL;DR

This work tackles the data scarcity challenge in CSI-based indoor localization by introducing domain-knowledge data augmentation methods that mimic wireless propagation and transceiver behavior. It proposes transceiver-based augmentations (random phase and amplitude) and channel-based augmentations, including correlation-based realizations and four PDP-driven schemes that preserve key channel statistics. Across four real datasets spanning LOS and NLOS conditions, the proposed methods yield substantial improvements in low- to mid-data regimes (up to ~66% RMSE reduction) and, with transfer learning, further reduce the need for target-domain data. The findings demonstrate that physics-informed augmentation can rival or exceed full-measurement training in constrained data scenarios, especially in challenging environments, and that strategic sampling and augmentation of hard samples significantly enhance generalization. Collectively, the methods offer practical avenues to reduce labeling effort and enable robust DL-based indoor localization in diverse environments.

Abstract

Indoor localization is a challenging problem that, unlike outdoor localization, lacks a universal and robust solution. Machine Learning (ML) methods, particularly Deep Learning (DL), have been investigated as a promising approach. Although such methods achieve remarkable localization accuracy, they depend heavily on training data collected from the environment. Data collection is usually laborious and time-consuming, but Data Augmentation (DA) can alleviate this issue. In this paper, in contrast to previously used DA, we propose methods that utilize domain knowledge about wireless propagation channels and devices. The methods exploit the typical hardware component drift in the transceivers and/or the statistical behavior of the channel, in combination with the measured Power Delay Profile (PDP). We comprehensively evaluate the proposed methods to demonstrate their effectiveness. This investigation mainly focuses on how factors such as the number of measurements, the augmentation proportion, and the environment of interest affect the effectiveness of the different DA methods. We show that in the low-data regime (few actual measurements available), localization accuracy increases by up to 50%, matching non-augmented results in the high-data regime. In addition, the proposed methods may outperform the measurement-only high-data performance by up to 33% using only 1/4 of the measured data. We also show how the distribution and quality of the training data affect the effectiveness of DA. Finally, we demonstrate the power of the proposed methods when employed along with Transfer Learning (TL) to address data scarcity in target and/or source environments.
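To make the idea of a transceiver-based augmentation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a CSI measurement is a complex NumPy array (for example antennas × subcarriers), that oscillator drift can be imitated by one common random phase per augmented copy, and that gain drift can be imitated by one common random gain drawn uniformly in dB. The function name `augment_csi`, the parameter `max_gain_db`, and the ±1 dB default are hypothetical choices made for illustration; the location label of each copy would stay the same as that of the original measurement.

```python
import numpy as np


def augment_csi(csi, n_aug, max_gain_db=1.0, rng=None):
    """Return n_aug copies of one CSI sample, each scaled by a random
    common phase and a random common gain (illustrative assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    csi = np.asarray(csi, dtype=np.complex128)

    # One phase offset per copy, uniform over the full circle.
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n_aug)

    # One gain per copy, uniform in dB, converted to an amplitude factor.
    gain_db = rng.uniform(-max_gain_db, max_gain_db, size=n_aug)
    gain = 10.0 ** (gain_db / 20.0)

    # Complex factor per copy, broadcast over all CSI dimensions.
    factor = gain * np.exp(1j * phase)
    factor = factor.reshape((n_aug,) + (1,) * csi.ndim)
    return factor * csi[np.newaxis, ...]


if __name__ == "__main__":
    demo_rng = np.random.default_rng(0)
    # Synthetic stand-in for a measured CSI sample: 4 antennas, 30 subcarriers.
    sample = demo_rng.standard_normal((4, 30)) + 1j * demo_rng.standard_normal((4, 30))
    augmented = augment_csi(sample, n_aug=5, rng=demo_rng)
    print(augmented.shape)
```

Because each copy differs from the original only by a single complex scalar, the relative structure across antennas and subcarriers is preserved, which is the property a phase- and amplitude-based augmentation of this kind relies on.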

Paper Structure

This paper contains 36 sections, 10 equations, 14 figures, 1 table, and 4 algorithms.

Figures (14)

  • Figure 1: Test Set Performance vs Augmentation Size and Methods, Model: CNN, Dataset: WILD1 Env. 1, Original Dataset Size: 100
  • Figure 2: Test Set Performance vs Augmentation Size and Methods, Model: CNN, Dataset: WILD2 Env. 2, Original Dataset Size: 100
  • Figure 3: Test Set Performance vs Augmentation Size and Methods, Model: CNN, Dataset: WILD2 Env. 1, Original Dataset Size: 8000
  • Figure 4: Test Set Performance vs Augmentation Size and Methods, Model: CNN, Dataset: WILD1 Env. 2, Original Dataset Size: 16000
  • Figure 5: Test Set Performance vs Original Dataset Size and Aug. Methods, Model: CNN, Dataset: WILD1 Env. 1, All training sets augmented to 64000 samples
  • ...and 9 more figures