RFSS: A Multi-Standard RF Signal Source Separation Dataset with 3GPP-Standardized Channel and Hardware Impairments

Hao Chen, Rui Jin, Dayuan Tan

Abstract

The coexistence of heterogeneous cellular standards (2G-5G) in shared spectrum demands sophisticated RF source separation techniques, yet no public dataset exists for data-driven research on this problem. We present RFSS (RF Signal Source Separation), an open-source dataset of 100,000 multi-source RF signal samples generated with full 3GPP standards compliance. The dataset covers GSM (TS 45.004), UMTS (TS 25.211), LTE (TS 36.211), and 5G NR (TS 38.211), with 2-4 simultaneous sources per sample plus 4,000 single-source reference samples, all at a 30.72 MHz sample rate. Each source passes through an independent 3GPP TDL multipath fading channel and realistic hardware impairments: carrier frequency offset, I/Q imbalance, phase noise, DC offset, and PA nonlinearity (Rapp model). Two mixing modes are provided: co-channel (all sources at baseband) and adjacent-channel (each source frequency-shifted to its standard-specific carrier). The dataset totals 103 GB in HDF5 format with a 70/15/15 train/validation/test split. We benchmark five methods (FastICA, Frobenius-norm NMF, Conv-TasNet, DPRNN, and a CNN-LSTM baseline) using permutation-invariant SI-SINR (PI-SI-SINR). Conv-TasNet achieves -21.18 dB PI-SI-SINR on 2-source mixtures versus -34.91 dB for ICA, a 13.7 dB improvement. On co-channel mixtures, Conv-TasNet reaches -12.34 dB versus -28.04 dB for ICA and -16.19 dB for NMF. The dataset and evaluation code are publicly released at submission time.
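To make the two mixing modes and the PA nonlinearity concrete, the following is a minimal NumPy sketch, not the authors' generation code. It assumes the standard Rapp AM/AM form; the function names `rapp_pa` and `mix`, the saturation amplitude `a_sat`, the smoothness `p`, and the carrier offsets are hypothetical values chosen for illustration. Only the 30.72 MHz sample rate comes from the abstract.

```python
import numpy as np

FS = 30.72e6  # sample rate stated in the abstract (Hz)

def rapp_pa(x, a_sat=1.0, p=2.0):
    """Rapp AM/AM model: compresses the envelope and leaves the phase untouched.

    a_sat and p are illustrative assumptions, not the dataset's settings.
    """
    mag = np.abs(x)
    gain = 1.0 / (1.0 + (mag / a_sat) ** (2.0 * p)) ** (1.0 / (2.0 * p))
    return x * gain

def mix(sources, offsets_hz=None, fs=FS):
    """Sum complex baseband sources of shape (num_sources, num_samples).

    offsets_hz=None  -> co-channel: every source stays at baseband.
    offsets_hz=[...] -> adjacent-channel: each source is frequency-shifted first.
    """
    if offsets_hz is None:
        return sources.sum(axis=0)
    n = np.arange(sources.shape[1])
    shifts = np.exp(2j * np.pi * np.asarray(offsets_hz)[:, None] * n / fs)
    return (sources * shifts).sum(axis=0)

# Two placeholder sources (complex Gaussian noise, not real waveforms).
rng = np.random.default_rng(0)
srcs = (rng.standard_normal((2, 4096)) + 1j * rng.standard_normal((2, 4096))) / np.sqrt(2)
srcs = rapp_pa(srcs, a_sat=2.0)
co_channel = mix(srcs)
adjacent = mix(srcs, offsets_hz=[-5e6, 5e6])  # hypothetical carrier offsets
```

In the co-channel case the sources overlap fully in frequency, so separation must rely on waveform structure alone. In the adjacent-channel case the frequency shifts give a separator additional spectral cues.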

Paper Structure

This paper contains 27 sections, 6 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: RFSS dataset construction pipeline. Each sample passes through independent per-source signal generation, 3GPP TDL channel modeling, hardware impairment injection, and two-mode mixing before ground-truth targets and the mixture observation are written to HDF5.
  • Figure 2: RFSS dataset composition statistics. Left: source-count distribution across 100,000 samples. Center: mixing-mode frequency (co-channel vs. adjacent-channel). Right: standard-combination frequency for multi-source samples.
  • Figure 3: Short-time Fourier transform spectrograms of single-source samples for all four cellular standards. Each panel shows time on the horizontal axis and frequency on the vertical axis, with power in dB encoded as color. The distinct spectral structures motivate the separability of multi-standard mixtures.
  • Figure 4: Signal quality characterization for the four cellular standards. Left: empirical PAPR distributions. Center: power spectral density estimates. Right: amplitude (envelope) probability density functions.
  • Figure 5: Benchmark results. Left: overall PI-SI-SINR (dB) for all five methods across 2-, 3-, and 4-source configurations. Right: co-channel PI-SI-SINR for all five methods across 2-, 3-, and 4-source configurations.
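
The PI-SI-SINR metric reported in Figure 5 can be illustrated with a short sketch. The paper's exact definition is not reproduced in this section, so the code below assumes the usual scale-invariant form (project each estimate onto its reference, then compare projected power to residual power) combined with an exhaustive search over source permutations. The function names `si_sinr_db` and `pi_si_sinr_db` are hypothetical.

```python
from itertools import permutations

import numpy as np

def si_sinr_db(est, ref, eps=1e-12):
    """Scale-invariant SINR (dB) of one complex estimate against one reference."""
    # Complex least-squares projection of est onto ref; np.vdot conjugates ref.
    alpha = np.vdot(ref, est) / (np.vdot(ref, ref) + eps)
    target = alpha * ref
    resid = est - target  # interference + noise + distortion
    num = np.sum(np.abs(target) ** 2) + eps
    den = np.sum(np.abs(resid) ** 2) + eps
    return 10.0 * np.log10(num / den)

def pi_si_sinr_db(ests, refs):
    """Best mean SI-SINR over all assignments of estimates to references.

    ests, refs: arrays of shape (num_sources, num_samples).
    Returns (score_db, perm), where perm[i] is the estimate matched to refs[i].
    """
    k = refs.shape[0]
    best_score, best_perm = -np.inf, None
    for perm in permutations(range(k)):
        score = np.mean([si_sinr_db(ests[p], refs[i]) for i, p in enumerate(perm)])
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm
```

With at most four sources per sample, the exhaustive search evaluates at most 24 permutations, so no assignment solver is needed.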