SMoRFFI: A Large-Scale Same-Model 2.4 GHz Wi-Fi Dataset and Reproducible Framework for RF Fingerprinting

Zewei Guo, Zhen Jia, JinXiao Zhu, Wenhao Huang, Yin Chen

TL;DR

This work tackles RF fingerprinting of same-model devices by introducing SMoRFFI, a large-scale, homogeneous dataset built from 123 identical IEEE 802.11g devices, together with a fully reproducible framework for data collection, feature extraction, and benchmarking. The dataset comprises 35.42 million raw I/Q samples and 1.85 million RF features. A baseline Random Forest classifier achieves 89.06% identification accuracy, helped by Kalman filtering, which improves feature stability. Key contributions include a detailed dataset structure with two CSV types, a three-module reproducible workflow (data collection, feature extraction, evaluation), and comprehensive feature analyses that identify frequency-related features as the most discriminative. The framework and dataset enable standardized evaluation, fair benchmarking, and rapid exploration of new RF fingerprinting approaches in a realistic, large-scale same-model setting.
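
The summary mentions Kalman filtering as a way to stabilize RF features. As a rough illustration only, the sketch below applies a scalar Kalman filter with a constant-state model to a noisy sequence of a single feature. The function name kalman_smooth, the noise variances, and the synthetic feature value are all assumptions made for this example; the filter design and parameters actually used in the paper are not given in this summary.

```python
import random
import statistics


def kalman_smooth(measurements, process_var=1e-6, meas_var=1.0):
    """Scalar Kalman filter assuming a (nearly) constant underlying state.

    process_var and meas_var are illustrative defaults, not values from the paper.
    """
    if not measurements:
        return []
    estimate = measurements[0]  # initialise the state with the first measurement
    error_var = meas_var        # initial uncertainty equals the measurement noise
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is modelled as constant, so only the uncertainty grows.
        error_var += process_var
        # Update: blend prediction and new measurement using the Kalman gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed


if __name__ == "__main__":
    rng = random.Random(0)
    true_value = 12.5  # hypothetical per-device feature value
    noisy = [true_value + rng.gauss(0.0, 1.0) for _ in range(200)]
    smooth = kalman_smooth(noisy)
    # Compare the spread of the raw sequence with the filtered one after warm-up.
    print("raw spread:     ", statistics.pstdev(noisy))
    print("filtered spread:", statistics.pstdev(smooth[50:]))
```

With a small process variance the filter behaves much like a running average, so the filtered sequence fluctuates far less than the raw one. A larger process_var would let the estimate track slow drift at the cost of less smoothing.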

Abstract

Radio frequency (RF) fingerprinting exploits hardware imperfections for device identification, but distinguishing between same-model devices remains challenging because their hardware variations are minimal. Existing RF fingerprinting datasets are constrained by small device counts and heterogeneous models, which hinder robust training and fair evaluation of machine learning methods. To address this gap, we introduce a large-scale dataset of same-model devices along with an open-source experimental framework. The dataset is built from 123 same-model commercial IEEE 802.11g devices and contains 35.42 million raw I/Q samples from the preambles and the corresponding 1.85 million RF features. The accompanying framework provides a fully reproducible pipeline from data collection to performance evaluation. Within this framework, a Random Forest-based algorithm is implemented as a baseline and achieves 89.06% identification accuracy on this dataset.

Paper Structure

This paper contains 16 sections, 14 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Structure of the IEEE 802.11g frame [Committee2009].
  • Figure 2: Structure of the file 24:d7:eb:38:c7:e8_pre.csv.
  • Figure 3: The workflow for constructing the dataset.
  • Figure 4: Experimental platform.
  • Figure 5: Workflow of the feature extraction model.
  • ...and 2 more figures