A Crowdsensing Intrusion Detection Dataset For Decentralized Federated Learning Models

Chao Feng, Alberto Huertas Celdran, Jing Han, Heqing Ren, Xi Cheng, Zien Zeng, Lucas Krauter, Gerome Bovet, Burkhard Stiller

Abstract

This paper introduces a dataset and an experimental study on Decentralized Federated Learning (DFL) for Internet of Things (IoT) crowdsensing malware detection. The dataset comprises behavioral records from benign operation and from eight malware attacks. A total of 21,582,484 original records were collected from six sources: system calls, file system activities, resource usage, kernel events, input/output events, and network records. These records were aggregated into 30-second windows, resulting in 342,106 data records used for model training and evaluation. Experiments on the DFL platform compare traditional Machine Learning (ML), Centralized Federated Learning (CFL), and DFL across different node counts, topologies, and data distributions. Results show that DFL maintains competitive performance while preserving data locality, outperforming CFL in most settings. This dataset provides a solid foundation for studying the security of IoT crowdsensing environments.
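The 30-second aggregation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout `(timestamp, source, value)`, the function name `aggregate_windows`, and the choice of count and mean as per-window features are all assumptions made for illustration, and the sample values are invented.

```python
from collections import defaultdict

WINDOW_SECONDS = 30  # window length stated in the abstract


def aggregate_windows(records, window_seconds=WINDOW_SECONDS):
    """Group raw (timestamp, source, value) records into fixed windows.

    Returns a dict mapping window start time -> {source: (count, mean)}.
    The count/mean features are an assumption, not the paper's feature set.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for timestamp, source, value in records:
        # Floor the timestamp to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start][source].append(value)

    features = {}
    for window_start in sorted(buckets):
        features[window_start] = {
            source: (len(values), sum(values) / len(values))
            for source, values in buckets[window_start].items()
        }
    return features


# Hypothetical raw records: (seconds since start, data source, numeric value)
raw = [
    (0.5, "syscalls", 120.0),
    (12.0, "syscalls", 80.0),
    (29.9, "network", 3.0),
    (30.0, "syscalls", 50.0),
    (61.2, "network", 7.0),
]

windows = aggregate_windows(raw)
for start, feats in windows.items():
    print(start, feats)
```

Each window yields one structured record per device, which is how a large number of raw events can collapse into a much smaller set of training samples.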

Paper Structure

This paper contains 39 sections, 1 equation, 5 figures, and 8 tables.

Figures (5)

  • Figure 1: Workflow of the proposed dataset construction. IoT devices generate behavioral data under benign and malware-infected conditions, which are continuously monitored across six system dimensions. The resulting raw logs are cleansed, aggregated into fixed time windows, and converted into structured feature vectors, forming the final dataset.
  • Figure 2: Top-5 features per class ranked by statistical relevance.
  • Figure 3: Number of data records per IoT device after feature extraction.
  • Figure 4: Training F1 score across training epochs for centralized ML, CFL (8 nodes), and DFL (8 nodes, fully connected topology) under IID data distribution.
  • Figure 5: Normalized confusion matrix of the DFL fully connected topology with 8 nodes under IID data distribution.