StatAvg: Mitigating Data Heterogeneity in Federated Learning for Intrusion Detection Systems

Pavlos S. Bouzinis; Panagiotis Radoglou-Grammatikis; Ioannis Makris; Thomas Lagkas; Vasileios Argyriou; Georgios Th. Papadopoulos; Panagiotis Sarigiannidis; George K. Karagiannidis

TL;DR

The paper tackles data heterogeneity in federated intrusion detection by introducing StatAvg, a universal normalization mechanism that aggregates per-client feature statistics into global mean and variance, enabling consistent input scaling before FL training. The method, proven mathematically to reflect the global data distribution, is model-agnostic and compatible with any aggregation strategy. Empirical results on TON-IoT and CIC-IoT-2023 show StatAvg delivering substantial accuracy and F1 improvements over FedAvg, FedLN, and FedBN, with more stable convergence. This approach offers a practical, deployment-friendly way to mitigate non-iid feature effects in FL-based IDS and potentially other FL applications with heterogeneous data.

Abstract

Federated learning (FL) is a decentralized learning technique that enables participating devices to collaboratively build a shared Machine Learning (ML) or Deep Learning (DL) model without revealing their raw data to a third party. Due to its privacy-preserving nature, FL has attracted widespread attention for building Intrusion Detection Systems (IDS) within the realm of cybersecurity. However, the data heterogeneity across participating domains and entities presents significant challenges for the reliable implementation of an FL-based IDS. In this paper, we propose an effective method called Statistical Averaging (StatAvg) to alleviate non-independently and identically distributed (non-iid) features across local clients' data in FL. In particular, StatAvg allows the FL clients to share their individual data statistics with the server, which then aggregates this information to produce global statistics. The latter are shared with the clients and used for universal data normalisation. It is worth mentioning that StatAvg can seamlessly integrate with any FL aggregation strategy, as it occurs before the actual FL training process. The proposed method is evaluated against baseline approaches using datasets for network and host Artificial Intelligence (AI)-powered IDS. The experimental results demonstrate the efficiency of StatAvg in mitigating non-iid feature distributions across the FL clients compared to the baseline methods.
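
The statistics exchange described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`local_stats`, `aggregate_stats`, `normalise`), the use of population variance, the sample-count weighting, and the `eps` guard are all assumptions made for illustration. Each client reports a per-feature mean, variance, and sample count; the server pools them with the standard identity Var = E[x^2] - (E[x])^2; every client then scales its data with the same global statistics.

```python
from statistics import fmean, pvariance


def local_stats(samples):
    """Client side (hypothetical helper): per-feature mean, population variance, sample count."""
    columns = list(zip(*samples))  # transpose rows into feature columns
    means = [fmean(col) for col in columns]
    variances = [pvariance(col, mu) for col, mu in zip(columns, means)]
    return means, variances, len(samples)


def aggregate_stats(client_stats):
    """Server side (hypothetical helper): pool local statistics into global mean and variance."""
    total = sum(n for _, _, n in client_stats)
    num_features = len(client_stats[0][0])
    global_mean, global_var = [], []
    for s in range(num_features):
        # Sample-count-weighted average of the local means.
        mean_s = sum(n * m[s] for m, _, n in client_stats) / total
        # Pooled second moment E[x^2], rebuilt from each client's variance and mean.
        second_moment = sum(n * (v[s] + m[s] ** 2) for m, v, n in client_stats) / total
        global_mean.append(mean_s)
        global_var.append(second_moment - mean_s ** 2)
    return global_mean, global_var


def normalise(samples, mean, var, eps=1e-12):
    """Client side: standardise local data with the shared global statistics (eps is an assumption)."""
    return [
        [(x - mu) / (sigma2 + eps) ** 0.5 for x, mu, sigma2 in zip(row, mean, var)]
        for row in samples
    ]


# Two clients whose features sit on different ranges (toy data, not from the paper).
client_a = [[1.0, 10.0], [3.0, 30.0]]
client_b = [[5.0, 50.0], [7.0, 70.0], [9.0, 90.0]]

g_mean, g_var = aggregate_stats([local_stats(client_a), local_stats(client_b)])
scaled_a = normalise(client_a, g_mean, g_var)
```

In this toy case the pooled statistics match what would be computed on the concatenated data of both clients, even though the server never sees a raw sample. Because the exchange happens before training, any aggregation rule can be applied to the model updates afterwards.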

Paper Structure (13 sections, 1 theorem, 12 equations, 7 figures, 6 tables, 1 algorithm)

Introduction
Related Work, Motivation and Contributions
Related Work
Motivation
Contribution
Preliminaries of Federated Learning
StatAvg - Statistical Averaging
Evaluation Analysis
Evaluation Datasets
Baseline Aggregation Methods
Experimental Setup
Evaluation Results
Conclusions

Key Result

Proposition 1

Let $\boldsymbol{x}_{i,s}\in\mathbb{R}^{D_i}$ be the vector containing the $s$-th feature across all samples of $\mathcal{D}_i$. Also, let $\boldsymbol{z}_s=(\boldsymbol{x}_{1,s},\dots,\boldsymbol{x}_{N,s})$ be the concatenation of all clients' vectors, with $\boldsymbol{z}_s\in\mathbb{R}^D$. The mean

Figures (7)

Figure 1: Federated Learning workflow.
Figure 2: Visual representation of StatAvg design and implementation.
Figure 3: Testing accuracy on TON-IoT dataset.
Figure 4: Testing accuracy on CIC-IoT-2023 dataset.
Figure 5: Confusion matrix of StatAvg on TON-IoT dataset.
...and 2 more figures

Theorems & Definitions (1)

Proposition 1
