Privacy-Preserving Hybrid Ensemble Model for Network Anomaly Detection: Balancing Security and Data Protection

Shaobo Liu, Zihao Zhao, Weijie He, Jiren Wang, Jing Peng, Haoyuan Ma

TL;DR

Addresses privacy-preserving network anomaly detection when threat classes are imbalanced. Introduces a privacy-preserving hybrid ensemble of KNN, SVM, XGBoost, and ANN, supplemented with synthetic data to mitigate small-sample issues, while protecting sensitive data via techniques such as federated learning and differential privacy (privacy budget $\epsilon$). The base learners' outputs are fused with Logistic Regression and evaluated on standard metrics, achieving the top reported performance (accuracy $=94.3\%$, precision $=93.9\%$, recall $=93.2\%$, F1 $=93.5\%$) alongside the stated privacy guarantees. This work demonstrates that combining diverse learners within privacy-preserving protocols improves network intrusion detection without compromising data protection, with practical implications for security and compliance.
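As a quick sanity check on the metrics quoted above, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two figures. The sketch below is illustrative only: the function name `f1_from_precision_recall` is hypothetical, and the inputs are the rounded percentages from the TL;DR, not the paper's raw confusion-matrix counts.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the TL;DR (assumed to be averaged the same way).
reported_precision = 0.939
reported_recall = 0.932
reported_f1 = 0.935

implied_f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 implied by precision and recall: {implied_f1:.4f}")
print(f"Difference from reported F1: {abs(implied_f1 - reported_f1):.4f}")
```

The implied value agrees with the reported F1 to three decimal places, which is what one would expect if all three figures were computed on the same predictions.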

Abstract

Privacy-preserving network anomaly detection has become an essential area of research due to growing concerns over the protection of sensitive data. Traditional anomaly detection models often prioritize accuracy while neglecting the critical aspect of privacy. In this work, we propose a hybrid ensemble model that incorporates privacy-preserving techniques to address both detection accuracy and data protection. Our model combines the strengths of several machine learning algorithms, including K-Nearest Neighbors (KNN), Support Vector Machines (SVM), XGBoost, and Artificial Neural Networks (ANN), to create a robust system capable of identifying network anomalies while ensuring privacy. The proposed approach integrates advanced preprocessing techniques that enhance data quality and address the challenges of small sample sizes and imbalanced datasets. By embedding privacy measures into the model design, our solution offers a significant advancement over existing methods, ensuring both enhanced detection performance and strong privacy safeguards.
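The fusion idea described above, where several heterogeneous classifiers feed a Logistic Regression meta-learner, can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is synthetic (`make_classification`), scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs only one library, the hyperparameters are arbitrary apart from $k=5$ for KNN (taken from the Figure 5 caption), and no federated learning or differential privacy step is included.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for network traffic features (assumption).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# Four base learners; distance- and gradient-based models get feature scaling.
base_learners = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    # Stand-in for XGBoost (assumption, chosen to avoid an extra dependency).
    ("gbt", GradientBoostingClassifier(random_state=0)),
    ("ann", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    )),
]

# Logistic Regression is trained on out-of-fold predictions of the base learners.
ensemble = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
ensemble.fit(X_train, y_train)

test_accuracy = ensemble.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

Training the meta-learner on cross-validated predictions (the `cv` argument) rather than on in-sample predictions is what keeps the fusion step from simply memorizing the base learners' training-set fit.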

Paper Structure

This paper contains 19 sections, 17 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Distribution of network anomalies.
  • Figure 2: Cluster similarity graph of features.
  • Figure 3: Box plot of selected features.
  • Figure 4: Overall model process.
  • Figure 5: KNN trained in 5-class with $k=5$.
  • ...and 1 more figure