A Federated Learning Approach for Multi-stage Threat Analysis in Advanced Persistent Threat Campaigns

Florian Nelles, Abbas Yazdinejad, Ali Dehghantanha, Reza M. Parizi, Gautam Srivastava

TL;DR

The paper addresses the detection of multi-stage APT campaigns under strict privacy constraints. It proposes a three-phase unsupervised federated learning framework that protects client data with Paillier partial homomorphic encryption. The framework converts heterogeneous log data into a transaction-like format, applies Federated Fuzzy C-means to classify log-event types, and runs a pattern-extraction pipeline that builds item-sets and patterns, which are ranked by a suspicion score for efficient analyst review. Evaluation on the SoTM 34 dataset shows that the framework extracts meaningful patterns while preserving data privacy, with a trade-off between clustering quality under federation and the speed of privacy-preserving computation, and that presenting patterns in a structured form substantially reduces analyst workload. Overall, the approach offers a practical, privacy-aware solution for APT detection across distributed datasets that aligns with GDPR-like data protection requirements and gives a concrete path to deploying federated, pattern-centric threat analysis in real-world environments.
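
The clustering step can be made concrete with a small sketch of one common way to federate Fuzzy C-means: each client computes fuzzy memberships locally and shares only per-cluster weighted sums, and the server recombines them into new centers. This is an illustration under assumptions, not the authors' implementation; the function names (`memberships`, `local_update`, `aggregate`), the fuzzifier `m = 2.0`, and the toy 2-D points are all hypothetical, and the summary does not state which statistics the paper's clients actually exchange.

```python
import math

def memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in every cluster (sums to 1)."""
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # A point lying exactly on a center belongs fully to that center.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]

def local_update(points, centers, m=2.0):
    """Client side: per-cluster weighted sums and weights; raw points stay local."""
    dim = len(centers[0])
    num = [[0.0] * dim for _ in centers]
    den = [0.0 for _ in centers]
    for x in points:
        for k, u in enumerate(memberships(x, centers, m)):
            w = u ** m
            den[k] += w
            for j in range(dim):
                num[k][j] += w * x[j]
    return num, den

def aggregate(updates):
    """Server side: add the client statistics and recompute the centers."""
    nums, dens = zip(*updates)
    dim = len(nums[0][0])
    centers = []
    for k in range(len(dens[0])):
        weight = sum(d[k] for d in dens)
        centers.append([sum(n[k][j] for n in nums) / weight for j in range(dim)])
    return centers

# Two hypothetical clients, each holding its own feature vectors.
clients = [
    [(0.0, 0.1), (0.2, 0.0)],
    [(5.0, 5.1), (4.9, 5.0), (0.1, 0.1)],
]
centers = [[0.0, 0.0], [5.0, 5.0]]
for _ in range(10):
    centers = aggregate([local_update(pts, centers) for pts in clients])
```

Because the center update is a ratio of sums, adding the client statistics gives the same centers as running the update on the pooled data, up to floating-point rounding. In a privacy-preserving setting the shared sums would additionally be encrypted before leaving the client.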

Abstract

Multi-stage threats such as advanced persistent threats (APTs) pose severe risks by stealing data and destroying infrastructure, and they are difficult to detect. APTs use novel attack vectors and evade signature-based detection by obfuscating their network presence, often going unnoticed due to their novelty. Although machine learning models offer high accuracy, they still struggle to identify true APT behavior and overwhelm analysts with excessive data. Effective detection requires training on multiple datasets from various clients, which introduces privacy issues under regulations like GDPR. To address these challenges, this paper proposes a novel 3-phase unsupervised federated learning (FL) framework to detect APTs. It identifies unique log event types, extracts suspicious patterns from related log events, and orders them by complexity and frequency. The framework ensures privacy through a federated approach and enhances security using Paillier's partial homomorphic encryption. Tested on the SoTM 34 dataset, our framework compares favorably against traditional methods, demonstrating efficient pattern extraction and analysis from log files, reducing analyst workload, and maintaining stringent data privacy. This approach addresses significant gaps in current methodologies, offering a robust solution to APT detection in compliance with privacy laws.
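
To show why Paillier encryption suits federated aggregation, the following minimal sketch demonstrates its additive property: multiplying two ciphertexts yields an encryption of the sum of their plaintexts, so a server can total client contributions without seeing any of them. It is a toy under stated assumptions, not the paper's implementation: the primes 1009 and 1013 are far too small to be secure, the generator is fixed to g = n + 1, the helper names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) and client values are hypothetical, and `random` stands in for a cryptographic random source. It needs Python 3.8 or later for `pow(x, -1, n)`.

```python
import math
import random

def keygen(p, q):
    """Build a Paillier key pair from two primes (toy sizes, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng=random):
    """Encrypt an integer 0 <= m < n as (1 + n)^m * r^n mod n^2."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be invertible modulo n
        r = rng.randrange(1, n)
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2

def decrypt(n, private_key, c):
    """Recover m using L(x) = (x - 1) // n applied to c^lam mod n^2."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return c1 * c2 % (n * n)

# Hypothetical scenario: three clients each encrypt a local integer statistic.
n, private_key = keygen(1009, 1013)
client_values = [12, 30, 7]
ciphertexts = [encrypt(n, v) for v in client_values]

# The aggregator combines ciphertexts without access to the private key.
encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = add_encrypted(n, encrypted_sum, c)

total = decrypt(n, private_key, encrypted_sum)  # equals sum(client_values)
```

Paillier operates on integers modulo n, so real-valued model statistics would first need a fixed-point encoding, and the aggregated value must stay below n to decrypt correctly.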

Paper Structure

This paper contains 20 sections, 3 equations, 5 figures, 9 tables, 5 algorithms.

Figures (5)

  • Figure 1: FL with homomorphic encryption
  • Figure 2: The proposed framework for extracting APT patterns
  • Figure 3: Processing each datatype
  • Figure 4: Minimum Centroid Distance Cluster Validation
  • Figure 5: Patterns ordered by descending suspicion, left to right. Red marks APT patterns; yellow marks further suspicious events outlined by security analysts