Federated Learning for Efficient Condition Monitoring and Anomaly Detection in Industrial Cyber-Physical Systems

William Marfo, Deepak K. Tosh, Shirley V. Moore

TL;DR

The paper tackles reliable real-time condition monitoring and anomaly localization in industrial CPS by extending federated learning with adaptive client aggregation, dynamic node selection, and Weibull-based checkpointing. It integrates SOM-based anomaly detection and component-level localization, validated on NASA Bearing and Hydraulic Systems datasets, achieving up to 99.5% AUC-ROC and approximately 2× faster execution than FedAvg while remaining resilient to node failures. Statistical validation via the Mann-Whitney U test (p < 0.05) confirms significant improvements in detection performance and efficiency over state-of-the-art FL methods. The framework offers CPS-specific robustness and scalability, enabling efficient, distributed monitoring with reduced downtime and better fault isolation in industrial settings.
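One way to read "Weibull-based checkpointing" is as a scheduling rule driven by a Weibull failure model. The summary does not give the paper's actual rule, so the sketch below is only an illustration under stated assumptions: node lifetime follows a Weibull survival function R(t) = exp(-(t/scale)^shape), and the next checkpoint is placed where the conditional survival probability R(age + d) / R(age) falls to a chosen target. The function name and all parameters are hypothetical.

```python
import math


def next_checkpoint_interval(age, shape, scale, survival_target):
    """Return d such that R(age + d) / R(age) == survival_target.

    Assumes a Weibull survival function R(t) = exp(-(t / scale) ** shape).
    All names here are illustrative, not taken from the paper.
    """
    if age < 0 or shape <= 0 or scale <= 0:
        raise ValueError("age must be >= 0; shape and scale must be > 0")
    if not 0.0 < survival_target < 1.0:
        raise ValueError("survival_target must lie strictly between 0 and 1")
    # Taking logs of the survival ratio gives
    #   ((age + d) / scale) ** shape - (age / scale) ** shape = -ln(target)
    hazard_budget = -math.log(survival_target)
    return scale * ((age / scale) ** shape + hazard_budget) ** (1.0 / shape) - age


# With shape > 1 (wear-out failures) the interval shrinks as the node ages,
# so older nodes are checkpointed more often.
young = next_checkpoint_interval(age=0.0, shape=2.0, scale=100.0, survival_target=0.9)
old = next_checkpoint_interval(age=50.0, shape=2.0, scale=100.0, survival_target=0.9)
```

With shape = 1 the Weibull model reduces to the exponential distribution and the interval no longer depends on age, which is the memoryless case.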

Abstract

Detecting and localizing anomalies in cyber-physical systems (CPS) has become increasingly challenging as systems grow in complexity, particularly due to varying sensor reliability and node failures in distributed environments. While federated learning (FL) provides a foundation for distributed model training, existing approaches often lack mechanisms to address these CPS-specific challenges. This paper introduces an enhanced FL framework with three key innovations: adaptive model aggregation based on sensor reliability, dynamic node selection for resource optimization, and Weibull-based checkpointing for fault tolerance. The proposed framework ensures reliable condition monitoring while tackling the computational and reliability challenges of industrial CPS deployments. Experiments on the NASA Bearing and Hydraulic System datasets demonstrate superior performance compared to state-of-the-art FL methods, achieving 99.5% AUC-ROC in anomaly detection and maintaining accuracy even under node failures. Statistical validation using the Mann-Whitney U test confirms significant improvements, with a p-value less than 0.05, in both detection accuracy and computational efficiency across various operational scenarios.
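The abstract names adaptive aggregation based on sensor reliability but does not state the weighting formula. As a minimal sketch, assume each client reports a flat parameter vector, a reliability score in [0, 1], and a sample count, and that the server averages parameters with weights proportional to reliability times sample count. This weighting is an assumption for illustration; with all reliabilities equal it reduces to the standard FedAvg sample-count average.

```python
def aggregate(client_params, reliabilities, sample_counts):
    """Reliability-weighted average of flat client parameter vectors.

    The weight of client i is reliabilities[i] * sample_counts[i].
    This weighting is a hypothetical choice, not the paper's formula.
    """
    if not (len(client_params) == len(reliabilities) == len(sample_counts)):
        raise ValueError("inputs must have one entry per client")
    if not client_params:
        raise ValueError("at least one client is required")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all parameter vectors must have the same length")

    coeffs = [r * n for r, n in zip(reliabilities, sample_counts)]
    total = sum(coeffs)
    if total <= 0:
        raise ValueError("total aggregation weight must be positive")

    # Weighted mean, computed coordinate by coordinate.
    return [
        sum(c * p[i] for c, p in zip(coeffs, client_params)) / total
        for i in range(dim)
    ]


# A client with reliability 0 contributes nothing to the global model.
global_params = aggregate(
    client_params=[[1.0, 2.0], [3.0, 4.0]],
    reliabilities=[1.0, 0.0],
    sample_counts=[10, 10],
)
```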

Paper Structure

This paper contains 16 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Framework architecture showing data flow from sensor collection through SOM-based detection to FL with adaptive checkpointing. The system processes raw sensor data to detect anomalies, localizes faulty components, and maintains model consistency across distributed clients.
  • Figure 2: Bearing test rig configuration showing sensor placement for vibration monitoring.
  • Figure 3: SOM-based anomaly detection across datasets showing quantization error progression: Bearing Dataset 1 (top-left) exhibits gradual degradation, Bearing Dataset 2 (top-right) shows early-stage deterioration, Bearing Dataset 3 (bottom-left) demonstrates abrupt failure, and Hydraulic System (bottom-right) reveals multi-component anomaly patterns.
  • Figure 4: Component-level anomaly localization showing cumulative anomaly counts: Bearing Datasets 1 and 3 (bottom) identify critical components, while the Hydraulic System (top) reveals sensor-specific degradation patterns. Localization results for Bearing Dataset 2 align with previous findings [marfo2022condition].
  • Figure 5: Performance analysis showing accuracy versus number of clients (left) and dropout rates (right) for both datasets. Our framework maintains superior detection capability through adaptive aggregation and Weibull-based checkpointing.
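
The quantization error plotted in Figure 3 and the component-level counts in Figure 4 can be illustrated with a small sketch. It assumes an already trained SOM codebook (a list of prototype vectors), defines the quantization error as the Euclidean distance from a sample to its best-matching unit (BMU), flags samples whose error exceeds a threshold, and attributes each anomaly to the feature that deviates most from the BMU. The threshold and the attribution rule are assumptions for illustration; the summary does not specify how the paper sets them.

```python
import math


def best_matching_unit(sample, codebook):
    """Return the codebook vector closest to the sample (Euclidean distance)."""
    return min(codebook, key=lambda unit: math.dist(sample, unit))


def quantization_error(sample, codebook):
    """Distance from the sample to its best-matching unit."""
    return math.dist(sample, best_matching_unit(sample, codebook))


def localize(sample, codebook):
    """Index of the feature that deviates most from the BMU (hypothetical rule)."""
    bmu = best_matching_unit(sample, codebook)
    deviations = [abs(s - b) for s, b in zip(sample, bmu)]
    return deviations.index(max(deviations))


def detect(samples, codebook, threshold):
    """Return (is_anomaly, culprit_feature_or_None) for each sample."""
    results = []
    for sample in samples:
        is_anomaly = quantization_error(sample, codebook) > threshold
        results.append((is_anomaly, localize(sample, codebook) if is_anomaly else None))
    return results


# Toy codebook with two prototypes over two sensor channels.
codebook = [[0.0, 0.0], [1.0, 1.0]]
report = detect([[0.0, 0.0], [3.0, 4.0]], codebook, threshold=1.0)
```

Counting the culprit indices over time yields cumulative per-component anomaly counts of the kind Figure 4 describes.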