The Wisdom of the Crowd: High-Fidelity Classification of Cyber-Attacks and Faults in Power Systems Using Ensemble and Machine Learning
Emad Abukhousa, Syed Sohail Feroz Syed Afroz, Fahad Alsaeed, Abdulaziz Qwbaiban, Saman Zonouz, A. P. Sakis Meliopoulos
TL;DR
This work tackles the challenge of reliably classifying cyber-attacks and physical faults in power systems with high inverter-based resource (IBR) penetration. It introduces a high-fidelity, streaming-aware evaluation framework and benchmarks 12 ML models on electromagnetic transient (EMT) simulations (WinIGS) with COMTRADE-format data, using cycle-aware post-processing and a confidence threshold to stabilize decisions. Offline accuracies approach 99.9%, yet streaming performance varies markedly: MLPs achieve the highest coverage (≈98–99%), while ensembles remain precise but often abstain. The results also expose a sub-cycle latency challenge, with an average inference time of about 60 ms exceeding the 50 ms relay target. By releasing open data and code, the paper provides a reproducible baseline and underscores the need for streaming-aware evaluation to guide deployment of protection strategies in IBR-rich grids.
Abstract
This paper presents a high-fidelity evaluation framework for machine learning (ML)-based classification of cyber-attacks and physical faults using electromagnetic transient simulations with digital substation emulation at 4.8 kHz. Twelve ML models, including ensemble algorithms and a multi-layer perceptron (MLP), were trained on labeled time-domain measurements and evaluated in a real-time streaming environment designed for sub-cycle responsiveness. The architecture incorporates a cycle-length smoothing filter and a confidence threshold to stabilize decisions. Results show that while several models achieved near-perfect offline accuracies (up to 99.9%), only the MLP sustained robust coverage (98–99%) under streaming, whereas ensembles preserved perfect anomaly precision but abstained frequently (10–49% coverage). These findings demonstrate that offline accuracy alone is an unreliable indicator of field readiness and underscore the need for realistic testing and inference pipelines to ensure dependable classification in networks rich in inverter-based resources (IBRs).
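The post-processing described above (a cycle-length smoothing filter followed by a confidence threshold, with abstention when confidence is too low) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a 60 Hz nominal frequency (so one cycle is 80 samples at 4.8 kHz), a mean-probability smoother, an illustrative threshold of 0.9, and a definition of coverage as the fraction of samples on which the classifier commits to a label. The names `CycleSmoother`, `coverage`, and `ABSTAIN` are hypothetical.

```python
from collections import deque

import numpy as np

SAMPLE_RATE_HZ = 4800   # sampling rate stated in the abstract
NOMINAL_FREQ_HZ = 60    # assumption: 60 Hz system (not stated in the abstract)
CYCLE_LEN = SAMPLE_RATE_HZ // NOMINAL_FREQ_HZ  # samples per power cycle
CONF_THRESHOLD = 0.9    # assumption: illustrative value only
ABSTAIN = -1            # hypothetical sentinel for "no decision"


class CycleSmoother:
    """Average per-sample class probabilities over one cycle, then threshold."""

    def __init__(self, window=CYCLE_LEN, threshold=CONF_THRESHOLD):
        self.buffer = deque(maxlen=window)  # keeps only the latest `window` samples
        self.threshold = threshold

    def update(self, probs):
        """Consume one probability vector; return a class index or ABSTAIN."""
        self.buffer.append(np.asarray(probs, dtype=float))
        if len(self.buffer) < self.buffer.maxlen:
            return ABSTAIN  # less than one full cycle observed so far
        mean_probs = np.mean(list(self.buffer), axis=0)
        best = int(np.argmax(mean_probs))
        # Commit only when the smoothed confidence clears the threshold.
        return best if mean_probs[best] >= self.threshold else ABSTAIN


def coverage(decisions):
    """Fraction of samples with a committed (non-abstaining) decision."""
    return float(np.mean(np.asarray(decisions) != ABSTAIN))
```

Under these assumptions, a model whose smoothed confidence rarely clears the threshold will abstain often and report low coverage even if every committed decision is correct, which is the pattern the abstract reports for the ensembles.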
