
High-Fidelity Digital Twin Dataset Generation for Inverter-Based Microgrids Under Multi-Scenario Disturbances

Osasumwen Cedric Ogiesoba-Eguakun, Kaveh Ashenayi, Suman Rath

TL;DR

This paper presents a high-fidelity digital twin dataset generated from a MATLAB/Simulink EMT model of a low-voltage AC microgrid with ten inverter-based distributed generators, providing a consistent, labeled EMT benchmark for surrogate modeling, disturbance classification, robustness testing under noise and delay, and cyber-physical resilience analysis in inverter-dominated microgrids.

Abstract

Public power-system datasets often lack electromagnetic transient (EMT) waveforms, inverter control dynamics, and diverse disturbance coverage, which limits their usefulness for training surrogate models and studying cyber-physical behavior in inverter-based microgrids. This paper presents a high-fidelity digital twin dataset generated from a MATLAB/Simulink EMT model of a low-voltage AC microgrid with ten inverter-based distributed generators (DGs). The dataset records synchronized three-phase point-of-common-coupling (PCC) voltages and currents, per-DG active power, reactive power, and frequency, together with embedded scenario labels, producing 38 aligned channels sampled at $\Delta t = 2~\mu\text{s}$ over $T = 1~\text{s}$ ($N = 500{,}001$ samples) per scenario. Eleven operating and disturbance scenarios are included: normal operation, load step, voltage sag (temporary three-phase fault), load ramp, frequency ramp, DG trip, tie-line trip, reactive power step, single-line-to-ground faults, measurement noise injection, and communication delay. To ensure numerical stability without altering sequence length, invalid samples (NaN, Inf, and extreme outliers) are repaired by linear interpolation. Each scenario is further validated against system-level signals (mean frequency, PCC voltage magnitude, total active power, voltage unbalance, and zero-sequence current) to confirm that the disturbance is physically observable and occurs at the intended time. The resulting dataset provides a consistent, labeled EMT benchmark for surrogate modeling, disturbance classification, robustness testing under noise and delay, and cyber-physical resilience analysis in inverter-dominated microgrids. The dataset and processing scripts will be released upon acceptance.
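The repair step described in the abstract can be illustrated with a short Python sketch. It replaces non-finite samples, and optionally samples whose magnitude exceeds a bound, by linear interpolation over the sample index, so each channel keeps all $N$ samples. The function name `repair_channel` and the `abs_limit` parameter are hypothetical: the summary does not say how extreme outliers are defined, so a fixed physical-plausibility bound is assumed here instead of a statistical test, which could otherwise flag genuine fault currents as outliers.

```python
import numpy as np


def repair_channel(x, abs_limit=None):
    """Repair one logged channel without changing its length.

    Non-finite samples (NaN, +/-Inf) and, if `abs_limit` is given, samples with
    |x| > abs_limit (hypothetical outlier rule, not the paper's stated criterion)
    are replaced by linear interpolation between the nearest valid samples.
    """
    x = np.asarray(x, dtype=float).copy()  # never modify the caller's array
    valid = np.isfinite(x)
    if abs_limit is not None:
        # Zero-fill invalid entries first so the comparison never sees NaN/Inf.
        magnitude = np.abs(np.where(valid, x, 0.0))
        valid &= magnitude <= abs_limit
    if valid.all():
        return x
    if not valid.any():
        raise ValueError("no valid samples; cannot interpolate")
    idx = np.arange(x.size)
    # np.interp holds the first/last valid value for leading/trailing gaps.
    x[~valid] = np.interp(idx[~valid], idx[valid], x[valid])
    return x
```

For a samples-by-channels array `data`, the same function can be applied per column with `np.apply_along_axis(repair_channel, 0, data, abs_limit=...)`, which preserves the array shape and therefore the time alignment across channels.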
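The system-level validation signals named in the abstract can be computed from the logged channels roughly as follows. This is a minimal sketch, not the authors' processing code: the helper names, the samples-by-DG array layout for per-DG power and frequency, the space-vector magnitude used as a PCC voltage-magnitude proxy, and the one-cycle DFT used to obtain fundamental phasors for the voltage unbalance factor (VUF $= |V_2|/|V_1|$) are all assumptions. The nominal frequency `f0` is not given in the summary and must be supplied by the user.

```python
import numpy as np

A = np.exp(2j * np.pi / 3)  # symmetrical-component rotation operator "a"


def zero_sequence(ia, ib, ic):
    """Instantaneous zero-sequence current i0 = (ia + ib + ic) / 3."""
    return (np.asarray(ia, dtype=float) + np.asarray(ib, dtype=float)
            + np.asarray(ic, dtype=float)) / 3.0


def voltage_magnitude_proxy(va, vb, vc):
    """Space-vector magnitude sqrt(2/3 * (va^2 + vb^2 + vc^2)).

    For a balanced sinusoidal set this equals the phase peak amplitude at every sample.
    """
    va, vb, vc = (np.asarray(v, dtype=float) for v in (va, vb, vc))
    return np.sqrt(2.0 / 3.0 * (va**2 + vb**2 + vc**2))


def mean_frequency(f_dg):
    """Mean DG frequency per sample; f_dg has shape (N, n_dg) (assumed layout)."""
    return np.mean(f_dg, axis=1)


def total_active_power(p_dg):
    """Total DG active power per sample; p_dg has shape (N, n_dg) (assumed layout)."""
    return np.sum(p_dg, axis=1)


def fundamental_phasor(x, fs, f0, end):
    """Peak-value phasor of the f0 component over one nominal cycle ending at sample `end` (exclusive)."""
    n = int(round(fs / f0))  # samples per nominal cycle
    if end < n:
        raise ValueError("need at least one full cycle before `end`")
    seg = np.asarray(x, dtype=float)[end - n:end]
    t = np.arange(end - n, end) / fs
    # Single-bin DFT: when the window spans exactly one cycle and
    # x = X*cos(2*pi*f0*t + phi), this returns X * exp(1j*phi).
    return 2.0 / n * np.sum(seg * np.exp(-2j * np.pi * f0 * t))


def voltage_unbalance_factor(va, vb, vc, fs, f0, end):
    """VUF = |V2| / |V1| from fundamental phasors via symmetrical components."""
    Va, Vb, Vc = (fundamental_phasor(v, fs, f0, end) for v in (va, vb, vc))
    v1 = (Va + A * Vb + A**2 * Vc) / 3.0  # positive sequence
    v2 = (Va + A**2 * Vb + A * Vc) / 3.0  # negative sequence
    return abs(v2) / abs(v1)
```

With $\Delta t = 2~\mu\text{s}$ (so `fs = 500_000`), a 50 Hz cycle spans exactly 10,000 samples; at 60 Hz, `round(fs / f0)` gives 8,333 samples, so a small leakage error in the phasor estimate should be expected.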

Paper Structure (41 sections, 17 equations, 12 figures, 3 tables)

Figures (12)

  • Figure 1: Single-line diagram of the inverter-based microgrid digital twin used for dataset generation. The utility grid is connected at the PCC through a tie-line breaker to enable grid-connected and islanded operation. Ten identical inverter-based DG units (DG1–DG10) connect to the main AC bus, supplying static resistor–inductor–capacitor (RLC) and dynamic loads. Logged measurements include PCC three-phase voltage and current and per-DG active power, reactive power, and frequency.
  • Figure 2: Dataset generation and validation pipeline. EMT simulations are performed in Simulink with fixed disturbance timing. Synchronized measurements are logged, exported to CSV, cleaned for numerical stability, and validated using system-level signals before release for surrogate modeling.
  • Figure 3: Load-step scenario validation and physical observability. The top panel shows total DG active power with an inset zoom highlighting the detected load step at approximately 0.7 s. The lower panel shows the mean DG frequency response, indicating an immediate dip and recovery consistent with droop control. Additional insets show the RMS current envelope and the PCC phase-voltage waveforms around the step, confirming coordinated electromagnetic and control-layer responses.
  • Figure 4: Voltage-sag scenario validation and physical observability. The top panel shows the PCC voltage-magnitude proxy (raw and smoothed) with an inset zoom highlighting the detected sag window, voltage drop, and recovery behavior. The lower-left panel presents the corresponding total DG active power trajectory. The lower-right panel shows the three-phase PCC voltages within the sag window. The temporal alignment across voltage and power signals confirms that the disturbance produces a coordinated system-level dynamic response.
  • Figure 5: Load-ramp scenario validation and system-level observability. The top panel shows the total DG active power ($P_{\text{total}}$) in raw and smoothed form. The dashed vertical lines mark the ramp start ($t = 0.5$ s) and ramp end ($t = 0.7$ s). The inset zoom highlights the ramp interval and reports the estimated ramp magnitude $\Delta P$ and duration $T_r$. The bottom panel shows the frequency response of representative DG units during the same time window. A second inset shows the slope $dP_{\text{total}}/dt$, confirming a sustained positive power ramp between 0.5 and 0.7 s.
  • ...and 7 more figures
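The ramp start, ramp end, magnitude $\Delta P$, and duration $T_r$ reported for the load-ramp scenario (Figure 5) could be estimated from $P_{\text{total}}$ along the lines of the Python sketch below. The function name `estimate_ramp`, the moving-average window `win`, and the slope threshold `frac` (a fraction of the peak $|dP_{\text{total}}/dt|$) are hypothetical choices, not the detector used in the paper. Isolated slope spikes outside the event would widen the detected interval, so real EMT records with switching ripple may need heavier smoothing.

```python
import numpy as np


def estimate_ramp(t, p_total, win=500, frac=0.5):
    """Hypothetical ramp/step detector on total active power.

    Returns (t_start, t_end, delta_p, duration), where the event interval is
    the span over which |dP/dt| of the smoothed signal exceeds frac * peak.
    """
    t = np.asarray(t, dtype=float)
    p = np.asarray(p_total, dtype=float)
    dt = t[1] - t[0]  # assumes a uniform time grid
    # Moving average via cumulative sums; edge padding keeps len(p) samples
    # and avoids artificial slopes at the record boundaries.
    padded = np.pad(p, (win // 2, win - 1 - win // 2), mode="edge")
    c = np.concatenate(([0.0], np.cumsum(padded)))
    p_smooth = (c[win:] - c[:-win]) / win
    slope = np.gradient(p_smooth, dt)  # estimate of dP_total/dt
    peak = np.max(np.abs(slope))
    if peak == 0.0:
        raise ValueError("no ramp or step found: P_total is constant")
    idx = np.flatnonzero(np.abs(slope) >= frac * peak)
    i0, i1 = idx[0], idx[-1]
    delta_p = p_smooth[i1] - p_smooth[i0]
    return t[i0], t[i1], delta_p, t[i1] - t[i0]
```

For a step change, as in the load-step scenario of Figure 3, the smoothed slope forms a pulse roughly one window wide, so the same routine reports the event time with $T_r$ close to the window length rather than a physical ramp duration.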