Adversarial Vulnerabilities in Neural Operator Digital Twins: Gradient-Free Attacks on Nuclear Thermal-Hydraulic Surrogates

Samrendra Roy, Kazuma Kobayashi, Souvik Chakraborty, Rizwan-uddin, Syed Bahauddin Alam

Abstract

Operator learning models are rapidly emerging as the predictive core of digital twins for nuclear and energy systems, promising real-time field reconstruction from sparse sensor measurements. Yet their robustness to adversarial perturbations remains uncharacterized, a critical gap for deployment in safety-critical systems. Here we show that neural operators are acutely vulnerable to extremely sparse (fewer than 1% of inputs), physically plausible perturbations that exploit their sensitivity to boundary conditions. Using gradient-free differential evolution across four operator architectures, we demonstrate that minimal modifications trigger catastrophic prediction failures, increasing relative $L_2$ error from $\sim$1.5% (validated accuracy) to 37-63% while remaining completely undetectable by standard validation metrics. Notably, 100% of successful single-point attacks pass z-score anomaly detection. We introduce the effective perturbation dimension $d_{\text{eff}}$, a Jacobian-based diagnostic that, together with sensitivity magnitude, yields a two-factor vulnerability model explaining why architectures with extreme sensitivity concentration (POD-DeepONet, $d_{\text{eff}} \approx 1$) are not necessarily the most exploitable, since low-rank output projections cap maximum error, while moderate concentration with sufficient amplification (S-DeepONet, $d_{\text{eff}} \approx 4$) produces the highest attack success. Gradient-free search outperforms gradient-based alternatives (PGD) on architectures with gradient pathologies, while random perturbations of equal magnitude achieve near-zero success rates, confirming that the discovered vulnerabilities are structural. Our findings expose a previously overlooked attack surface in operator learning models and establish that these models require robustness guarantees beyond standard validation before deployment.
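
The claim that in-bounds perturbations pass z-score anomaly detection can be illustrated with a small sketch. It is not the paper's detector: the threshold of 3 and the assumption that each input is uniformly distributed over its allowed range are illustrative choices, and the helper names are hypothetical. The bounds are the ones quoted in the Figure 1 caption. Under these assumptions, even a value pushed to the edge of its range has a z-score of $\sqrt{3} \approx 1.73$, well below the threshold.

```python
import math

# Input bounds quoted in the Figure 1 caption (m/s and K).
BOUNDS = {"v_in": (4.0, 5.0), "T_in": (263.0, 323.0)}
Z_THRESHOLD = 3.0  # assumed detector threshold, not taken from the paper


def uniform_stats(lo, hi):
    """Mean and standard deviation of a uniform distribution on [lo, hi] (assumed input law)."""
    return (lo + hi) / 2.0, (hi - lo) / math.sqrt(12.0)


def zscore(value, mean, std):
    """Absolute z-score of a single sensor reading."""
    return abs(value - mean) / std


def passes_detector(name, value):
    """True if the reading would not be flagged by the assumed z-score detector."""
    mean, std = uniform_stats(*BOUNDS[name])
    return zscore(value, mean, std) <= Z_THRESHOLD


# Worst case for an in-bounds attacker: push each input to its upper bound.
worst_case = {name: hi for name, (lo, hi) in BOUNDS.items()}
flags = {name: passes_detector(name, v) for name, v in worst_case.items()}
max_z = max(zscore(v, *uniform_stats(*BOUNDS[n])) for n, v in worst_case.items())

print(flags, round(max_z, 3))
```

The point of the sketch is only that a per-coordinate z-score test cannot separate a bounds-respecting adversarial value from a legitimate extreme operating point; how the paper's detector is configured is not stated in this summary.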

Paper Structure (80 sections, 9 theorems, 33 equations, 14 figures, 23 tables, 1 algorithm)

Key Result

Theorem 1

Under a $k$-sparse, $\epsilon$-bounded perturbation with the first-order model:

Figures (14)

  • Figure 1: Differential-evolution-driven vulnerability mapping in a cyber-physical twin. Attack Surface (top): Using sensor knowledge $(v_{\mathrm{in}},\,T_{\mathrm{in}},\,q''(z))$ and bounds, Differential Evolution probes the input space and yields a heatmap with vulnerability hotspots at indices 28 and 68 of the 100-point $q''(z)$ profile (Branch 2). All trials respect $v_{\mathrm{in}}\in[4.0,5.0]~\mathrm{m\,s^{-1}}$ and $T_{\mathrm{in}}\in[263,323]~\mathrm{K}$ (panel c), enabling stealth. Physical System (middle): Branch 1 encodes $[v_{\mathrm{in}},T_{\mathrm{in}}]$; Branch 2 encodes axial wall heat flux $q''(z)$. Single-coordinate injections ($L_0=1$) enter via sensor/SCADA/preprocessing tampering. Digital Twin (bottom): Surrogates (MIMONet, S-DeepONet, POD-DeepONet, NOMAD) map $G:(b^{(1)},b^{(2)})\to(P,u,v,w)$; S-DeepONet is most vulnerable. Fields & impact (right): Nominal vs. attacked temperature shows hotspot displacement and $34.2\%$ relative $L_2$ error, achieved with black-box access and physics-respecting inputs, revealing dangerous but bounds-compliant failures in learned operator models.
  • Figure 2: Field-level impact of sparse adversarial attacks. Each row shows clean prediction (left), attacked prediction (middle), and absolute error (right). Rows 1--3: S-DeepONet velocity components under $L_0=3$ attack. Row 4: NOMAD pressure field under $L_0=1$ attack. Error maps reveal localized failures with normalized errors approaching 1.0 in safety-critical regions, despite models achieving $<2\%$ error on clean data. White rectangles indicate geometric obstacles.
  • Figure 3: Jacobian sensitivity profiles and DE attack targeting across architectures. Top panels (bars): Mean Jacobian column norms $s_i = \|J_b[:,i]\|_2$ ($\times 10^{-3}$) for each of the 102 input coordinates, averaged over 50 test samples with 30 randomized projections each. Dashed red line marks the Branch 1/Branch 2 boundary (index 2). Middle strips (Sensitivity): Same data as a log-scale heatmap for cross-model comparison. Bottom strips (DE Target): Normalized DE attack targeting frequency, aggregated across $L_0 \in \{1,3,5,10\}$ weighted by successful attacks. S-DeepONet's targeting concentrates at indices 0 and 99--101, matching its Jacobian peaks ($r = 0.99$). POD-DeepONet's apparently diffuse Branch 2 targeting reflects random budget-filling at higher $L_0$ rather than genuine vulnerability: Branch 2 Jacobian norms are ${\sim}10^{-12}$, and targeting is uncorrelated with sensitivity ($r = -0.47$); see Section \ref{sec:arch_vulnerability} and Fig. \ref{fig:supp_heatmaps} for details.
  • Figure 4: Visualizing adversarial perturbations at $L_0=3$. To illustrate what successful adversarial inputs look like in practice, we plot both a representative single attack (rows 1--2) and averaged statistics across 200 successful attacks (rows 3--5) for each architecture at $L_0=3$ with a 20% error threshold. All values are in physical units. Rows 1 and 3 (Branch 1): Inlet velocity $v_{\text{in}}$ (m s$^{-1}$) and temperature $T_{\text{in}}$ (K) before and after perturbation. Rows 2 and 4 (Branch 2): Wall heat flux profiles $q''(z)$ in kW m$^{-2}$ along the 100 axial positions; red circles mark where perturbations were applied in the single examples. Row 5: Perturbation difference ($\Delta q''$) in kW m$^{-2}$ showing where attacks concentrate along the boundary. S-DeepONet attacks cluster at the right boundary (positions 97--99), while MIMONet and NOMAD target both boundaries more symmetrically. A normalized version is provided in Fig. \ref{fig:supp_perturbation_normalized}.
  • Figure 5: DE vs. random perturbation baseline. Comparison of DE (blue) and random $k$-sparse perturbation (gray) success rates at the 30% threshold over 310 test samples (50 random trials per sample). Random perturbations achieve $<9\%$ across all configurations while DE reaches 9--100%, confirming that DE exploits structured vulnerability subspaces rather than inherent input-output instability.
  • ...and 9 more figures
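
The sensitivity measure in the Figure 3 caption, $s_i = \|J_b[:,i]\|_2$, is the Euclidean norm of the $i$-th column of the input Jacobian. A minimal sketch of that quantity follows. It swaps in central finite differences for the randomized projections mentioned in the caption, and `toy_surrogate` is a hypothetical linear stand-in for a trained operator, so the numbers carry no meaning beyond illustrating the computation.

```python
import math


def toy_surrogate(b):
    """Hypothetical stand-in for a trained operator: a fixed linear map R^3 -> R^2."""
    A = [[3.0, 0.0, 1.0],
         [4.0, 0.5, 0.0]]
    return [sum(a * x for a, x in zip(row, b)) for row in A]


def jacobian_column_norms(f, b, h=1e-4):
    """Estimate s_i = ||J[:, i]||_2 at input b by central finite differences."""
    norms = []
    for i in range(len(b)):
        plus, minus = list(b), list(b)
        plus[i] += h
        minus[i] -= h
        # Column i of the Jacobian: change in every output per unit change in input i.
        column = [(p - m) / (2.0 * h) for p, m in zip(f(plus), f(minus))]
        norms.append(math.sqrt(sum(c * c for c in column)))
    return norms


s = jacobian_column_norms(toy_surrogate, [1.0, 2.0, 3.0])
most_sensitive = max(range(len(s)), key=s.__getitem__)
print(s, most_sensitive)
```

For the toy map the columns are $(3,4)$, $(0,0.5)$ and $(1,0)$, so the estimated norms are approximately 5, 0.5 and 1, and coordinate 0 dominates. A profile like this, computed on a real surrogate, is what the caption compares against the coordinates that differential evolution chooses to attack.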

Theorems & Definitions (18)

  • Theorem 1: Upper Bound on $k$-Sparse Attack Error
  • proof
  • Theorem 2: Lower Bound: Achievability
  • proof
  • Definition 3: Sparse Attack Ratio
  • Theorem 4: Single-Point Sparse Attack Advantage
  • proof
  • Theorem 5: Multi-Point Sparse Attack Advantage
  • proof
  • Remark 6: Why $\sqrt{k/d_{\mathrm{eff}}}$ fails for $k > 1$
  • ...and 8 more