Resolving gradient pathology in physics-informed epidemiological models

Nickson Golooba; Woldegebriel Assefa Woldegerima

Abstract

Physics-informed neural networks (PINNs) are increasingly used in mathematical epidemiology to bridge the gap between noisy clinical data and compartmental models such as the susceptible-exposed-infected-removed (SEIR) model. However, training these hybrid networks is often unstable because of competing optimization objectives. As established in recent literature on "gradient pathology", the gradients of the data loss and of the physical residual often point in conflicting directions, leading to slow convergence or optimization deadlock. Existing methods attempt to resolve this by balancing gradient magnitudes or by projecting conflicting vectors. We instead propose conflict-gated gradient scaling (CGGS), a computationally efficient method that resolves gradient conflicts in PINNs for epidemiological modelling and stabilizes their training. CGGS uses the cosine similarity between the data and physics gradients to modulate the penalty weight dynamically. Unlike standard annealing schemes that only normalize scales, CGGS acts as a geometric gate: it suppresses the physical constraint when directional conflict is high, allowing the optimizer to prioritize data fidelity, and restores the constraint when the gradients align. We prove that this gating mechanism preserves the standard $O(1/T)$ convergence rate for smooth non-convex objectives, a guarantee that fails under fixed-weight or magnitude-balanced training when gradients conflict. We show that the mechanism autonomously induces a curriculum-learning effect, and our experiments show improved parameter estimation, peak recovery, and convergence in stiff epidemiological systems compared with magnitude-based baselines.
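The abstract does not give the exact form of the gate, so the following Python sketch only illustrates the idea under stated assumptions. The magnitude-balanced target $\|\mathbf{g}_{data}\| / \|\mathbf{g}_{phy}\|$, the gate $(1+\rho)/2$ applied to the cosine similarity $\rho$, the cap `lam_max`, and the exponential-moving-average factor `beta` (an EMA is suggested by the title of Remark 4.8) are hypothetical choices, not the paper's rule. Gradients are plain Python lists so the sketch has no dependencies.

```python
import math


def dot(u, v):
    """Inner product of two flattened gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a flattened gradient vector."""
    return math.sqrt(dot(u, u))


def cggs_weight(g_data, g_phy, lam_prev, beta=0.9, lam_max=10.0, eps=1e-12):
    """One step of a hypothetical CGGS-style weight update.

    ASSUMPTION: the gate form, the cap and the EMA factor are illustrative
    choices, not the rule from the paper. Returns (new_weight, cosine).
    """
    n_data, n_phy = norm(g_data), norm(g_phy)
    # Directional agreement between the two objectives, in [-1, 1].
    cos = dot(g_data, g_phy) / (n_data * n_phy + eps)
    # Magnitude-balancing target: makes |lam * g_phy| match |g_data|.
    lam_mag = n_data / (n_phy + eps)
    # Assumed geometric gate: 0 for opposite gradients, 1 for aligned ones.
    gate = 0.5 * (1.0 + cos)
    # Cap the target so the weight stays bounded.
    lam_target = min(gate * lam_mag, lam_max)
    # EMA smoothing of the weight across training steps.
    lam_new = beta * lam_prev + (1.0 - beta) * lam_target
    return lam_new, cos


# Persistently conflicting gradients (cosine = -1): the weight decays toward zero.
lam = 1.0
for _ in range(50):
    lam, cos_conflict = cggs_weight([1.0, 0.0], [-2.0, 0.0], lam)

# Aligned gradients (cosine = +1): the weight settles at |g_data| / |g_phy| = 0.5.
lam_aligned, cos_aligned = cggs_weight([1.0, 0.0], [2.0, 0.0], lam_prev=0.5)
```

Under persistent conflict this weight decays geometrically at rate `beta`, so the physics term fades gradually instead of switching off abruptly; once the gradients realign, the EMA pulls the weight back toward the magnitude-balanced value.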

Paper Structure

This paper contains 26 sections, 4 theorems, 21 equations, 4 figures, and 1 algorithm.

Introduction
Mathematical formulation
The compartmental constraint (ODE)
The logical constraint (discrete knowledge)
The unified optimization problem
Gradient pathology and spectral analysis
The gradient conflict regimes
Pareto stationarity and deadlock
Proposed method: Conflict-Gated Gradient Scaling (CGGS)
The update rule
Differentiation from prior art
Design choices and stability
Convergence analysis
Boundedness of the adaptive weight
Descent direction under gradient conflict

Key Result

Proposition 3.1

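Let $\mathbf{g}_{data}$ and $\mathbf{g}_{phy}$ be non-zero gradient vectors satisfying $\mathbf{g}_{data} = -c\, \mathbf{g}_{phy}$ for some $c > 0$. Then the magnitude-balanced weight $\hat{\lambda} = \|\mathbf{g}_{data}\| / \|\mathbf{g}_{phy}\| = c$ yields a zero update:

$$\mathbf{g}_{data} + \hat{\lambda}\, \mathbf{g}_{phy} = -c\, \mathbf{g}_{phy} + c\, \mathbf{g}_{phy} = \mathbf{0}.$$

The optimizer perceives this as a stationary point and halts, even though both $\mathcal{L}_{data}$ and $\mathcal{L}_{ODE}$ remain large.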
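A quick numerical check of Proposition 3.1, assuming the magnitude-balanced weight is the norm ratio $\hat{\lambda} = \|\mathbf{g}_{data}\| / \|\mathbf{g}_{phy}\|$ (the weight that makes the scaled physics gradient as long as the data gradient). The helper name `magnitude_balanced_update` and the example vectors are hypothetical.

```python
import math


def norm(u):
    """Euclidean norm of a flattened gradient vector."""
    return math.sqrt(sum(a * a for a in u))


def magnitude_balanced_update(g_data, g_phy, eps=1e-12):
    """Combined gradient g_data + lam * g_phy.

    ASSUMPTION: lam is the norm ratio |g_data| / |g_phy|, i.e. the weight that
    makes the scaled physics gradient as long as the data gradient.
    """
    lam = norm(g_data) / (norm(g_phy) + eps)
    update = [d + lam * p for d, p in zip(g_data, g_phy)]
    return update, lam


# Anti-parallel gradients, as in Proposition 3.1 (example values).
c = 3.0
g_phy = [0.4, -1.2, 2.0]
g_data = [-c * p for p in g_phy]

update, lam = magnitude_balanced_update(g_data, g_phy)
print(f"lambda = {lam:.6f}, |g_data| = {norm(g_data):.3f}, |update| = {norm(update):.2e}")
```

Both individual gradients are far from zero, yet their weighted sum vanishes up to rounding error, which is the spurious stationarity the proposition describes.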

Figures (4)

Figure 1: Conceptual visualization of CGGS. (Left) The data and physics gradients conflict (opposing directions). (Center) Standard magnitude balancing (LRA) equalizes the lengths but ignores the angle. The resultant update vector (black) nearly vanishes, leading to optimization stagnation ("deadlock"). (Right) CGGS detects the negative cosine similarity and "gates" (shrinks) the physics gradient. The resultant update vector (green) follows the data gradient, allowing the optimizer to escape the stagnation point.
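The geometry of Figure 1 can be reproduced with two hand-picked 2D gradients that conflict without being exactly opposite. This sketch compares the combined gradient under length equalization with the one obtained after shrinking the physics weight by the hypothetical gate $(1+\rho)/2$; the gate and the example vectors are assumptions for illustration only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))


def combined(g_data, g_phy, lam):
    """Combined gradient g_data + lam * g_phy (the optimizer steps along its negative)."""
    return [d + lam * p for d, p in zip(g_data, g_phy)]


g_data = [1.0, 0.2]
g_phy = [-3.0, 0.5]  # conflicting: negative cosine similarity
rho = cosine(g_data, g_phy)

# Magnitude balancing: equalize lengths, ignore the angle.
lam_lra = norm(g_data) / norm(g_phy)
# ASSUMPTION: hypothetical gate that shrinks the physics weight when rho < 0.
lam_gated = 0.5 * (1.0 + rho) * lam_lra

u_lra = combined(g_data, g_phy, lam_lra)
u_gated = combined(g_data, g_phy, lam_gated)

print(f"rho = {rho:.3f}")
print(f"|u_lra| = {norm(u_lra):.3f}, cos(u_lra, g_data) = {cosine(u_lra, g_data):.3f}")
print(f"|u_gated| = {norm(u_gated):.3f}, cos(u_gated, g_data) = {cosine(u_gated, g_data):.3f}")
```

With equal lengths the combined norm is $\|\mathbf{g}_{data}\|\sqrt{2(1+\rho)}$, so it shrinks toward zero as $\rho \to -1$ no matter how large the losses are; the gated weight avoids this by letting the data gradient dominate.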
Figure 2: Baseline analysis of standard PINN training on noisy SEIR data. (Left) The model overfits the noise (blue solid curve) and fails to capture the true dynamics well. (Right) The cosine similarity between the data and physics gradients frequently drops below zero (dashed line), indicating destructive optimization conflict in which the objectives work against each other.
Figure 3: Performance of the proposed CGGS. (Left) The model recovers the true SEIR trajectory (green solid curve) despite noise. (Center) The adaptive weight $\hat{\lambda}$ shows the distinct "relaxation-refinement" phases. (Right) The gating mechanism successfully manages periods of gradient conflict (negative cosine similarity) by "closing the gate" (similarity lines jumping back up).
Figure 4: Ablation study. Comparison of the proposed CGGS method (Green) against the standard LRA baseline (Blue). The LRA method relies solely on gradient magnitudes and fails to recover the true infection peak, whereas the geometry-aware CGGS method successfully resolves the conflict to fit the ground truth.

Theorems & Definitions (13)

Proposition 3.1: Pareto deadlock under fixed weights (with proof)
Lemma 4.4: Uniform boundedness (with proof)
Lemma 4.5: Sufficient descent (with proof)
Remark 4.6
Theorem 4.7: Convergence of CGGS (with proof)
Remark 4.8: EMA momentum
