Dual Cone Gradient Descent for Training Physics-Informed Neural Networks

Youngsik Hwang; Dong-Young Lim

TL;DR

This work identifies that training instability in physics-informed neural networks (PINNs) can arise when the gradients of the PDE residual loss and the boundary loss are imbalanced in magnitude or have a negative inner product. It introduces Dual Cone Gradient Descent (DCGD), which adjusts the update direction to lie in a dual cone region, the set of vectors with non-negative inner products with both loss gradients, and provides a convergence analysis to Pareto-stationary points in the non-convex setting. The authors present three DCGD variants (Projection, Average, and Center), prove that each stays within the dual cone and carries the convergence guarantee, and report strong empirical performance on classical and high-dimensional PDEs, as well as compatibility with loss-balancing and NTK techniques. Overall, DCGD offers a principled, scalable approach to multi-objective PINN optimization with improved training stability.
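The dual cone condition can be written out explicitly. The block below is a sketch rather than the paper's own statement: the symbols $d$ (update direction) and $\eta$ (step size) are introduced here for illustration, and the expansion assumes both losses are smooth enough for a first-order Taylor expansion to hold.

```latex
% Dual cone region at iteration t: non-negative inner products with both gradients
\mathbf{K}_t^* = \left\{ d \;:\; \langle d, \nabla \mathcal{L}_r(\theta_t) \rangle \ge 0,\;
                                 \langle d, \nabla \mathcal{L}_b(\theta_t) \rangle \ge 0 \right\}

% First-order effect of the step \theta_{t+1} = \theta_t - \eta d, for i \in \{r, b\}
\mathcal{L}_i(\theta_{t+1}) = \mathcal{L}_i(\theta_t)
    - \eta \, \langle \nabla \mathcal{L}_i(\theta_t), d \rangle + O(\eta^2)
```

For any $d \in \mathbf{K}_t^*$, both first-order terms are non-positive, so a sufficiently small step increases neither loss to first order.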

Abstract

Physics-informed neural networks (PINNs) have emerged as a prominent approach for solving partial differential equations (PDEs) by minimizing a combined loss function that incorporates both boundary loss and PDE residual loss. Despite their remarkable empirical performance in various scientific computing tasks, PINNs often fail to generate reasonable solutions, and such pathological behaviors remain difficult to explain and resolve. In this paper, we identify that PINNs can be adversely trained when gradients of each loss function exhibit a significant imbalance in their magnitudes and present a negative inner product value. To address these issues, we propose a novel optimization framework, Dual Cone Gradient Descent (DCGD), which adjusts the direction of the updated gradient to ensure it falls within a dual cone region. This region is defined as a set of vectors where the inner products with both the gradients of the PDE residual loss and the boundary loss are non-negative. Theoretically, we analyze the convergence properties of DCGD algorithms in a non-convex setting. On a variety of benchmark equations, we demonstrate that DCGD outperforms other optimization algorithms in terms of various evaluation metrics. In particular, DCGD achieves superior predictive accuracy and enhances the stability of training for failure modes of PINNs and complex PDEs, compared to existing optimally tuned models. Moreover, DCGD can be further improved by combining it with popular strategies for PINNs, including learning rate annealing and the Neural Tangent Kernel (NTK).
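The dual cone condition described above can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names and example gradients are hypothetical, and `bisector_direction` is only one simple direction that always satisfies the condition, not necessarily any of the paper's three variants.

```python
import numpy as np

def in_dual_cone(d, g_r, g_b, tol=1e-12):
    """Return True if d has non-negative inner products with both gradients."""
    return bool(np.dot(d, g_r) >= -tol and np.dot(d, g_b) >= -tol)

def bisector_direction(g_r, g_b, eps=1e-12):
    """Sum of the two unit gradients (illustrative choice, assumed here).

    Its inner product with either unit gradient is 1 + cos(phi) >= 0,
    so it always lies in the dual cone.
    """
    u_r = g_r / (np.linalg.norm(g_r) + eps)
    u_b = g_b / (np.linalg.norm(g_b) + eps)
    return u_r + u_b

# Hypothetical gradients: large residual gradient, small conflicting boundary gradient.
g_r = np.array([10.0, 0.0])
g_b = np.array([-0.5, 0.5])

g_sum = g_r + g_b                  # gradient of the plain summed loss
d = bisector_direction(g_r, g_b)   # a direction inside the dual cone

print(in_dual_cone(g_sum, g_r, g_b))  # summed gradient conflicts with g_b
print(in_dual_cone(d, g_r, g_b))
```

In this toy case the summed gradient is dominated by `g_r` and has a negative inner product with `g_b`, so a plain gradient step would increase the boundary loss to first order, while the bisector direction does not.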

Paper Structure (41 sections, 5 theorems, 63 equations, 16 figures, 7 tables, 6 algorithms)

Introduction
Preliminaries
Notation.
Related Works.
Physics-Informed Neural Networks.
Empirical Observations and Issues in Training PINNs
Conflicting and dominating gradients in PINNs.
Methodology
Dual Cone Region
Convergence Analysis
Dual Cone Gradient Descent: Projection, Average, and Center
Benefits of the DCGD framework
Numerical Experiment
Comparison on benchmark equations
Failure modes of PINNs and Complex P(I)DEs
...and 26 more sections

Key Result

Theorem 4.2

Suppose that $\nabla \mathcal{L}_r(\theta_t)$ and $\nabla \mathcal{L}_b(\theta_t)$ are given at each iteration $t$. Let $\phi_t$ be the angle between $\nabla \mathcal{L}_r(\theta_t)$ and $\nabla \mathcal{L}_b(\theta_t)$, and $R = \frac{\|\nabla \mathcal{L}_r(\theta_t)\|}{\|\nabla \mathcal{L}_b(\theta_t)\|}$ ...
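The two quantities named in the theorem statement, the angle $\phi_t$ and the magnitude ratio $R$, are easy to monitor during training. The sketch below assumes both gradients are available as flattened NumPy vectors; the function name `gradient_diagnostics` and the example vectors are hypothetical.

```python
import numpy as np

def gradient_diagnostics(g_r, g_b, eps=1e-12):
    """Return (phi, R): angle in radians between the gradients and their norm ratio."""
    norm_r = np.linalg.norm(g_r)
    norm_b = np.linalg.norm(g_b)
    cos_phi = np.dot(g_r, g_b) / (norm_r * norm_b + eps)
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    phi = np.arccos(np.clip(cos_phi, -1.0, 1.0))
    R = norm_r / (norm_b + eps)
    return float(phi), float(R)

# Orthogonal gradients with a 3:1 magnitude ratio.
phi, R = gradient_diagnostics(np.array([3.0, 0.0]), np.array([0.0, 1.0]))

# Conflicting gradients: phi > pi/2 means a negative inner product.
phi_c, R_c = gradient_diagnostics(np.array([10.0, 0.0]), np.array([-0.5, 0.5]))
print(phi, R, phi_c > np.pi / 2, R_c)
```

A value of $\phi_t$ above $\pi/2$ indicates conflicting gradients, and a value of $R$ far from 1 indicates that one gradient dominates the other.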

Figures (16)

Figure 1: Training curves for the total loss $\mathcal{L} := \mathcal{L}_r + \mathcal{L}_b$, PDE residual loss $\mathcal{L}_r$, and boundary loss $\mathcal{L}_b$ for the viscous Burgers' equation.
Figure 2: Conflicting and dominating gradients in PINNs. Here, $\phi$ is the angle between $\nabla \mathcal{L}_r$ and $\nabla \mathcal{L}_b$, and $R = \frac{\|\nabla \mathcal{L}_r\|}{\|\nabla \mathcal{L}_b\|}$ is the magnitude ratio between the gradients.
Figure 3: Visualization of dual cone region ${\mathbf{K}}_t^*$ and its subspace ${\mathbf{G}}_t$
Figure 4: The updated gradient $g_t^{\text{dual}}$ of the three DCGD algorithms.
Figure 5: Distribution of $\cos(\varphi_t^{\max})$ for each algorithm, where $\varphi_t^{\max} = \max\{\varphi_t^{r}, \varphi_t^{b}\}$, $\varphi_t^{r}$ is the angle between the updated vector and $\nabla \mathcal{L}_r(\theta_t)$, and $\varphi_t^{b}$ is the angle between the updated vector and $\nabla \mathcal{L}_b(\theta_t)$.
...and 11 more figures

Theorems & Definitions (8)

Definition 4.1
Theorem 4.2
Proposition 4.3
Definition 4.4
Theorem 4.5
Proposition 4.6
Corollary 4.7
Remark A.1
