Separable DeepONet: Breaking the Curse of Dimensionality in Physics-Informed Machine Learning

Luis Mandl, Somdatta Goswami, Lena Lambers, Tim Ricken

TL;DR

The paper tackles the curse of dimensionality in physics-informed DeepONet (PI-DeepONet) when solving high-dimensional PDEs. It introduces Separable PI-DeepONet (Sep-PI-DeepONet), which factorizes coordinates into separate 1D trunk networks and uses forward-mode automatic differentiation to compute PDE gradients, achieving linear scaling with the discretization density $n$ in a $d$-dimensional setting. The authors validate the approach on Burgers' equation, Biot consolidation, a parameterized heat equation, and a Poisson problem with a random field, showing comparable accuracy to the vanilla PI-DeepONet while realizing orders-of-magnitude speedups and enabling training where the standard method is impractical. Overall, Sep-PI-DeepONet offers a practical, scalable framework for physics-informed neural operators in high-dimensional PDE contexts, broadening their applicability to complex scientific and engineering problems.

Abstract

The deep operator network (DeepONet) is a popular neural operator architecture that has shown promise in solving partial differential equations (PDEs) by using deep neural networks to map between infinite-dimensional function spaces. In the absence of labeled datasets, we utilize the PDE residual loss to learn the physical system, an approach known as physics-informed DeepONet. This method faces significant computational challenges, primarily due to the curse of dimensionality, as the computational cost increases exponentially with finer discretization. In this paper, we introduce the Separable DeepONet framework to address these challenges and improve scalability for high-dimensional PDEs. Our approach involves a factorization technique where sub-networks handle individual one-dimensional coordinates, thereby reducing the number of forward passes and the size of the Jacobian matrix. By using forward-mode automatic differentiation, we further optimize the computational cost related to the Jacobian matrix. As a result, our modifications lead to a linear scaling of computational cost with discretization density, making Separable DeepONet suitable for high-dimensional PDEs. We validate the effectiveness of the separable architecture through three benchmark PDE models: the viscous Burgers equation, Biot's consolidation theory, and a parametrized heat equation. In all cases, our proposed framework achieves comparable or improved accuracy while significantly reducing computational time compared to conventional DeepONet. These results demonstrate the potential of Separable DeepONet in efficiently solving complex, high-dimensional PDEs, advancing the field of physics-informed machine learning.
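
The factorization and forward-mode differentiation described in the abstract can be sketched in JAX as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`init_mlp`, `mlp`, `trunk_1d`, `sep_deeponet`, `d_dx`), the layer sizes, and the values of the tensor rank `R` and latent dimension `P` are all hypothetical, and a problem with one time and one space coordinate is assumed.

```python
import jax
import jax.numpy as jnp

R, P = 4, 8  # tensor rank r and latent dimension p (illustrative values only)

def init_mlp(key, sizes):
    """Initialise a small MLP as a list of (W, b) pairs (hypothetical helper)."""
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (fan_in, fan_out)) / jnp.sqrt(fan_in),
             jnp.zeros(fan_out))
            for k, fan_in, fan_out in zip(keys, sizes[:-1], sizes[1:])]

def mlp(params, x):
    """tanh MLP with a linear output layer."""
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def trunk_1d(params, coord):
    """One trunk net: n samples of a single coordinate -> (n, R, P) features."""
    return mlp(params, coord[:, None]).reshape(-1, R, P)

def sep_deeponet(params, u, t, x):
    """u: (n_f, m) sensor values, t: (n_t,), x: (n_x,) -> output (n_f, n_t, n_x)."""
    b = mlp(params["branch"], u)           # (n_f, P)
    ft = trunk_1d(params["trunk_t"], t)    # (n_t, R, P)
    fx = trunk_1d(params["trunk_x"], x)    # (n_x, R, P)
    # Outer product over the coordinate axes, summed over rank r and latent p.
    return jnp.einsum("fp,trp,xrp->ftx", b, ft, fx)

def d_dx(params, u, t, x):
    """du/dx on the full grid from one forward-mode pass (Jacobian-vector product)."""
    f = lambda x_: sep_deeponet(params, u, t, x_)
    # Each output column depends only on its own x[j], so a tangent of ones
    # returns the pointwise derivative for every grid node at once.
    return jax.jvp(f, (x,), (jnp.ones_like(x),))[1]

# Toy setup: 3 input functions sampled at m = 16 sensors, a 5 x 7 (t, x) grid.
m = 16
kb, kt, kx, ku = jax.random.split(jax.random.PRNGKey(0), 4)
params = {
    "branch": init_mlp(kb, [m, 32, P]),
    "trunk_t": init_mlp(kt, [1, 32, R * P]),
    "trunk_x": init_mlp(kx, [1, 32, R * P]),
}
u = jax.random.normal(ku, (3, m))
t = jnp.linspace(0.0, 1.0, 5)
x = jnp.linspace(0.0, 1.0, 7)
out = sep_deeponet(params, u, t, x)   # shape (3, 5, 7)
du_dx = d_dx(params, u, t, x)         # shape (3, 5, 7)
```

In this sketch each trunk network sees only the 5 or 7 one-dimensional samples of its own coordinate, while the 35 grid values per input function are assembled by the `einsum`. A non-separable trunk would instead be evaluated once per grid point, which is where the saving in forward passes comes from.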

Paper Structure (12 sections, 1 theorem, 16 equations, 11 figures, 2 tables, 1 algorithm)

This paper contains 12 sections, 1 theorem, 16 equations, 11 figures, 2 tables, 1 algorithm.

Key Result

Theorem S1.1

Suppose that $X$ is a Banach space, $K_1 \subset X$ and $K_2 \subset \mathbb{R}^d$ are compact sets in $X$ and $\mathbb{R}^d$, respectively, and $V$ is a compact set in $C(K_1)$. Assume that $\mathcal{G}: V \rightarrow C(K_2)$ is a nonlinear continuous operator. Then, for any $\epsilon > 0$, there exist [...] holds for all $\mathbf{x} \in V$ and $\zeta \in K_2$, where $\langle \cdot, \cdot \rangle$ denotes [...]
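
The statement above breaks off mid-sentence in this excerpt. For orientation only, the generalized universal approximation theorem for operators, as it is usually stated in the DeepONet literature, continues along the following lines. The integers $m$, $p$, the functions $g$, $f$, and the sensor points $\eta_i$ are taken from that standard statement and are assumptions here; the paper's own symbols may differ.

```latex
Then, for any $\epsilon > 0$, there exist positive integers $m$, $p$,
continuous vector functions $g: \mathbb{R}^m \to \mathbb{R}^p$ and
$f: \mathbb{R}^d \to \mathbb{R}^p$, and points
$\eta_1, \dots, \eta_m \in K_1$ such that
\[
  \left| \mathcal{G}(\mathbf{x})(\zeta)
  - \big\langle g\big(\mathbf{x}(\eta_1), \dots, \mathbf{x}(\eta_m)\big),
    f(\zeta) \big\rangle \right| < \epsilon
\]
holds for all $\mathbf{x} \in V$ and $\zeta \in K_2$, where
$\langle \cdot, \cdot \rangle$ denotes the dot product in $\mathbb{R}^p$.
```

Read this way, $g$ plays the role of the branch network acting on sensor values of the input function, and $f$ plays the role of the trunk network acting on the query coordinate.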

Figures (11)

  • Figure 1: The framework of physics-informed separable DeepONet demonstrated for the parametrized heat equation example. Its central component is the outer product over the individual batches in the inputs followed by the summations indicated by $\bigotimes\sum$. This is done over the tensor rank $r$ in the trunk as well as the output over the hidden latent dimension $p$. Input to the trunk networks are factorizable coordinates and parameters. A detailed representation of the network structure including the batches is shown in the Supplementary Materials (figure label fig:sep_framework). The framework employs forward-mode automatic differentiation to compute the derivative terms in the PDE.
  • Figure 2: Comparative results of vanilla PI-DeepONet and Sep-PI-DeepONet for all applications considered in this work, evaluated after a fixed training time. (a) For the Burgers' equation (1D in space and 1D in time), after training for 83.69s: vanilla completed 600 iterations with an $\mathcal{L}_2$ error of $3.82 \times 10^{-1}$, while separable completed 21,500 iterations with an error of $8.98 \times 10^{-2}$. (b) For the consolidation problem (1D in space and 1D in time), after training for 380.39s: vanilla completed 2,200 iterations, achieving an error of $4.11 \times 10^{-1}$ in the displacement field and $9.36 \times 10^{-1}$ in the pressure field, while separable completed 95,500 iterations, achieving an error of $2.63 \times 10^{-2}$ in the displacement field and $1.35 \times 10^{-1}$ in the pressure field. Networks for both problems have comparable numbers of trainable parameters. (c) For the parameterized heat equation (2D in space, 1D in parameter values, and 1D in time), the separable architecture was evaluated after convergence (approx. 2.5h training time). The training could not be completed with the vanilla framework, as each iteration required 10,416.7 milliseconds (approx. 289.35h training time) (see the results table, label tab:results). (d) The separable framework is easily transferable to other inputs such as random sources for Poisson's equation and can be combined with other architectures, e.g., a convolutional neural network as a branch network.
  • Figure 3: Comparative analysis of network architectures for the Burgers' equation. Top row: Training loss trajectories. Bottom row: Relative $\mathcal{L}_2$ error. Left: Metrics plotted against epochs. Right: Metrics plotted against computational time. While the left plots show that all experimental setups converge similarly per epoch, the right plots show that the computational time is drastically reduced with the separable architecture. All network variants listed in the Burgers results table (label tab:burgers_result) are represented.
  • Figure 4: Burgers' equation: Performance comparison between reference vanilla PI-DeepONet (6 hidden layers, 100 neurons each, $p= 100$, resulting in 131,701 trainable parameters) and separable PI-DeepONet (6 hidden layers, 50 neurons each, $p = r = 20$, resulting in 129,221 trainable parameters) for a representative test case. Top row: Predicted solutions after 50,000 epochs. Bottom row: Squared difference between predictions and reference solution. Note the comparable accuracy as indicated by the provided relative $\mathcal{L}_2$ error for both variants despite the separable architecture's reduced complexity.
  • Figure 5: Influence of combinations of hidden dimension $p$ and tensor rank $r$ for Burgers' equation using Sep-PI-DeepONet with 6 trunk layers of 100 neurons. We tracked the iteration with the lowest achieved mean relative $\mathcal{L}_2$ error for each combination over training for 50,000 epochs. Additionally shown are the best and worst test examples for the respective epoch.
  • ...and 6 more figures
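
The timing figures quoted in the Figure 2 caption can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from that caption; the variable names are hypothetical, and the iteration budget it derives for the heat equation is an inference from the quoted per-iteration cost and total time, not a value stated in the caption.

```python
# (a) Burgers' equation: both runs trained for the same 83.69 s of wall-clock time.
burgers_seconds = 83.69
burgers_vanilla_its, burgers_sep_its = 600, 21_500
speedup_burgers = burgers_sep_its / burgers_vanilla_its    # iterations ratio, about 35.8
vanilla_s_per_it = burgers_seconds / burgers_vanilla_its   # about 0.139 s per iteration
sep_s_per_it = burgers_seconds / burgers_sep_its           # about 0.0039 s per iteration

# (b) Consolidation problem: both runs trained for the same 380.39 s.
biot_vanilla_its, biot_sep_its = 2_200, 95_500
speedup_biot = biot_sep_its / biot_vanilla_its             # about 43.4

# (c) Heat equation: 10,416.7 ms per vanilla iteration and about 289.35 h in total
# together imply an iteration budget (inferred here, not quoted in the caption).
heat_s_per_it = 10_416.7 / 1000.0
heat_total_s = 289.35 * 3600.0
implied_iterations = heat_total_s / heat_s_per_it          # close to 100,000

print(f"Burgers speedup:       {speedup_burgers:.1f}x")
print(f"Consolidation speedup: {speedup_biot:.1f}x")
print(f"Implied heat-equation iterations: {implied_iterations:,.0f}")
```

Because each pair of runs shares the same wall-clock budget, the ratio of completed iterations is the per-iteration speedup, roughly 36x for Burgers' equation and 43x for the consolidation problem.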

Theorems & Definitions (1)

  • Theorem S1.1: Generalized Universal Approximation Theorem for Operators.