The Error of Deep Operator Networks Is the Sum of Its Parts: Branch-Trunk and Mode Error Decompositions

Alexander Heinlein, Johannes Taraz

TL;DR

This work analyzes performance limitations of the classical DeepONet architecture and shows that the approximation error is dominated by the branch network when the internal dimension is sufficiently large, and that the learned trunk basis can often be replaced by classical basis functions without a significant impact on performance.
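The branch-trunk split can be made concrete with a minimal NumPy sketch. It assumes details that this summary does not state: the solution matrix $A$ holds one discretized solution per column, the basis $T$ has orthonormal columns on the grid, and errors are relative Frobenius norms. Under these assumptions, the error of $\tilde{A} = TB$ splits into a trunk part $A - TT^\top A$ (what no coefficients in the span of $T$ can reach) and a branch part $T^\top A - B$ (coefficient error inside that span). Because the two parts are orthogonal, $\delta^2 = \delta_T^2 + \delta_B^2$. The function `branch_trunk_errors`, the cosine basis, and the synthetic data are illustrative and not taken from the paper.

```python
import numpy as np


def branch_trunk_errors(A, T, B):
    """Relative total, trunk, and branch errors of the prediction T @ B.

    Assumed shapes (illustrative): A is (n_x, n_s) with one discretized
    solution per column, T is (n_x, N) with orthonormal columns, B is (N, n_s).
    """
    norm_A = np.linalg.norm(A)  # Frobenius norm of a 2-D array
    C = T.T @ A  # best coefficients for A within span(T)
    delta = np.linalg.norm(A - T @ B) / norm_A  # total error
    delta_T = np.linalg.norm(A - T @ C) / norm_A  # trunk error: part of A outside span(T)
    delta_B = np.linalg.norm(C - B) / norm_A  # branch error: coefficient mismatch
    return delta, delta_T, delta_B


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
N = 10

# Classical basis: cosines orthonormalized on the grid via a reduced QR factorization.
T, _ = np.linalg.qr(np.cos(np.pi * np.outer(x, np.arange(N))))

# Synthetic solution matrix (illustrative only): 50 random sine series with decaying amplitudes.
freqs = np.arange(1, 21)
A = np.sin(np.pi * np.outer(x, freqs)) @ (rng.normal(size=(20, 50)) / freqs[:, None] ** 2)

# Stand-in for a trained branch network: the optimal coefficients plus small noise.
B = T.T @ A + 0.01 * rng.normal(size=(N, 50))

delta, delta_T, delta_B = branch_trunk_errors(A, T, B)
print(f"delta^2 = {delta**2:.6f}, delta_T^2 + delta_B^2 = {delta_T**2 + delta_B**2:.6f}")
```

Measuring the branch part in coefficient space is only equivalent to measuring it in output space because $T$ is assumed orthonormal; for a non-orthonormal learned trunk the two parts would no longer add this cleanly.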

Abstract

Operator learning has the potential to strongly impact scientific computing by learning solution operators for differential equations, potentially accelerating multi-query tasks such as design optimization and uncertainty quantification by orders of magnitude. Despite proven universal approximation properties, deep operator networks (DeepONets) often exhibit limited accuracy and generalization in practice, which hinders their adoption. Understanding these limitations is therefore crucial for further advancing the approach. This work analyzes performance limitations of the classical DeepONet architecture. It is shown that the approximation error is dominated by the branch network when the internal dimension is sufficiently large, and that the learned trunk basis can often be replaced by classical basis functions without a significant impact on performance. To investigate this further, a modified DeepONet is constructed in which the trunk network is replaced by the left singular vectors of the training solution matrix. This modification yields several key insights. First, a spectral bias in the branch network is observed, with coefficients of dominant, low-frequency modes learned more effectively. Second, due to singular-value scaling of the branch coefficients, the overall branch error is dominated by modes with intermediate singular values rather than the smallest ones. Third, using a shared branch network for all mode coefficients, as in the standard architecture, improves generalization of small modes compared to a stacked architecture in which coefficients are computed separately. Finally, strong and detrimental coupling between modes in parameter space is identified.
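The SVD-based modification described in the abstract can also be sketched in NumPy. In this hypothetical example (synthetic data, $N = 12$ retained modes, and a noisy stand-in for a trained branch network), the trunk is fixed to the leading left singular vectors of the training solution matrix $A = U\Sigma V^\top$. The target coefficients of mode $k$ are then $\sigma_k$ times the $k$-th row of $V^\top$, the squared trunk error equals the sum of the discarded $\sigma_k^2$, and the squared branch error splits into one loss per mode, each carrying a factor $\sigma_k^2$. The variable names and the noise model are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 128)
freqs = np.arange(1, 31)

# Synthetic training solution matrix (illustrative): 80 solutions stored as columns.
A = np.sin(np.pi * np.outer(x, freqs)) @ (rng.normal(size=(30, 80)) / freqs[:, None] ** 2)

N = 12  # number of retained modes (internal dimension)
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in decreasing order
T = U[:, :N]  # fixed trunk: leading left singular vectors

# Target branch coefficients: mode k is s[k] times the unit-norm row Vt[k].
C = T.T @ A
assert np.allclose(C, s[:N, None] * Vt[:N])

# With this basis, the squared trunk error equals the sum of the discarded s_k^2.
assert np.isclose(np.linalg.norm(A - T @ C) ** 2, np.sum(s[N:] ** 2))

# Hypothetical branch output: noisy normalized coefficients, rescaled by s_k.
V_hat = Vt[:N] + 0.05 * rng.normal(size=(N, A.shape[1]))
B = s[:N, None] * V_hat

# Mode loss decomposition: one squared error per mode, summing to the branch loss.
mode_loss = np.sum((C - B) ** 2, axis=1)
assert np.isclose(mode_loss.sum(), np.linalg.norm(C - B) ** 2)

# Each mode loss is s_k^2 times the squared error of the normalized coefficients.
normalized_loss = np.sum((Vt[:N] - V_hat) ** 2, axis=1)
assert np.allclose(mode_loss, s[:N] ** 2 * normalized_loss)
print(np.round(mode_loss, 4))
```

The factor $\sigma_k^2$ in each mode loss is one way to see the singular-value scaling mentioned in the abstract: a mode's contribution to the branch error depends on both its normalized coefficient error and the size of its singular value.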

Paper Structure

This paper contains 61 sections, 41 equations, 12 figures, and 1 algorithm.

Figures (12)

  • Figure 1: Visualizations of the standard DeepONet (top left) and this work's main contributions: the decomposition into trunk and branch error (top right) and the modified DeepONet's architecture (bottom left), which allows us to further investigate the dominant branch error via the mode loss decomposition (bottom right). The visualizations of the architectures are adapted and reproduced from [deeponet].
  • Figure 2: Relative total, trunk, and branch errors of DeepONets with various bases, plotted over $N$. Colors: learned trunk, i.e., standard DeepONet (purple); SVD basis (orange); Legendre polynomials (green); Chebyshev polynomials (pink); cosine basis (brown). Line styles: total error $\delta$ (circles), trunk error $\delta_T$ (dashed), branch error $\delta_B$ (solid). The relative error is defined in the notation glossary table.
  • Figure 3: Weighted mode losses for a DeepONet trained with gradient descent (GD). Unweighted (A, B) and weighted (C, D) training (A, C) and test (B, D) mode losses at different training stages, colored from gray (initial) to red/blue (final) and shown as dashed lines. The respective base losses are shown as black dot-solid lines.
  • Figure 4: Model performance across different exponents $e$ and training epochs for DeepONets trained using GD. Top panel: relative error $\delta = \|A - \tilde{A}\|_F / \|A\|_F$ for training (dashed lines) and test (circles) data for exponents $e \in \{-1.0, -0.5, 0.0, 0.5, 1.0\}$ over 4000 epochs. Center and bottom rows: weighted training (center row) and test (bottom row) mode losses at different training stages, colored from gray (initial) to red/blue (final). Each column corresponds to a different exponent $e$; the third column shows the DeepONet trained using the standard loss ($e = 0$). The plots in the center and bottom rows contain the respective base losses in black and a pink dashed horizontal line marking the maximum mode loss in the last training epoch for $e = 0$, which facilitates comparison between exponents.
  • Figure 5: Weighted mode losses for a DeepONet trained with Adam. Unweighted (A, B) and weighted (C, D) training (A, C) and test (B, D) mode losses at different training stages, colored from gray (initial) to red/blue (final) and shown as dashed lines. The respective base losses are shown as black dot-solid lines.
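Figure 4 varies an exponent $e$ in the training loss, with $e = 0$ giving the standard loss. This summary does not state how $e$ enters the weighting, so the sketch below assumes, purely for illustration, per-mode weights $w_k = (\sigma_k/\sigma_1)^e$ applied to the squared coefficient errors of an SVD-trunk DeepONet; the relative error $\delta$ follows the definition in the Figure 4 caption. The functions `relative_error` and `weighted_mode_loss` and the synthetic data are hypothetical.

```python
import numpy as np


def relative_error(A, A_tilde):
    # delta = ||A - A_tilde||_F / ||A||_F, as defined in the Figure 4 caption.
    return np.linalg.norm(A - A_tilde) / np.linalg.norm(A)


def weighted_mode_loss(C, B, s, e):
    # HYPOTHETICAL weighting w_k = (s_k / s_1)**e; the paper's formula may differ.
    # s is sorted in decreasing order, so s[0] is the largest singular value.
    weights = (s / s[0]) ** e  # e = 0 gives equal weights (standard loss)
    mode_loss = np.sum((C - B) ** 2, axis=1)  # one squared error per mode
    return np.sum(weights * mode_loss)


rng = np.random.default_rng(2)
A = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 40))  # synthetic rank-8 data
U, s, Vt = np.linalg.svd(A, full_matrices=False)
N = 8
T = U[:, :N]  # SVD trunk
s_N = s[:N]
C = s_N[:, None] * Vt[:N]  # target branch coefficients
B = C + 0.1 * rng.normal(size=C.shape)  # hypothetical branch output

delta = relative_error(A, T @ B)
losses = {e: weighted_mode_loss(C, B, s_N, e) for e in (-1.0, -0.5, 0.0, 0.5, 1.0)}
print(f"delta = {delta:.4f}")
for e, value in losses.items():
    print(f"e = {e:+.1f}: weighted loss = {value:.4f}")
```

Under this assumed weighting, with singular values sorted in decreasing order, $e > 0$ down-weights small modes and $e < 0$ up-weights them relative to the standard loss.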

Theorems & Definitions (6)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Remark 6