Mitigating spectral bias for the multiscale operator learning

Xinliang Liu, Bo Xu, Shuhao Cao, Lei Zhang

TL;DR

This work tackles the challenge of spectral bias in neural operators for multiscale PDEs, where low-frequency components are learned preferentially at the expense of high-frequency, multiscale features. It introduces the Hierarchical Attention Neural Operator (HANO), a transformer-based architecture with hierarchical discretization, scale-adaptive interaction ranges, and multilevel self-attention to enable efficient, scalable operator learning with near-linear time complexity. An empirical $H^1$ loss is employed to enhance the learning of high-frequency components, improving fidelity on multiscale solutions. Experiments demonstrate that HANO outperforms state-of-the-art neural operators on representative multiscale problems, indicating substantial potential for fast, accurate forward and inverse PDE mappings in engineering and physics applications.

Abstract

Neural operators have emerged as a powerful tool for learning the mapping between infinite-dimensional parameter and solution spaces of partial differential equations (PDEs). In this work, we focus on multiscale PDEs that have important applications such as reservoir modeling and turbulence prediction. We demonstrate that for such PDEs, the spectral bias towards low-frequency components presents a significant challenge for existing neural operators. To address this challenge, we propose a hierarchical attention neural operator (HANO) inspired by the hierarchical matrix approach. HANO features a scale-adaptive interaction range and self-attentions over a hierarchy of levels, enabling nested feature computation with controllable linear cost and encoding/decoding of multiscale solution space. We also incorporate an empirical $H^1$ loss function to enhance the learning of high-frequency components. Our numerical experiments demonstrate that HANO outperforms state-of-the-art (SOTA) methods for representative multiscale problems.
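
The abstract names an empirical $H^1$ loss but does not spell out its formula here. As a rough illustration only, the sketch below computes one common Fourier-space form of a relative $H^1$ error on a uniform periodic 2D grid, where each Fourier mode $k$ is weighted by $1 + 4\pi^2 |k|^2$. The function name `empirical_h1_loss`, the unit-period grid, and the exact weighting are assumptions made for this example and may differ from the authors' definition.

```python
import numpy as np


def empirical_h1_loss(pred, target):
    """Relative H^1-type error between two fields on a uniform periodic n x n grid.

    Assumption: unit-period domain, so the Fourier weight is 1 + 4*pi^2*|k|^2.
    This is an illustrative form, not necessarily the paper's exact loss.
    """
    n = pred.shape[-1]
    # Integer wavenumbers 0, 1, ..., -1 for a domain of period 1.
    k = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    # The weight grows with |k|, so high-frequency errors are penalised more.
    weight = 1.0 + (2.0 * np.pi) ** 2 * (kx ** 2 + ky ** 2)
    err_hat = np.fft.fft2(pred - target)
    tgt_hat = np.fft.fft2(target)
    num = np.sum(weight * np.abs(err_hat) ** 2)
    den = np.sum(weight * np.abs(tgt_hat) ** 2)
    return float(np.sqrt(num / den))


# Usage: the same error amplitude costs more at a higher frequency.
x = np.arange(32) / 32
X, Y = np.meshgrid(x, x, indexing="ij")
target = np.sin(2 * np.pi * X) * np.cos(2 * np.pi * Y)
low = empirical_h1_loss(target + 0.01 * np.sin(2 * np.pi * X), target)
high = empirical_h1_loss(target + 0.01 * np.sin(2 * np.pi * 8 * X), target)
print(low < high)
```

Under a plain $L^2$ loss the two perturbations in the usage example would be penalised equally, since they have the same amplitude. The frequency-dependent weight is what pushes training toward the high-frequency components that spectral bias tends to neglect.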

Paper Structure (39 sections, 6 theorems, 36 equations, 3 figures)

Key Result

Proposition 3.1

Cost $O(N)$ attention

Figures (3)

  • Figure 3.1: quadtree for an image
  • Figure 3.2: Architecture
  • Figure D.1: Eigenvalues distribution for different data sets.

Theorems & Definitions (10)

  • Remark 1
  • Remark 2
  • Remark 3
  • Proposition 3.1
  • Theorem C.1
  • Theorem C.2
  • Theorem C.3
  • Remark 4
  • Theorem F.1: Karhunen-Loève
  • Theorem F.2: General Mercer's Theorem