Hierarchical Neural Operator Transformer with Learnable Frequency-aware Loss Prior for Arbitrary-scale Super-resolution

Xihaier Luo, Xiaoning Qian, Byung-Jun Yoon

TL;DR

This work addresses arbitrary-scale super-resolution for scientific data by recasting SR as learning a mapping between function spaces with a hierarchical neural operator transformer, HiNOTE. The architecture combines a hybrid upsampling encoder, a parameter-free sampler, and a neural-operator decoder with Galerkin-type self-attention to realize a parametric map $\boldsymbol{G}_\theta:\boldsymbol{\text{A}} \to \boldsymbol{\text{B}}$ usable at arbitrary resolutions. A learnable frequency-aware loss prior based on spectral resizing $\boldsymbol{\text{R}}(\boldsymbol{a})$ reweights pixel contributions to better capture high-frequency details. Across turbulence, weather, SEVIR, and MRI datasets, HiNOTE achieves state-of-the-art arbitrary-scale SR performance and demonstrates notable improvements in high-frequency fidelity, signaling significant potential for high-precision scientific imaging and modeling.

Abstract

In this work, we present an arbitrary-scale super-resolution (SR) method to enhance the resolution of scientific data, which often involves complex challenges such as continuity, multi-scale physics, and the intricacies of high-frequency signals. Grounded in operator learning, the proposed method is resolution-invariant. The core of our model is a hierarchical neural operator that leverages a Galerkin-type self-attention mechanism, enabling efficient learning of mappings between function spaces. Sinc filters are used to facilitate the information transfer across different levels in the hierarchy, thereby ensuring representation equivalence in the proposed neural operator. Additionally, we introduce a learnable prior structure that is derived from the spectral resizing of the input data. This loss prior is model-agnostic and is designed to dynamically adjust the weighting of pixel contributions, thereby balancing gradients effectively across the model. We conduct extensive experiments on diverse datasets from different domains and demonstrate consistent improvements compared to strong baselines, which consist of various state-of-the-art SR methods.
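
To make the Galerkin-type self-attention mentioned above concrete, the following is a minimal sketch of the commonly used softmax-free form, in which keys and values are layer-normalized and the small `d x d` product is formed before multiplying by the queries. It is an illustration under stated assumptions, not the HiNOTE implementation: the function names, the plain-list matrix helpers, and the exact placement of the normalization are all hypothetical.

```python
import math


def layer_norm(rows, eps=1e-5):
    """Normalize each row (one token's features) to zero mean and unit variance."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        out.append([(x - mean) / math.sqrt(var + eps) for x in row])
    return out


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def galerkin_attention(q, k, v):
    """Softmax-free attention Q (LN(K)^T LN(V)) / n for n tokens of width d.

    Assumption: layer norm is applied to K and V only; the normalization
    actually used in HiNOTE is not stated in this summary.
    """
    n = len(q)
    kv = matmul(transpose(layer_norm(k)), layer_norm(v))  # d x d, independent of n
    return [[x / n for x in row] for row in matmul(q, kv)]


# Toy example: 4 tokens (grid points) with 3 features each.
q = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 2.0, 1.0], [1.0, 1.0, 1.0]]
k = [[1.0, 2.0, 4.0], [3.0, 5.0, 1.0], [2.0, 0.0, 1.0], [0.0, 1.0, 3.0]]
v = [[0.0, 1.0, 2.0], [2.0, 2.0, 5.0], [1.0, 4.0, 0.0], [3.0, 0.0, 1.0]]
out = galerkin_attention(q, k, v)
```

Because `LN(K)^T LN(V)` is computed first, the cost grows linearly with the number of tokens `n` (roughly `n * d * d` operations), which is the property that makes this style of attention attractive when the decoder is queried at many output points.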

Paper Structure

This paper contains 31 sections, 14 equations, 14 figures, 9 tables.

Figures (14)

  • Figure 1: Overview of the Hierarchical Neural Operator TransformEr (HiNOTE). HiNOTE comprises three key components: an encoder that upsamples deep features, a sampler that renders a specific set of features, and a decoder that makes inferences at arbitrary points within the domain.
  • Figure 2: Illustration of the parameter-free sampler. It samples arbitrary resolutions from feature maps extracted by the encoder and combines the positional information of the grid points.
  • Figure 3: Distinguishing between pixels of different frequency regions in image space is challenging. Deep learning predictions often score well on visual perception metrics when compared with target HR images (see the left representations). In the frequency domain, however, the power spectra of HR images and of deep learning predictions begin to diverge at a certain frequency (e.g., 0.2 in this example). To identify this frequency, spectral resizing is applied to the LR inputs, revealing a clear demarcation between frequency regions that aligns with the point of divergence. Building on this, we introduce a static structure prior, created by subtracting the low-frequency signals (obtained via spectral resizing) from the deep learning predictions in image space. This prior is then used to refine and enhance network training.
  • Figure 4: Qualitative comparison with state-of-the-art (SOTA) methods for arbitrary-scale SR. Top-left: turbulence flow; Top-right: kinetic energy; Bottom-left: temperature; and Bottom-right: water vapor.
  • Figure 5: Qualitative demonstration of continuous representation learning: model performance evaluated on two instances randomly chosen from the test dataset, across various upsampling ratios.
  • ...and 9 more figures
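
The prior described in the Figure 3 caption (spectral resizing of the LR input, subtracted from the prediction in image space) can be sketched roughly as follows. This is a 1D, standard-library illustration under stated assumptions, not the paper's formulation: the paper works with 2D fields, and the names `spectral_resize`, `frequency_prior_weights`, and `weighted_l1`, the fixed scalar `alpha` (standing in for whatever is learnable in the paper's prior), and the specific weighting formula are all hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for a small demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]


def spectral_resize(x, m):
    """Upsample a real 1D signal of length n to length m > n by zero-padding its spectrum."""
    n = len(x)
    if m <= n:
        raise ValueError("this sketch only upsamples (m must exceed len(x))")
    X = dft(x)
    Y = [0j] * m
    Y[0] = X[0]
    for f in range(1, (n - 1) // 2 + 1):
        Y[f] = X[f]          # positive frequencies
        Y[m - f] = X[n - f]  # matching negative frequencies
    if n % 2 == 0:
        # Split the Nyquist bin so the upsampled signal stays real.
        Y[n // 2] = X[n // 2] / 2
        Y[m - n // 2] = X[n // 2] / 2
    return [(m / n) * y.real for y in idft(Y)]


def frequency_prior_weights(pred, lr, alpha=1.0, eps=1e-8):
    """Weights 1 + alpha * normalized |pred - low-frequency content of lr|.

    Assumption: alpha is a fixed scalar here; it is only a stand-in for
    the learnable part of the prior.
    """
    low = spectral_resize(lr, len(pred))
    resid = [abs(p - l) for p, l in zip(pred, low)]
    peak = max(resid) + eps
    return [1.0 + alpha * r / peak for r in resid]


def weighted_l1(pred, target, weights):
    """Mean absolute error with per-sample weights."""
    return sum(w * abs(p - t) for w, p, t in zip(weights, pred, target)) / len(pred)


# Toy example: an 8-sample cosine upsampled to 16 samples.
lr = [math.cos(2 * math.pi * t / 8) for t in range(8)]
low = spectral_resize(lr, 16)
pred = list(low)
pred[5] += 0.5  # pretend the network added fine detail at one location
weights = frequency_prior_weights(pred, lr)
```

In this toy setting the weight is largest where the prediction departs from the band-limited upsampling of the input, so a loss such as `weighted_l1` emphasizes locations carrying high-frequency content.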