DiffFNO: Diffusion Fourier Neural Operator

Xiaoyi Liu, Hao Tang

TL;DR

DiffFNO tackles arbitrary-scale image super-resolution by fusing spectral features from a Weighted Fourier Neural Operator (WFNO) with local spatial features from an Attention-based Neural Operator (AttnNO) through a Gated Fusion Mechanism (GFM). A diffusion-inspired refinement framework with an Adaptive Time-Step (ATS) ODE solver enables efficient, high-quality reconstruction at scales not seen during training. Three components drive the results: Mode Rebalancing, which emphasizes high-frequency details; dynamic fusion of global and local features; and nonuniform, learned time stepping. Together they yield state-of-the-art PSNR improvements (≈2–4 dB) with faster inference. This makes the approach practical for real-world SR tasks that require high fidelity across a continuum of upsampling factors, including out-of-distribution scales.

Abstract

We introduce DiffFNO, a novel diffusion framework for arbitrary-scale super-resolution strengthened by a Weighted Fourier Neural Operator (WFNO). Mode Rebalancing in WFNO effectively captures critical frequency components, significantly improving the reconstruction of high-frequency image details that are crucial for super-resolution tasks. A Gated Fusion Mechanism (GFM) adaptively complements WFNO's spectral features with spatial features from an Attention-based Neural Operator (AttnNO). This enhances the network's capability to capture both global structures and local details. An Adaptive Time-Step (ATS) ODE solver, a deterministic sampling strategy, accelerates inference without sacrificing output quality by dynamically adjusting integration step sizes. Extensive experiments demonstrate that DiffFNO achieves state-of-the-art (SOTA) results, outperforming existing methods across various scaling factors, including those beyond the training distribution, by a margin of 2–4 dB in PSNR. It also does so with lower inference time. Our approach sets a new standard in super-resolution, delivering both superior accuracy and computational efficiency.
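The abstract does not spell out how Mode Rebalancing works. One plausible reading, assumed here, is a per-mode scaling in the Fourier domain in place of the hard truncation used by a standard FNO layer. The toy sketch below illustrates that idea on a 1-D signal with a naive DFT. The function names and the fixed weights are hypothetical; in the actual model the weights would presumably be learned and applied to 2-D feature maps.

```python
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns complex values."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]

def rebalance_modes(signal, per_frequency_weights):
    """Scale each Fourier mode by a weight, then transform back.

    per_frequency_weights[f] applies to frequency f = 0 .. n // 2. The same
    weight is reused for the mirrored mode n - f so the output stays real.
    Assumption: these weights stand in for learned parameters.
    """
    n = len(signal)
    if len(per_frequency_weights) != n // 2 + 1:
        raise ValueError("need one weight per frequency 0 .. n // 2")
    spectrum = dft(signal)
    weighted = [per_frequency_weights[min(m, n - m)] * spectrum[m] for m in range(n)]
    return [value.real for value in idft(weighted)]

# Toy signal: constant offset 1 plus a sine at frequency 2 (n = 8).
signal = [1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.0]
boosted = rebalance_modes(signal, [1.0, 1.0, 2.0, 1.0, 1.0])  # double frequency 2
removed = rebalance_modes(signal, [1.0, 1.0, 0.0, 1.0, 1.0])  # drop frequency 2
```

Setting a weight to zero reproduces mode truncation, so truncation is a special case of this weighting. Weights above one amplify the corresponding detail instead of discarding it.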

Paper Structure

This paper contains 10 sections, 17 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: The proposed Diffusion Fourier Neural Operator (DiffFNO) architecture for arbitrary-scale super-resolution begins by lifting a low-resolution input image $\mathbf{x}_\text{LR}(\mathbf{r})$ into a feature space using a convolutional encoder. Features extracted by the Weighted Fourier Neural Operator (WFNO) and an Attention-based Neural Operator (AttnNO) are combined using a Gated Fusion Mechanism (GFM). The fused features are then projected into RGB space, where an Adaptive Time-Step (ATS) ODE solver completes the reverse diffusion process with both accuracy and speed. This pipeline generates $\mathbf{x}_\text{HR}(\mathbf{r})$, a high-resolution version of the input image.
  • Figure 2: Qualitative comparison on integer and continuous super-resolution scales. The models use RDN as their encoder (except HiNOTE, which has its own). In the HR image, the cropped patch is outlined in green.
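The last two stages described in the Figure 1 caption, gated fusion followed by ODE-based refinement on a nonuniform time grid, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration. The gate is the common sigmoid form that produces a convex combination of the two feature streams. The time grid uses a fixed power-law warp (`rho`) where the paper describes learned steps. The solver is plain explicit Euler on a toy drift rather than the paper's reverse-diffusion ODE.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_fusion(spectral, spatial, w_spec=1.0, w_spat=-1.0, bias=0.0):
    """Blend two feature vectors element-wise with a sigmoid gate.

    The scalar gate parameters are hypothetical stand-ins for learned layers.
    """
    fused = []
    for s, a in zip(spectral, spatial):
        g = sigmoid(w_spec * s + w_spat * a + bias)  # gate value in (0, 1)
        fused.append(g * s + (1.0 - g) * a)          # convex combination
    return fused

def nonuniform_time_grid(num_steps, rho=2.0, t_start=1.0, t_end=0.0):
    """Times from t_start down to t_end; rho > 1 shrinks steps near t_end."""
    return [t_end + (t_start - t_end) * (1.0 - i / num_steps) ** rho
            for i in range(num_steps + 1)]

def euler_integrate(drift, x0, grid):
    """Deterministic explicit Euler over an arbitrary (nonuniform) time grid."""
    x = x0
    for t_cur, t_next in zip(grid, grid[1:]):
        x = x + (t_next - t_cur) * drift(x, t_cur)
    return x

# Fuse two toy feature vectors (spectral branch vs. spatial branch).
fused = gated_fusion([0.2, 1.5], [0.8, -0.5])

# Four steps from t = 1 to t = 0: [1.0, 0.5625, 0.25, 0.0625, 0.0]
coarse_grid = nonuniform_time_grid(4)

# Toy ODE dx/dt = x integrated from t = 1 down to t = 0; exact answer is exp(-1).
approx = euler_integrate(lambda x, t: x, 1.0, nonuniform_time_grid(200))
```

With `rho = 2` the step sizes shrink toward `t = 0`, which concentrates solver effort near the end of the reverse process. A learned schedule would choose that allocation from data instead of fixing it in advance.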