PAC-FNO: Parallel-Structured All-Component Fourier Neural Operators for Recognizing Low-Quality Images

Jinsung Jeon, Hyundong Jin, Jonghyun Choi, Sanghyun Hong, Dongeun Lee, Kookjin Lee, Noseong Park

TL;DR

PAC-FNO addresses visual recognition under varying input resolutions and natural variations with a frequency-domain architecture that uses all frequency components rather than applying a low-pass filter. It arranges AC-FNO blocks in parallel (PAC-FNO) and uses a two-stage training algorithm to attach to pre-trained backbone models, giving resolution-invariant processing with a parameter budget of roughly 1–13% of the backbone. Across seven benchmarks and multiple backbones, PAC-FNO improves accuracy on low-quality inputs and is robust to noise, weather, and compression artifacts, outperforming resize, super-resolution, and prior FNO baselines (e.g., up to $77.1\%$ relative improvement in some settings). The result is a plug-in module that lets a single model maintain accuracy across diverse input qualities.

Abstract

A standard practice in developing image recognition models is to train a model on a specific image resolution and then deploy it. However, in real-world inference, models often encounter images that differ from the training set in resolution and/or are subject to natural variations such as weather changes, noise types, and compression artifacts. While traditional solutions involve training multiple models for different resolutions or input variations, these methods are computationally expensive and thus do not scale in practice. To this end, we propose a novel neural network model, the parallel-structured and all-component Fourier neural operator (PAC-FNO), that addresses the problem. Unlike conventional feed-forward neural networks, PAC-FNO operates in the frequency domain, allowing it to handle images of varying resolutions within a single model. We also propose a two-stage algorithm for training PAC-FNO with minimal modification to the original, downstream model. Moreover, the proposed PAC-FNO is ready to work with existing image recognition models. Evaluating extensively on seven image recognition benchmarks, we show that the proposed PAC-FNO improves the performance of existing baseline models by up to 77.1% on images with various resolutions, and also improves it on images with various types of natural variations at inference.
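
To make the frequency-domain idea concrete, the following is a minimal sketch of how one image can be moved between resolutions by zero-padding or cropping its Fourier spectrum. It is not the authors' implementation: the function name `fourier_resize` and the choice of NumPy are assumptions made for illustration, and the paper's own zero-padding and interpolation steps may differ in detail.

```python
import numpy as np

def fourier_resize(img, out_h, out_w):
    """Resample a 2D real image by zero-padding or cropping its centred spectrum.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    in_h, in_w = img.shape
    # Move the zero-frequency (DC) coefficient to the centre of the array.
    spec = np.fft.fftshift(np.fft.fft2(img))
    out = np.zeros((out_h, out_w), dtype=complex)
    # Size of the central frequency band shared by input and output grids.
    h, w = min(in_h, out_h), min(in_w, out_w)
    iy, ix = in_h // 2 - h // 2, in_w // 2 - w // 2
    oy, ox = out_h // 2 - h // 2, out_w // 2 - w // 2
    out[oy:oy + h, ox:ox + w] = spec[iy:iy + h, ix:ix + w]
    # fft2 is unnormalised and ifft2 divides by the output size, so rescale
    # to keep pixel intensities unchanged.
    scale = (out_h * out_w) / (in_h * in_w)
    # Discard the small imaginary residue left by an unpaired Nyquist term.
    return np.fft.ifft2(np.fft.ifftshift(out)).real * scale

# Usage: upsample an 8x8 band-limited pattern to 16x16.
yy, xx = np.mgrid[0:8, 0:8] / 8.0
low_res = np.cos(2 * np.pi * xx) + np.sin(2 * np.pi * yy)
high_res = fourier_resize(low_res, 16, 16)
```

For a band-limited input such as the pattern above, the upsampled result matches the same function sampled on the finer grid, which is the sense in which a frequency-domain representation is shared across resolutions.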

Paper Structure (42 sections, 3 equations, 8 figures, 23 tables, 1 algorithm)

This paper contains 42 sections, 3 equations, 8 figures, 23 tables, 1 algorithm.

Figures (8)

  • Figure 1: Comparison of the parallel-structured and all-component FNO (PAC-FNO) with existing FNOs. (a) illustrates the vanilla FNO. Each FNO block contains an ideal low-pass filter, a learnable filter (${\rm R}_\theta$) operating in the frequency domain, and a $1\times1$ convolutional operator ($\mathbf{W}$). An FNO can stack a series of these blocks. (b) shows a U-shaped FNO (UNO), which connects the FNO blocks in a U shape. (c) depicts Adaptive FNO, which replaces the learnable filter in the FNO block with adaptive global convolution (AGC). Our PAC-FNO, shown in (d), uses all frequency components by removing the low-pass filter of the FNO block and runs the resulting AC-FNO blocks in parallel. Architectural advances from (a) to (d) are shown in red.
  • Figure 2: Parallel-structured and all-component Fourier neural operator (PAC-FNO) architecture in detail. (a) AC-FNO blocks use all frequency components and rely on zero-padding and interpolation to construct the images for the target resolution. (b) In contrast to previous FNOs, PAC-FNO consists of multiple AC-FNO blocks arranged in parallel. $\textbf{h}^{l_{i}}_{l_{j}}$ is the $(l_{i},l_{j})$-th hidden vector $(l_{i} \in \{0, \dots ,m\} ,l_{j} \in \{0,\dots,n\})$. In (c), $D_{R}$ is a set of image datasets with different resolutions, $h_f$ is a hidden vector processed by PAC-FNO, and $\hat{y}$ is a predicted class (see the subsections on the PAC-FNO block and the PAC-FNO layer for more details).
  • Figure 3: Benefit of the parallel structure. We show the top-1 performance of PAC-FNO models with different configurations. $n$ and $m$ are the numbers of blocks in series and in parallel, respectively. Note that the total number of AC-FNO blocks is the same for both configurations. We use ImageNet-1k and ImageNet-C/P (Fog). Compared to the serial configuration, our proposed parallel configuration shows less performance degradation at the target resolution (39.1% vs. 22.9%).
  • Figure 4: Benefit of the two-stage training algorithm. We show the top-1 accuracy in an ablation study of the two-stage algorithm. 'First stage only' refers to a model trained only with the first stage, 'Second stage only' refers to a model trained only with the second stage, and 'Two-stage algorithm' refers to a model trained with our full two-stage training algorithm. We use ResNet-18 on ImageNet-1k.
  • Figure 5: Results for the number of stages $n$ and blocks $m$ in PAC-FNO.
  • ...and 3 more figures
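
The block structure described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' code: the names `low_pass_mask`, `fno_block`, and `parallel_stage` are hypothetical, the ReLU activation is an assumed choice, and summing the outputs of parallel blocks is an assumption, since the captions do not say how parallel outputs are combined. Passing `modes=None` skips the low-pass filter so that all frequency components are used, which mirrors the all-component idea.

```python
import numpy as np

def low_pass_mask(h, w, modes):
    """Ideal low-pass mask keeping frequency indices with |k| < modes on both axes."""
    ky = np.minimum(np.arange(h), h - np.arange(h))  # |k| for each FFT row index
    kx = np.minimum(np.arange(w), w - np.arange(w))  # |k| for each FFT column index
    return (ky[:, None] < modes) & (kx[None, :] < modes)

def fno_block(x, r_theta, w_mix, modes=None):
    """One Fourier block: activation(W x + IFFT(R_theta * FFT(x))).

    x:       (C, H, W) real feature map
    r_theta: (C, C, H, W) complex learnable filter in the frequency domain
    w_mix:   (C, C) real weights of the 1x1 convolution
    modes:   low-pass cutoff (vanilla FNO) or None for all components
    """
    _, h, w = x.shape
    x_hat = np.fft.fft2(x, axes=(-2, -1))
    if modes is not None:
        x_hat = x_hat * low_pass_mask(h, w, modes)  # discard high frequencies
    y_hat = np.einsum("oihw,ihw->ohw", r_theta, x_hat)  # per-frequency channel mixing
    # Keeping only the real part is a simplification of this sketch.
    spectral = np.fft.ifft2(y_hat, axes=(-2, -1)).real
    pointwise = np.einsum("oi,ihw->ohw", w_mix, x)  # 1x1 convolution
    return np.maximum(spectral + pointwise, 0.0)  # ReLU (assumed activation)

def parallel_stage(x, params, modes=None):
    """Apply several blocks to the same input and sum them (summation is an assumption)."""
    return sum(fno_block(x, r, w, modes) for r, w in params)

# Usage with random parameters on a 3-channel 16x16 input.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 16, 16))
params = [
    (rng.standard_normal((3, 3, 16, 16)) + 1j * rng.standard_normal((3, 3, 16, 16)),
     rng.standard_normal((3, 3)))
    for _ in range(2)
]
vanilla = fno_block(x, *params[0], modes=4)      # low-pass filtered block
all_component = parallel_stage(x, params)        # two all-component blocks in parallel
```

In this sketch the filter `r_theta` is tied to a fixed grid size, so inputs at other resolutions would first have to be brought to that grid, for example by the zero-padding and interpolation mentioned in the Figure 2 caption.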