FreqINR: Frequency Consistency for Implicit Neural Representation with Adaptive DCT Frequency Loss

Meiyi Wei, Liu Xie, Ying Sun, Gang Chen

TL;DR

FreqINR addresses frequency-domain mismatches that cause artifacts in arbitrary-scale SR by enforcing spectral consistency throughout training with Adaptive DCT Frequency Loss (ADFL) and by enlarging the encoder receptive field during inference. It combines a 2D DCT-based frequency representation, a Frequency Distance Matrix, and an Adaptive Frequency Weighting Matrix to dynamically focus on challenging frequencies, integrated into the overall loss as $L_{total} = L_{spatial} + \lambda L_{ADFL}$. An Enhanced Receptive Field encoder extends spectral coverage without significant cost, enabling better high-frequency detail transfer from LR inputs. Empirical results on DIV2K and other benchmarks show consistent PSNR improvements and superior qualitative texture reconstruction, establishing FreqINR as a lightweight yet effective framework for arbitrary-scale SR with potential applicability to other frequency-aware reconstruction tasks.

Abstract

Recent advancements in local Implicit Neural Representation (INR) demonstrate its exceptional capability in handling images at various resolutions. However, frequency discrepancies between high-resolution (HR) and ground-truth images, especially at larger scales, result in significant artifacts and blurring in HR images. This paper introduces Frequency Consistency for Implicit Neural Representation (FreqINR), an innovative Arbitrary-scale Super-resolution method aimed at enhancing detailed textures by ensuring spectral consistency throughout both training and inference. During training, we employ Adaptive Discrete Cosine Transform Frequency Loss (ADFL) to minimize the frequency gap between HR and ground-truth images, utilizing 2-Dimensional DCT bases and focusing dynamically on challenging frequencies. During inference, we extend the receptive field to preserve spectral coherence between low-resolution (LR) and ground-truth images, which is crucial for the model to generate high-frequency details from LR counterparts. Experimental results show that FreqINR, as a lightweight approach, achieves state-of-the-art performance compared to existing Arbitrary-scale Super-resolution methods and offers notable improvements in computational efficiency. The code for our method will be made publicly available.
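The training objective described above can be illustrated with a small NumPy sketch. It is not the authors' implementation. The squared-error form of the Frequency Distance Matrix, the focal-style weighting with exponent `alpha`, the max-normalization, the L1 spatial term, and the value of `lam` are all assumptions made for illustration, and the function names are hypothetical. The paper's exact definitions may differ.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows = frequencies, columns = samples)."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row has its own scale factor
    return c


def dct2(img: np.ndarray) -> np.ndarray:
    """Separable 2D DCT-II of a single-channel image of shape (H, W)."""
    h, w = img.shape
    return dct_matrix(h) @ img @ dct_matrix(w).T


def adfl(pred: np.ndarray, gt: np.ndarray, alpha: float = 1.0,
         eps: float = 1e-12) -> float:
    """Hypothetical adaptive DCT frequency loss (assumed form, see lead-in)."""
    diff = dct2(pred) - dct2(gt)
    distance = diff ** 2                       # assumed frequency distance matrix
    weight = np.abs(diff) ** alpha             # larger error -> larger weight
    weight = weight / (weight.max() + eps)     # assumed normalization to [0, 1]
    return float(np.mean(weight * distance))


def total_loss(pred: np.ndarray, gt: np.ndarray, lam: float = 0.1) -> float:
    """L_total = L_spatial + lambda * L_ADFL, with an assumed L1 spatial term."""
    spatial = float(np.mean(np.abs(pred - gt)))
    return spatial + lam * adfl(pred, gt)
```

With `alpha = 0` the weights are uniform and, because the DCT used here is orthonormal, the frequency term reduces to the ordinary spatial mean squared error (Parseval's relation). Any benefit over a plain pixel loss therefore has to come from the adaptive weighting, which concentrates the penalty on the frequencies that are currently reconstructed worst.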

Paper Structure (31 sections, 7 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Main concept of FreqINR. In (b) and (c), bright areas represent strong frequencies, while dark areas indicate weak ones. (a) shows ground-truth and HR images generated by EDSR-baseline-LIIF (Chen et al., CVPR 2021) before applying FreqINR, with $\times 4$ (in-distribution) in the first row and $\times 18$ (out-of-distribution) in the second row. (b) illustrates the frequency domain transformation. (c) presents the frequency distances. (d) and (e) display the visual results of our FreqINR at $\times 4$ and $\times 18$ scales, respectively.
  • Figure 2: Overview of FreqINR. The inference process for INR-based models (light blue) is guided by our core component, Adaptive DCT Frequency Loss (dark blue), which leverages the Frequency Distance Matrix (light green) and the Adaptive Frequency Weighting Matrix (dark green) to dynamically enhance fine detail learning.
  • Figure 3: Comparison of DCT and DFT distributions. Unlike DFT, DCT places low frequencies in the upper-left corner. (a) The Chequered texture primarily consists of horizontal and vertical details. (b) The Bubbly texture includes both fine and coarse details, reflecting the frequency distribution of natural images. (c) The Noise texture is common and typically needs removal.
  • Figure 4: Visual comparison of INR-based methods at integer and non-integer scales. The first column highlights close-ups in red. All methods use the EDSR-baseline encoder, trained on DIV2K with random scales ranging from $\times 1$ to $\times 4$.
  • Figure 5: Visual comparison of different encoders. All methods use LIIF as the INR-based model during inference.
  • ...and 2 more figures
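
The layout difference noted in the Figure 3 caption can be checked numerically. The sketch below is only an illustration: the toy ramp image, the 4×4 low-frequency block, and the function names are assumptions, not the paper's textures or code. It compares where a smooth image's energy lands under a 2D DCT, a raw 2D DFT, and a centered (shifted) DFT.

```python
import numpy as np


def dct2(img: np.ndarray) -> np.ndarray:
    """Separable orthonormal 2D DCT-II of a single-channel image."""
    def basis(n: int) -> np.ndarray:
        k = np.arange(n)[:, None]
        x = np.arange(n)[None, :]
        c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
        c[0, :] = np.sqrt(1.0 / n)
        return c
    h, w = img.shape
    return basis(h) @ img @ basis(w).T


def spectrum_layout(n: int = 16) -> dict:
    """Locate low-frequency energy for a smooth toy image (a diagonal ramp)."""
    rows, cols = np.mgrid[0:n, 0:n]
    img = (rows + cols).astype(float)

    dct_mag = np.abs(dct2(img))
    dft_mag = np.abs(np.fft.fft2(img))
    dft_centered = np.fft.fftshift(dft_mag)   # moves the DC term to the center

    def peak(a: np.ndarray) -> tuple:
        return tuple(int(i) for i in np.unravel_index(np.argmax(a), a.shape))

    return {
        "dct_peak": peak(dct_mag),
        "dft_raw_peak": peak(dft_mag),
        "dft_centered_peak": peak(dft_centered),
        # share of DCT energy inside the top-left 4x4 block
        "dct_low_block_energy": float((dct_mag[:4, :4] ** 2).sum()
                                      / (dct_mag ** 2).sum()),
        # for real input, the DFT magnitude at frequency +1 equals that at -1
        "dft_mirrored": bool(np.isclose(dft_mag[0, 1], dft_mag[0, n - 1])),
    }
```

For real-valued images the DFT magnitude is mirrored, so low frequencies show up in all four corners of the raw spectrum, or around the center once it is shifted for display. The DCT has no such mirror, so frequency increases monotonically away from the upper-left corner, which makes a per-coefficient weighting matrix straightforward to interpret.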