Misalignment-Robust Frequency Distribution Loss for Image Transformation

Zhangkai Ni, Juncheng Wu, Zian Wang, Wenhan Yang, Hanli Wang, Lin Ma

TL;DR

A novel and simple Frequency Distribution Loss (FDL) that computes a distribution distance in the frequency domain using the Discrete Fourier Transform (DFT). It is shown empirically to be an effective training constraint because it exploits the global information carried by the frequency domain.

Abstract

This paper addresses a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution, which rely heavily on paired datasets with precise pixel-level alignment. However, creating precisely aligned paired images is difficult, which hinders the advancement of methods trained on such data. To overcome this challenge, this paper introduces a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain. Specifically, we transform image features into the frequency domain using the Discrete Fourier Transform (DFT). Subsequently, the frequency components (amplitude and phase) are processed separately to form the FDL loss function. Our method is empirically shown to be effective as a training constraint because it makes use of global information in the frequency domain. Extensive experimental evaluations, focusing on image enhancement and super-resolution tasks, demonstrate that FDL outperforms existing misalignment-robust loss functions. Furthermore, we explore the potential of FDL for image style transfer that relies solely on completely misaligned data. Our code is available at: https://github.com/eezkni/FDL
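The split into amplitude and phase can be illustrated with a small one-dimensional sketch. This is not the authors' implementation: it uses a naive DFT from the Python standard library, and the helper names (`dft`, `amplitude_and_phase`, `circular_shift`) and the sample signal are assumptions made for illustration. It shows the standard DFT property that a circular shift leaves the amplitude spectrum unchanged and only alters the phase, which is one intuition for why a frequency-domain loss can tolerate misalignment.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def amplitude_and_phase(signal):
    """Return the amplitude and phase (in radians) of each frequency bin."""
    spectrum = dft(signal)
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return amplitude, phase


def circular_shift(signal, offset):
    """Shift a sequence to the right by `offset` positions, wrapping around."""
    offset %= len(signal)
    return signal[-offset:] + signal[:-offset] if offset else list(signal)


# Hypothetical test signal and a copy misaligned by three samples.
signal = [0.0, 1.0, 4.0, 2.0, 0.5, 0.0, 0.0, 3.0]
shifted = circular_shift(signal, 3)

amp_a, phase_a = amplitude_and_phase(signal)
amp_b, phase_b = amplitude_and_phase(shifted)

# Largest per-bin difference in amplitude and in phase.
max_amp_gap = max(abs(a - b) for a, b in zip(amp_a, amp_b))
max_phase_gap = max(abs(a - b) for a, b in zip(phase_a, phase_b))
```

Here `max_amp_gap` is zero up to floating-point error, while `max_phase_gap` is not. Real misalignment in images is rarely a pure circular shift, so this exact invariance should be read as an idealized case rather than a guarantee.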

Paper Structure (17 sections, 8 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Qualitative results of models trained on our synthetic DIV2K dataset with strong misalignments. Compared with LPIPS (Zhang et al., 2018) and PDL (Delbracio et al., 2021), the proposed FDL yields clearer results with fewer artifacts. Zoom in to observe details.
  • Figure 2: An overview of the proposed Frequency Distribution Loss (FDL). A shared feature extractor network $\Phi$ projects images into a perceptual feature space. The amplitude and phase of the image features are then obtained with the Discrete Fourier Transform (DFT). Finally, the Sliced Wasserstein Distance (SWD) [patchSWD], an approximation of the Wasserstein distance (WD), is computed separately for amplitude and phase, and the two results are linearly combined.
  • Figure 3: In a one-dimensional scenario, different loss functions are used to train the same model on aligned and on randomly misaligned training data. $lq$ is the input test signal, and $gt$ is the corresponding ground truth. Each column shows the predictions of the model trained with a given loss function, together with the MSE between the prediction and the ground truth.
  • Figure 4: Result of mixing frequency components. An encoder ($\Phi$) extracts features from Q and D. We combine the amplitude of $\Phi(Q)$ with the phase of $\Phi(D)$, and the resulting mixed-frequency features are decoded back into the pixel domain.
  • Figure 5: Shift response curves for different loss functions, including FDL, LPIPS, and Mean Square Error (MSE). We randomly shift the reference image by varying numbers of pixels and measure the discrepancy between the shifted image and the reference image under each metric. The proposed FDL demonstrates strong shift robustness.
  • ...and 3 more figures
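
The pipeline described in the Figure 2 caption can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions, not the code from the authors' repository: the feature extractor $\Phi$ is omitted and the inputs are treated as already-extracted multi-channel features, the DFT is a naive standard-library version, each frequency bin is treated as one sample whose coordinates are the channels, and the names `sliced_wasserstein`, `frequency_distribution_distance`, and `phase_weight` are hypothetical.

```python
import cmath
import math
import random


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def sliced_wasserstein(samples_a, samples_b, num_projections=32, seed=0):
    """Monte Carlo sliced 1-Wasserstein distance between two equally sized point sets."""
    rng = random.Random(seed)  # fixed seed so both calls use the same directions
    dim = len(samples_a[0])
    total = 0.0
    for _ in range(num_projections):
        # Draw a random unit direction in channel space.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        direction = [d / norm for d in direction]
        # Project both sets to 1D; sorting gives the optimal 1D transport matching.
        proj_a = sorted(sum(d * v for d, v in zip(direction, s)) for s in samples_a)
        proj_b = sorted(sum(d * v for d, v in zip(direction, s)) for s in samples_b)
        total += sum(abs(a - b) for a, b in zip(proj_a, proj_b)) / len(proj_a)
    return total / num_projections


def frequency_distribution_distance(feats_x, feats_y, phase_weight=1.0):
    """feats_*: list of channels, each a 1D sequence of the same length."""

    def components(feats):
        spectra = [dft(channel) for channel in feats]
        bins = range(len(spectra[0]))
        # One sample per frequency bin, one coordinate per channel.
        amplitude = [[abs(s[k]) for s in spectra] for k in bins]
        phase = [[cmath.phase(s[k]) for s in spectra] for k in bins]
        return amplitude, phase

    amp_x, phase_x = components(feats_x)
    amp_y, phase_y = components(feats_y)
    # Amplitude and phase distances are computed separately, then linearly combined.
    return sliced_wasserstein(amp_x, amp_y) + phase_weight * sliced_wasserstein(phase_x, phase_y)


# Hypothetical two-channel features and a rescaled, offset variant.
feats = [
    [0.0, 1.0, 4.0, 2.0, 0.5, 0.0, 0.0, 3.0],
    [1.0, 0.0, 2.0, 2.0, 1.0, 3.0, 0.5, 0.0],
]
other = [[2.0 * v + 1.0 for v in channel] for channel in feats]

same_distance = frequency_distribution_distance(feats, feats)
diff_distance = frequency_distribution_distance(feats, other)
```

In this sketch identical inputs give a distance of zero and the rescaled features give a positive distance. Two simplifications are worth keeping in mind: phase values are compared as raw angles without handling wrap-around at $\pm\pi$, and phases of near-zero-amplitude bins are numerically noisy, so a practical implementation would likely need to treat both more carefully.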