Spectrum Translation for Refinement of Image Generation (STIG) Based on Contrastive Learning and Spectral Filter Profile

Seokjun Lee, Seung-Won Jung, Hyunseok Seo

TL;DR

The paper addresses spectral discrepancies in generated images from GANs and diffusion models, which manifest as aliasing and underrepresented high-frequency content. It introduces STIG, a spectrum-translation framework that refines the magnitude spectrum in the frequency domain through adversarial learning, patch-wise contrastive learning, and auxiliary regularizations, yielding a final objective $\

Abstract

Image generation and synthesis have progressed remarkably with generative models. Despite photo-realistic results, intrinsic discrepancies are still observed in the frequency domain. This spectral discrepancy appears not only in generative adversarial networks (GANs) but also in diffusion models. In this study, we propose a framework that effectively mitigates the frequency-domain disparity of generated images and thereby improves the generative performance of both GANs and diffusion models. This is realized by spectrum translation for the refinement of image generation (STIG) based on contrastive learning. We build on a theoretical analysis of frequency components in various generative networks. The key idea is to refine the spectrum of the generated image through image-to-image translation and contrastive learning, viewed from a digital signal processing perspective. We evaluate our framework on eight fake image datasets and various cutting-edge models to demonstrate the effectiveness of STIG. Our framework outperforms other cutting-edge methods, showing significant decreases in FID and in the log frequency distance of the spectrum. We further emphasize that STIG improves image quality by reducing spectral anomalies. Additionally, validation results show that a frequency-based deepfake detector is more easily confused when fake spectra are manipulated by STIG.
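The spectrum-refinement idea can be sketched as a round trip through the discrete Fourier transform. This is a minimal illustration, not the authors' implementation: the function names, the log scaling of the magnitude, the reuse of the original phase, and the identity `translate` placeholder are all assumptions. In STIG the translation step is a learned network, which is not reproduced here.

```python
import numpy as np

def split_spectrum(image):
    """Return (log-magnitude, phase) of the centered 2-D DFT of a grayscale image."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    log_magnitude = np.log1p(np.abs(spectrum))  # log scaling is an assumption
    phase = np.angle(spectrum)
    return log_magnitude, phase

def merge_spectrum(log_magnitude, phase):
    """Invert split_spectrum: rebuild the complex spectrum and return the image."""
    magnitude = np.expm1(log_magnitude)
    spectrum = magnitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

def refine(image, translate=lambda m: m):
    """Apply a magnitude-only translation and reuse the original phase (assumption).

    `translate` is a hypothetical stand-in for a learned spectrum translator;
    the default identity leaves the image unchanged.
    """
    log_magnitude, phase = split_spectrum(image)
    return merge_spectrum(translate(log_magnitude), phase)

# With the identity translator, the round trip reproduces the input image.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
restored = refine(image)
round_trip_error = float(np.max(np.abs(restored - image)))
```

Passing a different `translate` callable, for example one that attenuates a band of the log-magnitude, changes only the magnitude spectrum while the phase is carried through untouched.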

Paper Structure

This paper contains 42 sections, 25 equations, 16 figures, and 6 tables.

Figures (16)

  • Figure 1: Spectral discrepancies between real and generated images in various generative networks. CycleGAN and StarGAN (rows 1 and 2), which include transposed convolution layers, produce grid-like aliasing in the spectrum of the generated image. DDPM (row 3), in contrast, shows another type of discrepancy: a lack of high-frequency components.
  • Figure 2: (a) Estimation of an ideal low-pass filter, simulated with an example sinc function in the spatial domain. A sinc function with a finite kernel size causes an obvious ripple (i.e., a fluctuation pattern) in the cut-off frequency band. (b) Frequency response of the denoising filter for the reverse process in diffusion models. The filter still blocks the high-frequency band even at the end of the reverse process. Note that $t \in [0, 1000]$ in this example.
  • Figure 3: Framework of STIG. To reduce the spectral discrepancies, we compute the magnitude spectrum with the discrete Fourier transform and use it as the input to our framework. The spectrum of the generated image is translated into the domain of real spectra. We then obtain the refined generated image by applying the inverse discrete Fourier transform.
  • Figure 4: STIG examples on StarGAN and DDIM-Church. We magnify the regions of the image and of the corresponding magnitude spectrum that are relevant to the spectral discrepancies (yellow and red boxes). The left panels show the original generated image and its spectrum; the right panels show the STIG-refined versions.
  • Figure 5: Comparison of color tone with a frequency-domain method, SpectralGAN. Examples are sampled from the CycleGAN, StarGAN2, and StyleGAN benchmarks.
  • ...and 11 more figures
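
The ripple described in the caption of Figure 2(a) can be reproduced with a few lines of NumPy. This is a hedged sketch rather than the authors' simulation: the cutoff of 0.25 cycles/sample, the kernel sizes, the rectangular truncation, and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def truncated_sinc_response(kernel_size, cutoff=0.25, n_fft=1024):
    """Magnitude response of an ideal low-pass sinc truncated to kernel_size taps.

    cutoff is in cycles/sample (0 to 0.5). The truncation is a plain
    rectangular window, which is what produces the ripple.
    """
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    kernel = 2 * cutoff * np.sinc(2 * cutoff * n)  # np.sinc(x) = sin(pi x) / (pi x)
    kernel /= kernel.sum()                         # normalize to unit DC gain
    response = np.abs(np.fft.rfft(kernel, n_fft))
    freqs = np.fft.rfftfreq(n_fft)
    return freqs, response

def stopband_peak(kernel_size, cutoff=0.25, margin=0.1):
    """Largest response magnitude at frequencies above cutoff + margin."""
    freqs, response = truncated_sinc_response(kernel_size, cutoff)
    return float(response[freqs > cutoff + margin].max())

# An ideal filter would give zero here; a finite kernel leaks energy,
# and a shorter kernel leaks more.
short_leak = stopband_peak(9)
long_leak = stopband_peak(65)
```

Comparing `short_leak` with `long_leak` shows that lengthening the kernel reduces the leakage away from the cutoff but never removes it, since any finite kernel only approximates the ideal filter.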